Senior information technology managers are always concerned about the confidentiality, integrity and availability of their data.
- Confidentiality means that only those people who are supposed to see their data can see it. In the military environment, that has two attributes: the right to know and the need to know, and you must have both to be allowed to see data. Most companies should follow this same protocol.
- Integrity means that only authorized processes are allowed to modify data and only in very specific ways. For example, it means that the transaction I send to the Cloud arrives unchanged at the service provider, and the response comes back to me unmodified. It means data stored in my archive hasn’t been changed while it is just sitting there for years.
- Availability means that the data is accessible when needed. If in order to satisfy my customers I need to respond to them in less than a second, I need to make sure I can always get any required response from the Cloud in time to meet that need. Perhaps more importantly, there must be adequate protection to ensure that the data never disappears because of a building failure or accidental or deliberate cyber attack.
You can use a wide variety of tools to protect these data attributes. They usually fall into three categories: encryption, replication and compression.
Encryption, along with adequate control of the encryption keys, is the best way to provide confidentiality. It means that anyone without the key will have a very hard time reading the data. Commercially available encryption algorithms typically take many years to break without the keys.
- Provides Confidentiality.
- Encrypting data in motion and at rest is often an important compliance requirement. Data that is lost or stolen while encrypted is almost never considered a violation of compliance or legal requirements.
- Some encryption products also provide integrity capabilities so that if the data is accidentally or deliberately modified, that modification is at a minimum detected and in some cases automatically corrected.
- It takes significant processing power to encrypt and decrypt the data.
- Encrypted data is always bigger than the cleartext, usually by 10% to 20%, but the overhead may be as high as 50% or more, especially if the encryption product also provides integrity checks.
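The encrypt-then-MAC pattern behind the integrity and overhead points above can be sketched in a few lines of Python. This is a deliberately toy stream cipher (a SHA-256-derived keystream), not a production algorithm; the point is only to show how an appended HMAC tag detects modification and where the size overhead comes from.

```python
import hashlib
import hmac
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Derive a keystream by hashing key + nonce + a counter (toy construction,
    # for illustration only -- use a vetted cipher like AES-GCM in practice).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce so each encryption differs
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    # Encrypt-then-MAC: the tag lets the receiver detect any modification.
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed: data was modified")
    return bytes(c ^ k for c, k in zip(ciphertext, _keystream(key, nonce, len(ciphertext))))

key = secrets.token_bytes(32)
message = b"Transaction: transfer $100 to account 42"
blob = toy_encrypt(key, message)
print(len(message), len(blob))  # -> 40 88: 16-byte nonce + 32-byte tag of overhead
assert toy_decrypt(key, blob) == message
```

The 48 bytes of fixed overhead (nonce plus tag) illustrate why integrity-checked encryption always grows the data, and why the percentage overhead is worst on small records.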
Replication is the way companies protect their data from loss. There are many forms of replication, from simple backups stored in the same room as the server or workstation, to corporate-wide multi-site disaster recovery solutions. Many of these are moving to Cloud-based solutions.
- Provides Availability.
- Can be set up to be invisible to your system users.
- It significantly increases the total amount of storage required by the organization. From local duplication through technologies like RAID, which protects against a single hardware failure, to multiple sites including the Cloud, most companies keep the equivalent of seven copies of everything just to provide availability.
- It can give management a false sense of security. In my experience, senior management has a much rosier expectation of the availability of their data through a failure or disaster than is justified by reality. This can make it difficult for IT to get the resources necessary for an effective disaster recovery solution.
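The "seven copies" figure above can be reconstructed with back-of-the-envelope arithmetic. The breakdown below (RAID-1 mirroring at three sites plus one retained backup copy) is an assumed illustration, not the only way to reach seven:

```python
logical_tb = 10       # logical data the organization owns
raid_mirror = 2       # RAID-1 doubles local storage at each site
sites = 3             # primary + DR site + Cloud replica
backup_copies = 1     # one extra backup copy retained

physical_tb = logical_tb * raid_mirror * sites + logical_tb * backup_copies
copies = physical_tb / logical_tb
print(copies)  # -> 7.0, in line with "seven copies of everything"
```

Swapping in your own replication topology quickly shows how the multiplier changes, which is useful when budgeting storage for a disaster recovery plan.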
The cost of storage has been cut in half about every fourteen months since 1980. In 1980, the cost of one Gigabyte of storage was almost a half million dollars. Now it is five cents, a reduction of seven orders of magnitude. Even so, as companies accumulate more and more data, they are looking for any means of slowing down their storage growth requirements.
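The halving rate and the seven-orders-of-magnitude claim are consistent with each other, as a quick calculation shows:

```python
import math

# $500,000/GB in 1980 down to $0.05/GB: how many halvings is that?
halvings = math.log2(500_000 / 0.05)   # log2 of a 10^7 reduction
months = halvings * 14                 # at one halving every 14 months
print(round(halvings, 1), round(months / 12))  # -> 23.3 halvings over ~27 years
```

Roughly 23 halvings at 14 months each spans about 27 years, so the price quoted here corresponds to the late 2000s at this rate.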
Compression is the general term for products that reduce the size of a file or set of files. In general, they do that by looking for repeated patterns, then replacing each occurrence of the pattern by a link to the pattern. Clearly, the effectiveness of a compression product depends on how sophisticated the product is in finding patterns, and whether the data itself contains patterns. Email archives are usually very compressible. Just consider how many times your own email signature appears in 1,000 of your emails. Databases are often not very compressible, since a well-designed database avoids repeating the same data by establishing relationships.
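The email-signature example is easy to demonstrate with Python's standard zlib module. The 1,000 synthetic emails below (with a made-up signature) stand in for a real archive:

```python
import zlib

# A hypothetical signature repeated at the bottom of every message.
signature = b"--\nJane Doe | Senior IT Manager | jane.doe@example.com\n"
emails = b"".join(b"Hi team, status update #%d.\n" % i + signature
                  for i in range(1000))

compressed = zlib.compress(emails, level=9)
ratio = len(compressed) / len(emails)
print(len(emails), len(compressed), round(ratio, 3))
```

Because the signature repeats a thousand times, the archive shrinks dramatically; running the same code on data without repeated patterns yields almost no savings.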
Standard compression algorithms tend to find smaller chunks of data, often just looking within a single file. Deduplication is a special form of compression that works by identifying large chunks of identical data, often entire files. Consider how often a particular file may be an attachment to an email that has been passed around to dozens of people. A deduplication product would store the file only one time. Without deduplication, and even with compression, each of those dozens or hundreds of copies of the same file gets stored and replicated.
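A minimal sketch of file-level deduplication stores each unique blob once, keyed by its content hash, with filenames mapping onto the shared blob. The class and filenames below are illustrative, not any particular product's design:

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical files are kept only once."""

    def __init__(self):
        self.blobs = {}   # content hash -> file data (one copy per unique file)
        self.names = {}   # filename -> content hash

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # stored only if not already present
        self.names[name] = digest

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]

store = DedupStore()
attachment = b"%PDF-1.4 quarterly report (pretend this is megabytes of data)"
for i in range(40):  # the same attachment lands in 40 mailboxes
    store.put(f"inbox/user{i}/report.pdf", attachment)

print(len(store.names), len(store.blobs))  # -> 40 1: forty names, one stored copy
```

Real products typically deduplicate at the sub-file chunk level rather than whole files, but the principle, hash the content and store each unique chunk once, is the same.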
- Can provide significant storage space savings to help mitigate the impact of encryption and replication solutions.
- Like encryption, compression can have a significant impact on processing power requirements.
- You should not compress encrypted data. Encryption algorithms, by their very nature, tend to remove replicated patterns: the same phrase in a Word document, for example, will get encrypted differently each time it appears depending on its exact location in the document. Compressing an encrypted file often increases the size of the file.
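Since good ciphertext is statistically indistinguishable from random bytes, random data is a fair stand-in for encrypted data when demonstrating this last point:

```python
import os
import zlib

# os.urandom stands in for ciphertext: patternless, like well-encrypted data.
random_like_ciphertext = os.urandom(100_000)
compressed = zlib.compress(random_like_ciphertext, level=9)
print(len(random_like_ciphertext), len(compressed))  # compressed output is LARGER
```

With no patterns to exploit, the compressor can only pass the data through and add its own framing overhead, so the output grows. The practical rule: compress first, then encrypt.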
The last word:
The key to making this all work effectively is to really understand your requirements in the three areas of confidentiality, integrity and availability, and then find a set of encryption, replication and compression solutions that can meet those requirements. All data is not created equal, and the requirements for confidentiality, integrity, and availability may vary substantially across different classes of your data. The only way to have a single solution is to treat all of your data as if it had the most stringent requirements in each area. This can get very expensive.
Keep your sense of humor.