We recently talked about Compliance in the Cloud, and indicated that most of the privacy laws enacted in the US and elsewhere exempt you from reporting loss of protected data if that data was encrypted. OK, we’re done. We simply encrypt all of our data as it leaves our facility and moves around out there in the Cloud.
Which is much like saying that we can eliminate war if we can eliminate hunger. Possibly true. But that first step is neither obvious nor easy.
When your data is out there in the Cloud, it is in one of three states: in motion, at rest, or in process.
Data-in-Motion is your data as it moves from your facility through the Internet to your Cloud Service Provider (CSP), moves around inside your CSP between servers or between servers and storage, or even travels out to other service providers. It includes data moving through Local Area Networks (LANs), Wide Area Networks (WANs), storage networks, and, of course, the Internet. The Cloud extends both the distance these networks cover and, probably, the number of networks involved.
- You don’t know the path your data is taking.
- You don’t know who is looking at the data. Many network devices inspect your data, whether to improve network performance, to protect against malicious attacks, or simply to scan for key words or for anything that looks like a Social Security number (for example).
- There are well-known ways to intercept a network message and change it before it is passed on. Often this is done to trick the receiving system into returning different data than was originally requested; it can also be done simply to disrupt your business by corrupting databases.
- Your data is often sharing the same network infrastructure with other customers of your CSP. Usually the CSP makes a reasonable attempt to prevent that commingling, but in the world of networks it is very easy to misconfigure a network component and expose data where it should not be visible, and very hard to detect that it happened.
- You don’t know who is managing these networks and therefore has the ability to scan all messages. It may be your CSP, or they may have outsourced network operations to a third party located somewhere else.
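The standard answer to these risks is to encrypt data in motion with TLS before it ever touches a network you don't control. As a minimal sketch using Python's standard library (the host name is a placeholder, not a real CSP endpoint):

```python
import ssl

# Hypothetical endpoint for illustration; substitute your CSP's host name.
HOST = "example.com"

# A default TLS context both encrypts the traffic and authenticates the
# far end, so an intercepted or redirected connection fails loudly.
context = ssl.create_default_context()
print(context.verify_mode)     # CERT_REQUIRED: reject unverified servers
print(context.check_hostname)  # True: the certificate must match HOST

# Wrapping a TCP socket then encrypts everything written to it, e.g.:
#
# import socket
# with socket.create_connection((HOST, 443)) as raw:
#     with context.wrap_socket(raw, server_hostname=HOST) as tls:
#         tls.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
```

The connection itself is commented out so the sketch runs without network access; the point is that verification and encryption come from the context, not from anything the application has to build itself.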
Data-at-Rest is your data as it is stored. It can be stored in a controlled environment like a SAN (Storage Area Network) or NAS (Network Attached Storage) or the hard disk drives in servers. It can also be stored in less controlled places like the disk drives in workstations, laptops, tablets, PDAs, and smart phones. And there are the totally unmanageable places like CDs, DVDs, thumb drives and smart cards. As with Data-in-Motion, the Cloud extends the places your data is “resting” and who has access to it, without your knowledge.
- Your CSP has your data in log files, audit files, dump files, archives and backups.
- Your CSP has your data in its disaster recovery site(s).
- You are very likely sharing the same storage infrastructure with other customers of your CSP, perhaps even the same physical disk drive.
- Your data and other customers’ data is likely in the same security and network log files, and any dump files taken to investigate a problem.
- Again, you don’t know who is managing these storage systems and therefore has the ability to access your data. It may be your CSP, or they may have outsourced storage operations to a third party located somewhere else, likely a different company from the one managing the network.
Data-in-process is your data while it is actually being processed inside a server or workstation. The data could be in memory, in cache, or in registers inside the CPU. Normally we don’t worry very much about that. This is data that is changing quickly, usually coming and going at microsecond time scales, and data that disappears when power goes away. Certainly inside your own data center, this state of your data is not a worry. Inside the Cloud, it does become more complex. Through virtualization, multiple customers are likely using the same physical memory, cache and CPU over the same short time interval. Bugs in the virtualization layer or the operating system can allow that information to be inappropriately available to other processes running in the same server. Memory dumps and other diagnostic tools may collect data from multiple customers, which is then seen by those working on fixing the problem.
We distinguish among these three data states for two reasons:
- These states have different characteristics that affect the requirements on encryption.
- They require different solutions, as there is no product that handles all three states.
In fact there is currently no way, nor likely in the near future, to deal with Data-in-Process. Any encryption scheme would substantially increase processing time, and wouldn’t solve the problem anyway: when you want to add 1 and 4, the CPU has to see the numbers as their actual numeric values, and you can’t do arithmetic on encrypted data. Have you ever tried to add using Roman numerals? That is only a very simplistic form of encoding, but it makes the problem much more complicated. The only way to deal with Data-in-Process is through your selection of a CSP. Make sure they have the processes and procedures to treat any such data, such as dumps, with appropriate care. The CSP should let you read their process documents, and should share with you at least summary results from any periodic audit.
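The "you can't add encrypted numbers" point can be made concrete with a deliberately toy cipher. This is a single-byte XOR scramble, not real encryption; it exists only to show that arithmetic on ciphertext produces garbage:

```python
# Toy illustration only: XOR with a fixed byte is NOT real encryption,
# but it shows why a CPU cannot add numbers while they are encrypted.
KEY = 0x5A

def encrypt(n: int) -> int:
    return n ^ KEY

def decrypt(n: int) -> int:
    return n ^ KEY  # XOR is its own inverse

a, b = encrypt(1), encrypt(4)
garbage = a + b                  # arithmetic performed on ciphertext...
print(decrypt(garbage))          # ...does not decrypt to 5
print(decrypt(encrypt(1 + 4)))   # the CPU must see the real values: prints 5
```

Real ciphers make this even worse, since ciphertext lengths and block boundaries don't line up with numeric values at all; the CPU simply has to work on plaintext.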
The most obvious attribute that distinguishes between Data-in-Motion and Data-at-Rest is lifetime, sometimes referred to as “retention.” This is the length of time between when the data is first encrypted and when it is last decrypted.
Data-in-Motion has a lifetime measured in milliseconds. Even getting data from halfway around the world and back to you takes no more than half a second. Once the data gets to the other end, you never need that encrypted message again. If it is later sent somewhere else, it just gets encrypted again. If one of the endpoints loses the encryption key, basic network protocols will automatically re-initialize the connection, probably with a different encryption key, and resend the message. Nothing is lost. Nothing is vulnerable, since what is left on the network is encrypted. Typically, Data-in-Motion encryption keys are maintained only for the duration of a single session (e.g., the length of time you are doing online banking). This time is usually measured in minutes, and some encryption products will automatically create a different key every so many minutes. These keys are usually stored only in the server or workstation memory at the end points. When someone signs off or turns off the workstation, the keys disappear.
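That key lifecycle can be sketched in a few lines. This is an illustration of the idea, not any particular product's implementation; the 300-second rotation interval is an assumed value:

```python
import secrets
import time

class SessionKeys:
    """Sketch of per-session key handling: a fresh random key at session
    start, rotated after a fixed interval, held only in memory."""

    ROTATE_AFTER = 300.0  # seconds between re-keys; assumed interval

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)   # 256-bit session key
        self._born = time.monotonic()

    def current(self) -> bytes:
        # Rotate transparently once the key has been in use too long.
        if time.monotonic() - self._born > self.ROTATE_AFTER:
            self._key = secrets.token_bytes(32)
            self._born = time.monotonic()
        return self._key

session = SessionKeys()
print(len(session.current()))   # 32 bytes, never written to disk
```

When the process exits, nothing persists; that is precisely why losing a Data-in-Motion key costs you nothing but a retransmission.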
Data-at-Rest may have a lifetime measured in decades. This means that you have to keep the keys available, but secure, for long periods of time. The bad news is that if you lose access to the keys, you lose the data. The good news is that you can effectively destroy the data by simply destroying the key. In the early days of World War II, as Winston Churchill was approaching Paris on his last visit just before the Germans took the city, he remarked that it was sad to see the center of Paris burning. What he actually saw was the many plumes of black smoke from all the embassies and French government offices burning their papers. If the data had been stored electronically and encrypted, it would have taken only a few seconds to destroy the keys and the data would have been rendered useless, with no environmental impact.
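The destroy-the-key-destroy-the-data idea ("crypto-shredding") can be sketched with a toy stream cipher built from a hash function. This is for illustration only; a real deployment would use a vetted cipher such as AES, and the "patient record" below is a made-up placeholder:

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.
    Illustration only; use a vetted library cipher for real workloads."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

key = secrets.token_bytes(32)
record = b"hypothetical patient record, not real data"
stored = keystream_xor(key, record)          # what actually sits on disk

# While the key exists, the data is recoverable:
assert keystream_xor(key, stored) == record

# Destroy the key and the stored bytes become irrecoverable ciphertext,
# the electronic equivalent of those burning embassy papers.
key = None
```

The decades-long lifetime of Data-at-Rest cuts both ways: you must guard the key that long, but retiring the data takes only the seconds needed to wipe it.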
Another attribute of encryption is its strength. You will often see strength expressed as the length of its key. For example, standard WiFi security originally used a 64-bit WEP (Wired Equivalent Privacy) key, of which you provided 40 bits in order to sign on to the wireless router in your home or office. 40 bits is five 8-bit characters, so your key could be something like “1a5GX.” Most manufacturers now support a 128-bit WEP key, of which you must provide 104 bits (the equivalent of 13 characters). These keys are usually specified in hexadecimal characters, the “numbers” 0123456789ABCDEF representing the values 0 through 15, so each “letter” takes two hexadecimal characters. The important thing is that a 128-bit key is a lot stronger than a 64-bit key: not just twice as strong, but 2^64, or more than 18,000,000,000,000,000,000 (about 2×10^19), times stronger. Many Data-at-Rest and some Data-in-Motion encryption products have 256-bit or longer keys (which are about 3×10^38 times stronger than a 128-bit key). Another way to measure strength is how long it would take to break the encryption without having the keys. For something weak like WEP, the time is measured in minutes or hours. For encryption algorithms like AES-256 (Advanced Encryption Standard), an encryption standard adopted by the US and other governments, the time to break it is measured in thousands or millions of years.
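The arithmetic behind those "times stronger" figures is simple: each extra key bit doubles the number of keys a brute-force attack must try, so the ratio between two key lengths is two raised to the difference in bits:

```python
# Strength ratio between key lengths: 2 ** (bits_long - bits_short).
ratio_64_vs_128 = 2 ** (128 - 64)    # also the 40-bit vs 104-bit WEP gap
ratio_128_vs_256 = 2 ** (256 - 128)

print(f"{ratio_64_vs_128:.1e}")      # on the order of 10**19
print(f"{ratio_128_vs_256:.1e}")     # on the order of 10**38
```

The exponential growth is the whole story: doubling the key length does not double the work for an attacker, it squares the size of the key space.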
Clearly you want the strongest encryption you can find, right? Maybe, but probably not. The stronger the encryption, the longer it takes to encrypt and decrypt the data. Unless you have hardware assistance, encryption can have a significant impact on the time it takes data to move through a network or to be retrieved from storage. The impact is usually greater for Data-at-Rest, where you may be accessing thousands or millions of bytes at a time; by the very nature of networks, large blocks of data are broken into relatively small pieces and transmitted separately.
Encryption techniques also help ensure the integrity of the data: making sure that the record received at the other end is exactly what you sent, or that the data you retrieved from the database has not been inappropriately changed since it was written ten years ago. Many encryption products add additional information to the encrypted data that is verified upon decryption to ensure that the data was not changed. These hashing algorithms also have keys, and again the strength of the hash depends on the length of the key, as does the time it takes to hash the data originally and then verify it during decryption. The hashing information has to be stored or transmitted with the encrypted record, and that extra data is approximately the size of the hashing key. For Data-in-Motion, this can sometimes almost double the size of a short message. In most cases there is enough available bandwidth that it has no real effect on the network, but it is something that must be considered. For Data-at-Rest, because the data is stored and read in very large blocks, the overhead is usually only a few percent of the total size.
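One standard form of this keyed integrity check is an HMAC, available in Python's standard library. A short sketch (the message and key here are invented placeholders):

```python
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)                 # shared integrity key, assumed pre-agreed
message = b"hypothetical record: 2024-01-01 balance update"

# Sender computes a keyed hash and sends/stores it alongside the data.
tag = hmac.new(key, message, hashlib.sha256).digest()
print(len(tag))   # 32 bytes of overhead, regardless of message size

# Receiver recomputes and compares in constant time.
ok = hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).digest())
print(ok)         # True: untampered

# A single changed byte in transit makes verification fail.
tampered = b"hypothetical record: 2024-01-01 balance upgrade"
print(hmac.compare_digest(tag, hmac.new(key, tampered, hashlib.sha256).digest()))
```

Note how the fixed 32-byte tag illustrates the overhead argument above: it can rival the size of a tiny network message but is negligible against a large stored block.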
The bottom line is that there are existing solutions to the encryption issue for both Data-in-Motion and Data-at-Rest, but few of them are actually designed to work in the Cloud, and some may actually interfere with the CSP’s operation. As one example, many CSPs promise to reduce the size of your storage (and thus its cost) by de-duplication: looking for the same data in multiple files and only storing it once. Think of an email that has been passed back and forth a dozen times with each person adding a paragraph or two. You end up with most of that resulting email stored a dozen times in each person’s email files. By de-duplication you might reduce the total storage by an order of magnitude. However, de-duplication doesn’t work on encrypted data, because the same “phrase” will look different after being encrypted depending on exactly where it sits in the data. Compression algorithms also do not work well on encrypted data, and may actually increase its size.
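The de-duplication point can be demonstrated with a toy position-dependent cipher: the keystream depends on where a chunk sits in the file, the way initialization vectors and counters do in real cipher modes, so identical plaintext never yields identical ciphertext. This sketch only handles chunks up to 32 bytes and is for illustration, not real use:

```python
import hashlib
import secrets

def encrypt_chunk(key: bytes, offset: int, chunk: bytes) -> bytes:
    """Toy XOR cipher whose keystream depends on the chunk's file offset.
    Illustration only; handles chunks of at most 32 bytes."""
    stream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
    return bytes(b ^ k for b, k in zip(chunk, stream))

key = secrets.token_bytes(32)
paragraph = b"the same quoted paragraph"   # content repeated in two emails

c1 = encrypt_chunk(key, 0, paragraph)      # first copy, start of a file
c2 = encrypt_chunk(key, 4096, paragraph)   # second copy, later in a file

print(c1 == c2)   # False: identical plaintext, different ciphertext,
                  # so block-level de-duplication finds nothing to merge
```

The same randomness that makes encryption secure is exactly what removes the redundancy that de-duplication and compression depend on.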
The last word:
We’ll explore this encryption issue in the Cloud further in future postings, looking at both the issues and the opportunities of encrypting Data-in-Motion and Data-at-Rest in the Cloud.
Keep your sense of humor.