One of the real benefits of the Cloud is that you don’t have to decide where things happen. Your Cloud Service Provider (CSP) can place its data centers wherever it can get inexpensive land and power, and staff them with experts at, perhaps, lower pay rates than you can. By taking advantage of scale, your CSP can provide top-notch 24/7 service that matches or exceeds what you can provide in your own facility, probably give you some interesting and inexpensive disaster recovery options, and definitely give you much faster response to increased demand for resources, making your business more agile. Most likely, you get all of this for less than you are currently paying, often significantly less. It’s all good.
However, as you move an application or workload into the Cloud, oh where, oh where, has your data all gone? And why should you care?
If you have your own data center or a managed service provider, you probably know exactly where all your data is. There it is, inside a known data center, behind locked doors, with a reliable staff. You probably store your hourly or daily backups in the same place in case of a simple hardware or power failure, and keep some set of backup data off-site in case of a complete building failure. You may also have a remote disaster recovery facility or service. You probably use somebody like Iron Mountain to store archive data as required by law or corporate policy. The bottom line is you know where it all is, and you could actually go get all of it.
In general, this is not true in the Cloud. You may not even know which of your CSP’s data centers has your data. You may not know in what country, or even on what continent. It is quite possible that your disaster recovery site is one place, your backup copies in another, and your main processing somewhere else.
Your data transitions among three “places.”
- Data at Rest: stored data, whether for a short time or a long time. The obvious examples are disk storage containing your databases, other files like documents, and solutions like your email system. It also includes backup copies, archive data, and wandering storage units like thumb drives, portable disk drives (which now include cell phones), CDs and DVDs. Don’t forget all of your workstations and laptops, each with its own multi-gigabyte storage capability.
- Data in Motion: data moving through some network. This could be a small controlled network like that within your data center, or an uncontrolled network like the Internet.
- Data in Process: data being processed within some computer resource like a workstation or server.
In your own data center you control all of this. You can see the storage units and the servers, and have properly isolated the internal network for computer-to-computer and computer-to-storage communications. You have policies to cover who can do what with their laptops and all those wandering data storage capabilities.
In the Cloud, you have a lot less control. But why should you care? Depending on the data, maybe because of …
1. Compliance issues.
2. Laws and regulations.
3. Your own data life-cycle management policies.
4. Discovery orders.
5. Your desire to prevent the loss of confidential data and intellectual property.
If you deal with credit and debit cards, that part of your business is covered by PCI-DSS (Payment Card Industry Data Security Standard). Mess this up and you can lose your ability to accept or process payment cards. This is a very strict compliance standard, for good cause. Payment card data is a prime attack target for cyber-criminals, and they are very good at it. I strongly suggest that you don’t try to take this data or processing into the Cloud. About the best you can do today is a hosted private Cloud implementation, and the current success rate for even that is low. Probably in 2-3 years, you can look at it again.
If you deal with personal health information, that data and its processing are covered by HIPAA (Health Insurance Portability and Accountability Act). While not as complicated as PCI-DSS, HIPAA compliance is still difficult. You can implement a compliant environment with Infrastructure as a Service, but it requires the right controls from an experienced CSP. Ask them for some references from existing HIPAA-compliant customers. This is a case where I don’t think you want to be their first guinea pig.
If you have other compliance requirements, make sure you understand those requirements and get your CSP to agree to comply in your contract. In any case, you should get a qualified third party to do a compliance audit before you go live in the Cloud.
Laws come in two broad categories: location and access. Laws are different for different countries, and are changing.
In many countries, personal private data must stay within the country or region. The European Directive on the Protection of Personal Data covers the EU, and Canada’s PIPEDA (Personal Information Protection and Electronic Documents Act) brings Canada in line with the EU directive. Of course, the UK, US and Japan (just to list a few) have different regulations. The differences include what data is protected, life-cycle management requirements, access to and correction of an individual’s own data, and where it can be stored or transmitted.
Some CSPs will not even tell you what country your data is in. Some will tell you at a high-level (e.g., “in Europe”). Some will actually tell you where your data is, if you are a Private Cloud customer. Only a few will reveal location data to any customer.
Some countries, including the US and China, have laws that allow government surveillance of all data. It is probably unwise to store data that is covered by the EU Protection of Personal Data in the US and China, and in general unwise to store any data that you must or would like to keep private in China, even (or especially) if encrypted. To complicate it further, the US Patriot Act may cover foreign operations of a US-based company, making it virtually impossible to comply with contradictory regulations.
Data Life-Cycle Management
You have (or should have) strict policies on the retention of data. The policies are driven by laws, especially tax laws, compliance requirements, partner and customer contracts and requirements, and the general needs of your business. This is a very complex problem. Most organizations err by keeping data too long, which means they spend more than they need to on storing it and increase the cost of complying with court-ordered discovery. The Cloud adds the complication that you don’t even know what backup data your CSP may be keeping.
Data Life-Cycle Management is such a complex issue that the Cloud rarely makes it significantly harder. If you have a good life-cycle policy and are following it with appropriate systems and procedures, then it is important that your CSP can ensure that your data in the Cloud complies with those systems and procedures. If you don’t, then I wouldn’t worry about the impact of moving data to the Cloud.
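To make the retention idea concrete, here is a minimal sketch of a retention check that treats CSP-held backups like any other copy. The record types, retention periods, and copy locations are hypothetical values chosen for illustration; your real periods come from your own legal and compliance requirements.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, per record type.
RETENTION_DAYS = {
    "tax_record": 7 * 365,
    "web_log": 90,
}

def is_expired(record_type, created, today):
    """True when a copy is older than the retention period for its type."""
    return today - created > timedelta(days=RETENTION_DAYS[record_type])

# Every copy counts, including backups held by the CSP (illustrative data).
copies = [
    ("tax_record", "primary", date(2008, 1, 1)),
    ("web_log", "primary", date(2011, 5, 1)),
    ("web_log", "csp_backup", date(2011, 1, 1)),
]

today = date(2011, 6, 1)
expired = [(kind, where) for kind, where, created in copies
           if is_expired(kind, created, today)]
print(expired)
```

The point of the sketch is the inventory, not the arithmetic: the check only works if your CSP can tell you which copies exist and when they were created.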
When you receive a court order asking for all of the documentation on a particular subject, can you deliver all and only the appropriate documents? Most organizations don’t do a good job of it. The possibility, or in some industries, the high probability of receipt of a discovery order is one of the drivers for implementing a data life-cycle management system. Most organizations give far more than they should, and fail to give everything they must because they don’t know where all of the data is. As with data life-cycle management, if you have existing policies, systems and procedures in place, it is well worth the effort to make sure that your CSP can interface with them. If you don’t, then moving to the Cloud won’t make it much worse than it already is.
The same issues that surround compliance and life-cycle management apply to your own confidential information (e.g., customer and partner contracts, roadmap and strategy documents, marketing plans, merger and acquisition contemplations, financial results, and your intellectual property). In addition, you probably have non-disclosure agreements with partners and customers that require that you also protect their confidential information. Before you move a workload into the Cloud, make sure you understand what really is in the data or can be deduced from the data.
The last word:
The location issues will get better. Laws and regulations are changing, often because of the Cloud. The Cloud is also evolving, and I expect to see the existence of compliance-specific Cloud solutions from some of the more enterprise-class CSPs in 2011 and 2012.
In the meantime, it is most important that you know whether you care where your data is. In some cases you don’t. In other cases, you really do have to be able to point to your data and say “there it is.” Usually, your requirements are somewhere between those extremes.
The first step is to make sure you understand your location requirements, then determine that your CSP can meet them. Write down those requirements and get them in your contract.
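One way to write those requirements down is as a small table that maps each category of data to the regions where it may live, and then to check what your CSP reports against it. The sketch below assumes hypothetical category names, region codes, and a `Placement` record; none of these come from a real CSP API, and the table is an illustration, not legal advice.

```python
from dataclasses import dataclass

# Hypothetical requirement table: data category -> regions where it may reside.
ALLOWED_REGIONS = {
    "eu_personal_data": {"EU"},
    "payment_card": {"ON_PREMISES"},
    "public_marketing": {"EU", "US", "APAC"},
}

@dataclass(frozen=True)
class Placement:
    dataset: str
    category: str
    region: str  # where the CSP says this copy lives (primary, backup, or DR)

def find_violations(placements):
    """Return placements whose region is not allowed for their category."""
    violations = []
    for p in placements:
        allowed = ALLOWED_REGIONS.get(p.category)
        # A category with no written requirement is flagged too.
        if allowed is None or p.region not in allowed:
            violations.append(p)
    return violations

# Illustrative placements, as a CSP might report them.
placements = [
    Placement("customers-primary", "eu_personal_data", "EU"),
    Placement("customers-backup", "eu_personal_data", "US"),
    Placement("brochures", "public_marketing", "APAC"),
]

violations = find_violations(placements)
for v in violations:
    print(f"{v.dataset}: {v.category} may not be stored in {v.region}")
```

Note that the backup copy is the one that fails here. Primary, backup, and disaster recovery copies each need their own entry, because they may well be in different places.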
In coming blogs we’ll talk about other concerns with security, and how to mitigate them.
Keep your sense of humor.