I hope you all read Wade Bower’s comment on my previous blog. One of his points is the title of this blog: just because you have a backup process in place does not mean you have a disaster recovery plan. While backup is a key and necessary component of a recovery plan, it is far from sufficient for almost every business.
How do you figure this out for your business? Start by determining which applications are critical to running your business. These are often the applications that you need to make sales or make product. Then determine the business needs for those critical applications: your RTO and your RPO.
Quick Review: Your recovery time objective (RTO) is the time period after a disaster in which business functions need to be restored. This is the length of time after a failure occurs that you need to be back in operation.
Your recovery point objective (RPO) is the age of files that must be recovered from a backup or other mechanism. The RPO is expressed backward in time (that is, into the past) from the instant at which the failure occurs. This is a measure of how old the data can be when you come back up.
The actual recovery time has several components depending on the exact reason you need to recover. It may include the time to get back to a working infrastructure. After that it depends on the time needed to actually recover your data.
For a backup-only recovery plan, the data portion of the recovery is determined by how fast you can reload your data. Even if the data is local, implying zero time to locate and connect to it, you are looking at ten seconds to reload 1 GB (gigabyte), and therefore between 2.5 and 3 hours to reload 1 TB (terabyte = 1,000 GB).
McKinsey & Co. reports that nearly all sectors in the US economy have an average of at least 200 TB of stored data per company with more than 1,000 employees. At ten seconds per GB, it will take more than three weeks to reload that much data through a single recovery channel. That time can be significantly reduced if your data and servers can support multiple parallel channels.
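To make the arithmetic concrete, here is a minimal sketch that reproduces the reload estimates above. The rate of ten seconds per GB comes from the text; the function name and the assumption that parallel channels scale perfectly are mine, for illustration only. Real systems rarely scale that cleanly.

```python
SECONDS_PER_GB = 10   # local reload rate quoted in the text
GB_PER_TB = 1_000     # decimal terabyte, as in the text


def reload_hours(data_tb, channels=1):
    """Estimate hours to reload `data_tb` terabytes from a local backup.

    Assumes every channel runs at the full rate (ideal scaling),
    which is an optimistic simplification.
    """
    if data_tb < 0 or channels < 1:
        raise ValueError("data_tb must be >= 0 and channels must be >= 1")
    seconds = data_tb * GB_PER_TB * SECONDS_PER_GB / channels
    return seconds / 3600


if __name__ == "__main__":
    print(f"1 TB, one channel:     {reload_hours(1):.2f} hours")
    print(f"200 TB, one channel:   {reload_hours(200) / 24:.1f} days")
    print(f"200 TB, four channels: {reload_hours(200, channels=4) / 24:.1f} days")
```

With these numbers, 1 TB works out to roughly 2.8 hours and 200 TB to roughly 23 days on a single channel, which matches the figures above.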
Of course, if the data is off-site, the time can increase significantly. As Mr. Bower pointed out, even over a fast network you are looking at a minimum of 10 days per TB. Multiple simultaneous paths usually don’t help much: somewhere there is probably a network bottleneck that will limit the gains you can get.
As a good friend once said, “Never underestimate the band pass of a 747.” If your data is remote, consider telling your backup service to FedEx you the data overnight on hard disk drives. You may have to pay the local load time twice (once at the backup service side and once at your side), but in the long run it will be faster even for a single TB.
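A rough comparison of the two options for off-site data can be sketched as follows. The 10 days per TB and the ten seconds per GB come from the text; the 24-hour door-to-door shipping time is my assumption, as are the function names. Plug in your own numbers.

```python
NETWORK_DAYS_PER_TB = 10             # off-site network figure quoted in the text
LOCAL_HOURS_PER_TB = 10_000 / 3600   # 10 s/GB * 1,000 GB, converted to hours
SHIPPING_HOURS = 24                  # assumption: overnight courier, door to door


def network_restore_hours(data_tb):
    """Hours to pull `data_tb` terabytes back over the network."""
    return data_tb * NETWORK_DAYS_PER_TB * 24


def shipped_restore_hours(data_tb):
    """Hours to restore `data_tb` terabytes shipped on hard disk drives.

    The local load time is paid twice: once when the backup service
    writes the drives, and once when you reload them on your side.
    """
    return 2 * data_tb * LOCAL_HOURS_PER_TB + SHIPPING_HOURS


if __name__ == "__main__":
    for tb in (1, 5, 20):
        print(f"{tb:>3} TB: network {network_restore_hours(tb):7.1f} h, "
              f"shipped {shipped_restore_hours(tb):6.1f} h")
```

Under these assumptions a single TB takes 240 hours over the network versus about 30 hours by courier.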
This reload time is your minimum recovery time. How does that compare with your business-driven RTO?
The other aspect of recovery is where in time you are after you reload the data from the backup. You have lost everything you did since that backup. If you have daily backups at the end of the day, then everything you did the day of the event that forced the recovery must be redone. Somehow. The interval between your backups is the best RPO you can meet.
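One way to put the two comparisons side by side is a small check like the one below. It is only a sketch: the class and function names are hypothetical, and it treats the reload time as the whole recovery time and the backup interval as the worst-case data loss, as described above.

```python
from dataclasses import dataclass


@dataclass
class BackupOnlyPlan:
    reload_hours: float           # minimum time to reload the data
    backup_interval_hours: float  # time between successive backups


def objective_gaps(plan, rto_hours, rpo_hours):
    """Return by how many hours the plan misses each objective (0.0 if met)."""
    # Worst case: the failure hits just before the next backup would have run.
    worst_case_loss_hours = plan.backup_interval_hours
    return {
        "rto_gap_hours": max(0.0, plan.reload_hours - rto_hours),
        "rpo_gap_hours": max(0.0, worst_case_loss_hours - rpo_hours),
    }


if __name__ == "__main__":
    nightly = BackupOnlyPlan(reload_hours=2.78, backup_interval_hours=24)
    print(objective_gaps(nightly, rto_hours=4, rpo_hours=1))
```

In this example a nightly backup of 1 TB fits inside a four-hour RTO, but misses a one-hour RPO by 23 hours.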
How long will it take to recover the transactions that occurred since the last backup? Do you have a documented procedure to accomplish that? Are you able to recover those transactions? Where is the data stored that enables you to recover those transactions? Can that data be lost or damaged?
The last word:
I got a phone call from a customer at 7AM one morning. They had had a fire overnight that completely destroyed their building. A couple of concrete walls and a pile of smoldering rubble less than 2 feet tall were all that remained. It was a relatively small organization: about a dozen people on networked computers, with 99% of their customer contact over the phone and one critical database that controlled everything they did. I called a local hotel and got them a couple of adjacent conference rooms, called the computer vendor and arranged for replacement computers and network gear to be delivered to the hotel that morning, and called the phone company to relocate their lines to the hotel. By early afternoon we had everything ready to go, and I asked them for their last backup. I had documented a procedure where they made a backup every night as they closed up for the day, and one clerk was supposed to take the backup home. They had separate media for each day of the week, all of which they had left sitting on top of the server.
Fortunately for them, I had created a backup when I went in one weekend a couple of months earlier to make some significant system updates, and I still had it. By 3PM they were up and operational but, unfortunately, with a two-month-old database.
For a very serious disaster, their actual recovery time was eight hours, but their actual recovery point was two months. They spent the next three months trying to recover all of the data they lost over those two months.
Don’t let this be your story.
Keep your sense of humor.