One of the most astounding IT stories of the week comes courtesy of a computer technician at the Alaska Department of Revenue who wiped out a disk drive holding data for an account worth US$38 billion, the Associated Press reported.
The unnamed technician, reportedly doing routine maintenance work, accidentally deleted applicant information for Alaska's Permanent Fund Dividend, which pays Alaska residents a yearly dividend from the state's oil-related profits.
To make matters worse, the technician also reformatted the organization’s backup drive, completely erasing the backup data, which consisted of 800,000 electronic images of fund applications and supporting documentation.
The Alaska Permanent Fund Dividend Division also had tape backups, but the tapes turned out to be unreadable. The department turned to its last option: rescanning and re-entering the paper records stored in 300 cardboard boxes. After lots of overtime, extra help and weekend work, the department had spent $200,000, but it recreated the data in time to send out last year's dividend payments of $1,106.96 to around 600,000 qualifying Alaska residents.
In this case, a single operator nearly wiped out months of work and effort on behalf of thousands of Alaska residents. Luckily, the Dividend Division had held onto the paper documents, and that saved the day. The incident actually happened last July; the story seems to have come to light only because the department requested a budget boost to cover the extra data recovery costs.
How often does this kind of problem crop up, anyway? After all, it's not the sort of thing companies or government agencies are eager to announce to the world.
“Accidental deletion is pretty common. If you look at it overall, the most common causes of unplanned downtime tend to be human error,” Stephanie Balaouras, a senior analyst for Forrester, told TechNewsWorld. “It’s rarely massive hardware or software failures or the big disaster events we plan for; it’s usually something more mundane.”
Backup and recovery is a complex ecosystem, Balaouras added. Besides the backup software itself, there are the tape library, the drives and the network, so it can be hard to pinpoint where a problem in the backup and recovery process lies. Still, there are some common points of failure.
Fairly Common Failures
“Data failures related to hard drives are fairly common,” Charles King, principal analyst for Pund-IT, told TechNewsWorld. “The bigger a company’s storage environment is, statistically, you’re going to lose a number of drives a year just out of wear and tear. But most businesses will have backup drives available to cover these failures.”
In the Alaska case, King noted, human error combined with a mechanical failure: the unreadable tapes. Unreadable tapes, according to Balaouras, are more common than most people think.
“Tapes are a magnetic medium, and they are very susceptible to environmental conditions, like changes in temperature or humidity,” she explained. “If you drop them or handle them improperly in any way, you can drastically affect your ability to actually read data from them. They are fairly fragile, and there’s a difference in quality as well, depending on who you buy your tapes from.”
The No. 1 thing organizations can do to prevent backup and recovery problems is to test their backup and recovery procedures. Yet that is also the area many organizations are most loath to delve into, because it requires juggling production data and backup data, testing the process, and keeping work going without interrupting day-to-day operations.
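To make "testing the procedure" concrete, here is a minimal sketch of one piece of such a test. It assumes file-level backups and a hypothetical setup: restore the backup into a scratch directory, so production data is never overwritten, then compare SHA-256 checksums against the source. The function names `sha256_manifest` and `verify_restore` and the sample file names are illustrative only. They do not describe the Alaska department's tools or any particular backup product, and a check like this would not by itself catch every problem, such as tapes that fail only under certain drives.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest


def verify_restore(source: Path, restored: Path) -> list[str]:
    """Compare a test restore against the source; an empty list means they match."""
    expected = sha256_manifest(source)
    actual = sha256_manifest(restored)
    problems = []
    for name in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing: {name}")       # in source, absent from restore
    for name in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected: {name}")    # in restore, absent from source
    for name in sorted(expected.keys() & actual.keys()):
        if expected[name] != actual[name]:
            problems.append(f"corrupted: {name}") # present in both, contents differ
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        source = tmp / "production"
        source.mkdir()
        (source / "application_0001.tif").write_bytes(b"scanned image bytes")
        (source / "application_0002.tif").write_bytes(b"more image bytes")

        # Stand-in for a real backup/restore cycle: copy to a "backup",
        # then restore from it into a scratch directory, never over production.
        backup = tmp / "backup"
        shutil.copytree(source, backup)
        restored = tmp / "restore_test"
        shutil.copytree(backup, restored)
        print(verify_restore(source, restored))  # []

        # Simulate a damaged backup copy to show what the check reports.
        (restored / "application_0002.tif").write_bytes(b"garbage")
        print(verify_restore(source, restored))  # ['corrupted: application_0002.tif']
```

Restoring to a separate location is the design choice that lets a test like this run alongside day-to-day operations. Running it on a schedule would surface an unreadable or incomplete backup before it is actually needed, rather than after.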
The second most important prevention method is to have a good change management system, Balaouras said, "because your environment is constantly changing." What appear to be simple application upgrades, for example, can have ripple effects that are difficult to trace in a backup and recovery plan.
There are many backup and recovery applications and solutions that customers can buy, of course, but in the end, it all comes down to factors that are harder to control, such as human behavior.
“Typically, in backup and recovery software, there are safeguards built into the procedures to prevent the sort of thing that happened in Alaska,” King explained. “But when the command comes up, ‘Are you sure you want to delete that file?’ … and the person just blows through it — it’s still pretty easy to destroy all that data.”
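A yes/no prompt is easy to blow through out of habit. Some tools harden destructive operations by requiring the operator to retype the target's name instead. The following minimal sketch illustrates that pattern. It is an assumption for illustration, not a description of the software used in Alaska, and `confirm_destructive`, `reformat_drive` and the volume name are hypothetical. This pattern reduces reflexive mistakes but cannot stop an operator who deliberately types the wrong target.

```python
from typing import Callable


def confirm_destructive(action: str, target: str,
                        read_input: Callable[[str], str] = input) -> bool:
    """Proceed only if the operator retypes the target name exactly.

    read_input is injectable so the check can be exercised without a terminal.
    """
    prompt = (f"You are about to {action} '{target}'. This cannot be undone.\n"
              f"Type the name of the target to confirm: ")
    return read_input(prompt).strip() == target


def reformat_drive(drive: str, read_input: Callable[[str], str] = input) -> str:
    # Hypothetical stand-in for a destructive operation; it only reports
    # what it would do and never touches a real disk.
    if not confirm_destructive("reformat", drive, read_input):
        return f"aborted: {drive} left untouched"
    return f"reformatting {drive}"


if __name__ == "__main__":
    # A reflexive "y" no longer gets through.
    print(reformat_drive("backup-volume-02", read_input=lambda _: "y"))
    # Only the exact target name proceeds.
    print(reformat_drive("backup-volume-02", read_input=lambda _: "backup-volume-02"))
```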