So You've Lost a $38 Billion File
smooth wombat writes "Imagine you're reformatting a hard drive so you can do a clean install, but then realize that you have also reformatted the backup hard drive. No problem. You reach for your backup tapes, only to find out that the information on the tapes is unreadable. Now imagine the information that was lost was worth $38 billion. This scenario is apparently what happened in July to the Alaska Department of Revenue. From the article: 'Nine months' worth of information concerning the yearly payout from the Alaska Permanent Fund was gone: some 800,000 electronic images that had been painstakingly scanned into the system months earlier, the 2006 paper applications that people had either mailed in or filed over the counter, and supporting documentation such as birth certificates and proof of residence.' Using the 300 cardboard boxes containing all the information, staff worked overtime for several months to rescan everything at an additional cost of $200,000."
And this is why... (Score:4, Interesting)
The whole thing is a joke... (Score:2, Interesting)
What, another hyperbole-filled, wildly inaccurate Slashdot post? Inconceivable.
Data recovery? (Score:5, Interesting)
The same goes for tapes. There is no mention in the article of why they were "unreadable", what level of damage there was to the data, etc.
We all make mistakes, but three layers of backup storage all failing suggests a horrifically poor system in place. Not just "very bad"; it's hard to believe without some massive natural disaster causing it.
Re:Redo the work? (Score:2, Interesting)
Re:Tapes? (Score:1, Interesting)
P.S.: Iron Mountain are a waste of space. We found another, much smaller company to handle our tapes after Iron Mountain fucked up one too many times.
Re:Tapes? (Score:5, Interesting)
Re:Tapes? (Score:2, Interesting)
Get the person with the purse strings to go through the 'cost of downtime' calculation.
Lead them through it, pointing out all the lovely parts, like contractual obligations (engineering companies tend to need to keep designs for very long periods) or 'regulations' (Sarbanes-Oxley has a lot to answer for).
Add in the cost of X people not working for a week.
Include the 'well, can our business still function if we lose our customer database?' question.
And if that really doesn't work, then clearly your last resort is artificially induced panic: raise the possibility of 'something important' being gone and unrecoverable. Payroll records are a good example, as that's a personal terror as well as a 'problem for the company'.
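Even a crude back-of-the-envelope number helps in that conversation. A sketch of the idle-labor piece alone, with entirely hypothetical figures (headcount, loaded hourly cost, and a one-week outage are all made up):

```shell
#!/bin/sh
# Back-of-the-envelope downtime cost; every number here is hypothetical.
people=50                 # staff idled by the outage
loaded_hourly_cost=60     # salary + overhead per person-hour, in dollars
hours_down=40             # one working week

labor_cost=$(( people * loaded_hourly_cost * hours_down ))
echo "Idle-labor cost alone: \$${labor_cost}"
```

That figure is before lost revenue, contractual penalties, or regulatory exposure, which is usually where the real panic sets in.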
John Cleese: backuptrauma (Score:1, Interesting)
http://www.backuptrauma.com/ [backuptrauma.com]
Though, using rsync to back up to rotating partitions works as well.
On odd days, rsync to B1
On even days, rsync to B2
On odd weeks, rsync to B3
On even weeks, rsync to B4
On odd months, rsync to B5
On even months, rsync to B6
On every two odd months, rsync to B7
On every two even months, rsync to B8
On every odd year, rsync to B9
On every even year, rsync to B10
So with 10x the space, you can have easy instant access to:
a day or two ago.
a week or two ago.
a month or two ago.
4 or 8 months ago.
a year or two ago.
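The rotation above amounts to picking a target by whether the day/week/month/year number is odd or even. A minimal sketch (mount points under /backup and the source path /data are assumptions; the bi-monthly B7/B8 pair works the same way and is left out for brevity):

```shell
#!/bin/sh
# Sketch of the odd/even rotation: choose a backup target per time scale.
pick() {  # pick NUMBER ODD_TARGET EVEN_TARGET
  n=$(printf '%s' "$1" | sed 's/^0*//')    # strip leading zeros (avoid octal)
  if [ $(( ${n:-0} % 2 )) -eq 1 ]; then echo "$2"; else echo "$3"; fi
}

daily=$(pick "$(date +%j)" B1 B2)      # odd/even day of year
weekly=$(pick "$(date +%V)" B3 B4)     # odd/even ISO week
monthly=$(pick "$(date +%m)" B5 B6)    # odd/even month
yearly=$(pick "$(date +%Y)" B9 B10)    # odd/even year

for target in "$daily" "$weekly" "$monthly" "$yearly"; do
  echo rsync -a --delete /data/ "/backup/$target/"   # drop 'echo' to run for real
done
```

Run from cron once a day, each tier gets refreshed on its own schedule without any extra bookkeeping.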
Re:Redo the work? (Score:4, Interesting)
A law firm handed me a computer that wouldn't boot. On it were some pictures taken concerning a wrongful death case. It turned out the pictures were an afterthought, and in the middle of fixing it, the task turned from getting the computer to run to getting the pictures off the drive. The drive was failing, and it was larger than the 137 GB limit of 28-bit LBA addressing. But we didn't know this, because the machine had never booted, and XP pre-SP1 did not enable 48-bit addressing by default. Even after SP1, if you didn't update your ATAPI driver to x.1135 or later, it wouldn't be enabled by default even if the hardware had the ability. So connecting the drive to another computer made it worse. Eventually the fault in the drive, which was a crashed head, made it impossible for us to recover anything past the boot sector with traditional recovery software. The data recovery specialists were able to get around everything we had added to the problem, as well as the problem itself, and retrieved better than 98% of everything on the drive. I think one file was bad, but we weren't concerned with it at all.
Long story short, the pictures showed someone's negligence in a wrongful death case, and once they were presented or added to the evidence pile, the defendant's insurance company settled for $2 million. The lawsuit was for more than that, so you can probably guess what it could have been worth. The firm's cut was in the area of 40%, from what I understand. So it was worth 40% of $2 million to them; $2,500 for the recovery seems like a small amount in comparison.
Re:Time for... (Score:5, Interesting)
Re:Tapes? (Score:1, Interesting)
$38 Billion is a big incentive for fraud (Score:5, Interesting)
Re:Tapes? (Score:2, Interesting)
Continuity planning can be complicated, depending on the environment, and is quite often overlooked until the first time it is needed.
Few companies maintain sufficient hardware of their own for true disaster recovery. Those that do will typically have redundant data centers, probably used in a load-sharing model, with the hope that one data center can carry the full load of the critical activities if the other goes down. For these organizations, backup tapes are really intended for a complete disaster at all sites that would require acquiring new hardware.
Other organizations have agreements with firms like Sunguard and IBM for cold sites. These vendors guarantee a certain square footage in their own data centers, then work with the client to understand the exact hardware and software requirements that would be needed in case the cold site has to go hot. In these instances, the tapes are shipped to the Sunguard or IBM site and loaded on machines as quickly as possible. The contracts normally give the vendor a set amount of time to stand up the hardware and load the software and data, governed largely by how much data needs loading.
Just a note: if your company is deciding to go the hosted DR route, make sure beforehand that your software vendors agree your license allows you to load their software outside your organization. I worked at one company that didn't have that in their original software contracts and had to spend more money with the software vendors when they created a DR plan. Many software vendors won't mention this little detail.
Most often, I've seen backup tapes used when, for example, an important database table was dropped accidentally. The last good backup tape was loaded and the database completely restored to get back to production. This is what you'd think of as single-system disaster recovery.
Re:Tapes? (Score:3, Interesting)
Re:Data Recovery options? (Score:5, Interesting)
Google was my friend. Before long I had learned about backup superblocks: how to run "mkfs.ext3 -n" to do a dummy mkfs and find out where my backup superblocks are, and "fsck.ext3 -b nnn" to repair the filesystem using a backup superblock.
I was back running in less than an hour, including google time. Repairing an accidental mkswap on top of ext3 is actually one of the easier things to fix.
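For reference, the sequence looks roughly like this. This is a sketch run against a scratch image file rather than a real disk (the image path and size are arbitrary, and the 8193 fallback assumes 1 KiB blocks):

```shell
#!/bin/sh
img=/tmp/ext3-demo.img
dd if=/dev/zero of="$img" bs=1M count=64 status=none
mkfs.ext3 -F -q "$img"           # a healthy ext3 filesystem in a file
mkswap "$img" >/dev/null 2>&1    # the accident: clobbers the primary superblock

# Dry-run mkfs (-n writes nothing) to learn where the backup superblocks live:
backup=$(mkfs.ext3 -F -n "$img" 2>/dev/null \
           | grep -A1 'Superblock backups' | tail -n 1 \
           | tr -d ',' | awk '{print $1}')
[ -n "$backup" ] || backup=8193  # usual first backup for 1 KiB blocks

# Repair using the backup superblock; exit status 1 just means
# "errors were corrected", so don't treat it as a failure.
fsck.ext3 -y -b "$backup" "$img" >/dev/null || true
```

On a real disk you would point the same commands at the device node instead of an image file, and double-check which device you're aiming at first.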
On the other hand, having a system and procedures that made it possible to kill both the regular and the backup data that way, and storing unverified tapes, is clearly not a good idea. Whenever I burn a CD/DVD, I take the few extra minutes and verify it right away. If the backup tape was only a few months old, odds are it was improperly written rather than degraded. They should check their other backup tapes.
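The verify-right-away habit is cheap to script. A file-to-file sketch of the same idea (paths are made up; for a burned disc or tape you'd read back from the device instead of a file):

```shell
#!/bin/sh
# Write a copy, then immediately confirm it is byte-for-byte identical.
src=/tmp/verify-demo-src.bin
dst=/tmp/verify-demo-dst.bin

head -c 1048576 /dev/urandom > "$src"   # stand-in for the data being backed up
cp "$src" "$dst"                        # stand-in for the write to backup media

if cmp -s "$src" "$dst"; then
  echo "verify OK"
else
  echo "verify FAILED" >&2
fi
```

A verify pass costs minutes; discovering a bad tape months later costs $200,000 in overtime, apparently.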
Re:Backups are the devil (Score:3, Interesting)
The hardware is extremely expensive, and the software isn't cheap either (if you expect any degree of automation or features).
This extortion racket is precisely why most people don't do backups, and why, of the few who do, most never test them (even though they've already spent the money; I've never really understood that).
I have memories of ten years of sob stories; guys who were calling in to tech support because they were about to lose their jobs because they were poor stewards of their employers' data. Sometimes it was our fault (software bugs, poor documentation) - sometimes it was the hardware vendors' fault (bad firmware, defective lots of tape, etc.), sometimes it was the OS vendors' fault (interoperability standards between file systems, network protocols, etc.) - sometimes, it was just bad luck. But more often than not, it was ignorance and laziness, and above all - CHEAPNESS. Some MIS hack didn't want to spring for a quality backup drive, or didn't want to take the time to test-restore data, or didn't want to hire a college intern to inspect error logs regularly for backup problems.
It just KILLS me to see folks suffer because they weren't careful with their data.
But at home? Screw it. I don't backup. I'm cheap.
You've got to be able to separate valuable data from stuff you can re-install or re-download.
Re:$38 Billion is a big incentive for fraud (Score:4, Interesting)
Anyway, what DIDN'T shock me about this story is that after formatting the main disk, the tech immediately (and blissfully) formatted the backup as well. I've seen stuff like that happen at least ten times. ("Oh, well, after I replaced the drive, I figured I should replace the backup tapes too, so we could have a fresh start, so I threw them out." or "I figured I should make a backup right away, so I overwrote the good backup with the new, bad data.") I don't want to blame the victim, but sometimes it's like the data wants to be destroyed at that point. My favorite was when someone added a second drive to an important source control server to do nightly drive-to-drive backups. Then they stopped doing nightly tape backups and switched to weekly. Then they forgot they had disconnected a fan during the HDD installation (or it was accidentally disconnected -- it remains a debated point), then the server fried itself and the drives. Then everyone lost a day of work rebuilding the source archive from their local copies. Good times.
Re:Data recovery? (Score:0, Interesting)
Re:Test restores? You jest. (Score:3, Interesting)
As an IT auditor, I do ding IT shops when they don't do full system restores (which has the dual benefits of verifying that the techs are capable, and verifying that the media is readable). I'm going to be printing out this story and showing it to people who don't do full system restores... I get along fine with BOFHs, and I can sympathize with them about the burden of SOX, but while I'm doing the audit, I don't let them slide on this.