
Happy World Backup Day

An anonymous reader writes "Easter isn't the only thing some people are celebrating today. Today is also World Backup Day. What steps have you taken to be able to resurrect your data, instead of having it go to eternal oblivion?"
  • by vinn ( 4370 ) on Sunday March 31, 2013 @12:44PM (#43325049) Homepage Journal

    I managed to go 16 years in the IT world, first as a sys admin and now up through an awesome mid-level management position, without any serious data management scares. (And by 'awesome', I mean I work for demoralizing leadership and I've hit a glass ceiling that will force me to find another company to work for if I want any shot at career advancement.) I've always made sure there are many, many layers of redundancy and good processes in place.

    That was until three weeks ago.

    We use Microsoft DFS to sync data between two sites. Because of some other things going on, we had to turn DFS off for 3 weeks. We thought we had everyone transitioned to using the "master" file repository, the one that gets backed up every night, etc., etc. The day we turned DFS back on, all hell broke loose.

    Oh - and this is fairly important stuff: 10 years worth of CAD, design, and legal paperwork. It's a few terabytes worth. For our medium-size company, this is basically everything that we hold near and dear.

    The first thing that happened was that DFS completely puked and trashed BOTH filesystems. Fantastic, Microsoft - what a wonderful piece of shit DFS is. Fairly quickly we had to face some data integrity issues. First, we discovered that a fella at the remote site had apparently been using the copy of the files there. Great... through a fairly manual process we were able to retrieve most of his changes to the dataset. Next, we gave up fairly quickly on trying to fix DFS - on the advice of Microsoft, it seemed hopeless.

    This is where shit gets real.

    Our head sys admin had been troubleshooting an issue where a drive in a RAID'ed NAS backup device had failed. All the other backups had been shifted to other NAS devices, but that backup was so large that it apparently had just been failing. While looking into that, we also discovered that the quarterly backup from December had failed (that's the point where I wanted to put on my manager hat and go rip someone a new one, but decided that probably wouldn't be the most productive thing at the moment and that I could save that little teachable-moment asskicking until after we were out of the woods). Now, the sys admin hadn't been completely foolish: before turning DFS back on, he had run some full backups using a different NAS device.

    In a f*cking brilliant stroke of disastrous luck, when we went to perform the recovery we discovered that the RAID array on the backup NAS device also had corruption.

    Now, how bad the corruption was and what exactly it meant remained to be seen. The backups had completed without error; it was the NAS filesystem itself that was throwing the errors. The NAS was still running and our backup software seemed to recognize the backup catalogs on it. OK, so other than one potentially corrupt backup, it was looking like the next-best-case scenario was a quarterly backup from September, and I was also staring at a complete set of disks from 2010, dreading the thought of bringing them back online. Well, with nothing to do other than try a restore, we pressed the button.

    That's when I went home mid-morning, chain-smoked four cigarettes on my porch, and wondered what would happen if everything went south. In other words, I was contemplating my next job.

    Lo and behold, the restore worked. We had to merge all kinds of things back together to get a complete copy reassembled, then we still had to get DFS working again (which took four days of syncing over the WAN). When all was said and done, it looked like there were just two files from one set of changes that we couldn't recover.

    I think I'll go double check on the backup jobs now.

  • Re:Automated backup (Score:4, Interesting)

    by RabidReindeer ( 2625839 ) on Sunday March 31, 2013 @02:27PM (#43325723)

    Automated incremental backup of the headless servers at home, every two days (and I check the backup logs regularly). The backup disks are cycled every 4 weeks: the existing set goes to an insulated box in the garage (a separate heated building), while the previous disks come in and start with a full backup. Our 4 workstations at home all get backed up to local USB disks, but these are merely for convenience - important files are always kept on the servers, where they belong.

    You don't belong on this planet.

    Seriously, I run RAID and cross-machine mirroring, then do daily backups, with the logs emailed to me each morning, plus periodic external media copies to DVD and USB devices. In my case, I have incentive, though. I used to work for a big-name backup software company and knew of design flaws that meant a certain percentage of backups would write out defective data. I got burned in later years when I was compelled to use the product at a later employer, because the RAID arrays would blow a disk the minute I left on vacation, then blow a second one before I got back to replace it, and the restore would fail.

    For a long time I used TAR scripts, because unlike the fancy expensive commercial products, I could always count on being able to use a tarball as long as the media itself was undamaged.
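    What that looks like in practice: below is a minimal sketch of a tar-style backup with a read-back check, written in Python so it is self-contained. The source directories, backup path, and function names are assumptions for illustration, not the poster's actual scripts.

```python
# Minimal sketch of a tar-style backup with a read-back verification pass.
# Paths and directory choices are assumptions for illustration.
import os
import tarfile
from datetime import datetime

SOURCE_DIRS = ["/home", "/etc"]     # assumed directories to protect
BACKUP_DIR = "/mnt/backup"          # assumed mount point of the backup media

def make_backup():
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive_path = os.path.join(BACKUP_DIR, f"backup-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        for src in SOURCE_DIRS:
            # tar preserves ownership and permission metadata, so the
            # archive stays restorable as long as the media is readable
            tar.add(src, arcname=src.lstrip("/"))
    return archive_path

def verify_backup(archive_path):
    # Cheap integrity check: read every regular file back out of the
    # archive and fail loudly if anything cannot be decompressed.
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                tar.extractfile(member).read()

if __name__ == "__main__":
    path = make_backup()
    verify_backup(path)
    print(f"backup written and verified: {path}")
```

    The appeal is the same one the poster describes: a plain archive format has almost no moving parts, so as long as the media is intact the data comes back.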

    Ironically, this is the weekend I started learning Bacula. Tar is reliable, but it doesn't manage media catalogs.
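    To make the catalog point concrete, here is a rough sketch of the kind of index a tool like Bacula maintains and plain tar does not: a record of which files landed in which archive on which disk set, so a restore can start from the index instead of reading every tarball. The catalog path, media labels, and helper names here are assumptions for illustration only.

```python
# Toy media catalog kept alongside tarballs: records which files are in
# which archive on which piece of media. Paths and labels are assumptions.
import json
import os
import tarfile

CATALOG = "/var/lib/homebackup/catalog.json"   # assumed catalog location

def record_archive(archive_path, media_label):
    # Index the archive's regular files under the media label it lives on.
    with tarfile.open(archive_path, "r:*") as tar:
        files = [m.name for m in tar if m.isfile()]
    catalog = []
    if os.path.exists(CATALOG):
        with open(CATALOG) as fh:
            catalog = json.load(fh)
    catalog.append({"archive": archive_path, "media": media_label, "files": files})
    with open(CATALOG, "w") as fh:
        json.dump(catalog, fh, indent=2)

def find_file(name):
    # The question a catalog exists to answer: which archive, on which
    # media, holds the file I need to restore?
    with open(CATALOG) as fh:
        for entry in json.load(fh):
            if name in entry["files"]:
                return entry["archive"], entry["media"]
    return None
```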
