ISP Recovers in 72 Hours After Being Leveled by Tornado
aldheorte writes "Amazing story of how an ISP in Jackson, TN, whose main facility was completely leveled by a tornado, recovered in 72 hours. The story is a great recounting of how they executed their disaster recovery plan, what they found they had left out of that plan, how they recovered data from destroyed hard drives, and perhaps the best argument ever for offsite backups. (Not affiliated with the ISP in question)"
Users need their porn! (Score:2, Insightful)
Times of crisis and how one deals with them are the mark of successful businesses/employees/people. I don't think we could recover so quickly if a disaster of that size hit my workplace, but it'd be fun to try.
Nice work! (Score:5, Insightful)
Fire... (Score:5, Insightful)
Re:Nice work! (Score:3, Insightful)
Re:Fire... (Score:3, Insightful)
so... (Score:3, Insightful)
Cool, but could be better (Score:5, Insightful)
Then I've seen the other end of the spectrum: a $6 billion corporation's world-HQ IT center... wow. They have disaster recovery sessions and planning like I never would have imagined. Very cool facility, but it has to be like that; if they ever get burned, it's all over.
Truly stunning (Score:5, Insightful)
What amazes me isn't that these people were able to restore service to their customers in 72 hours. They used standard systems administration techniques. BGP was specifically mentioned.
No, what amazes me is that this is news. The IT industry is so full of idiots and morons and MCSEs that taking basic precautions earns you a six-figure salary and news coverage. These folks didn't even have off-site backups; it was luck that they were able to resume business operations (i.e., billing) so soon.
Moral of the story? When automobile manufacturers start getting press coverage for doing a great job because, unlike their competition, they install brakes in their vehicles, you know that the top-tier IT managers and executives have switched industries.
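For what it's worth, the BGP part is the routine piece of a recovery like this: you bring your address block up at a new site and announce it to an upstream. A minimal sketch, assuming a Quagga (or FRR) router driven through its vtysh shell; the ASNs, prefix, and peer address below are placeholders I've made up for illustration:

```python
import subprocess

# Hypothetical values for illustration only -- substitute your own
# registered ASN, allocated prefix, and upstream peer.
LOCAL_AS = "64512"          # private ASN used as a placeholder
PREFIX = "192.0.2.0/24"     # documentation prefix standing in for the ISP's block
UPSTREAM = "203.0.113.1"    # upstream BGP peer at the backup site
PEER_AS = "64513"

def announce_prefix():
    """Tell the local Quagga/FRR daemon to originate our prefix from here.

    Equivalent to typing the commands into vtysh by hand; once the
    upstream accepts the announcement, traffic for PREFIX converges
    on the backup site on its own.
    """
    cmds = [
        "configure terminal",
        f"router bgp {LOCAL_AS}",
        f"neighbor {UPSTREAM} remote-as {PEER_AS}",
        f"network {PREFIX}",
        "end",
        "write memory",
    ]
    args = ["vtysh"]
    for c in cmds:
        args += ["-c", c]
    subprocess.run(args, check=True)

if __name__ == "__main__":
    announce_prefix()
```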
72 hours, that's pretty bad (Score:2, Insightful)
Re:Fire... (Score:5, Insightful)
Er, for how much data? For your personal computer, maybe (but the tape drive will cost you considerably more than that $100); I don't think you're going to back up a few hundred gigs of business data on ~$100 of tapes. And I suspect you meant 100... although if the latter, then you're almost certainly correct!
It's not very hard to do (drive the tapes to the site), and it's not difficult to get the backups back if you need them (drive to the site and pick up the tapes).
If your offsite backup is within convenient driving distance, then odds are it's not far enough offsite. A flood, tornado, hurricane, earthquake, or other large-scale natural disaster could conceivably destroy both your onsite and offsite backups if they're within a few miles of each other. The flip side is that the greater the distance, the greater the ongoing inconvenience, and the more likely you are to stop doing backups.
There's far more to be considered here, but I'm not the DR expert (my wife is... seriously). It does make sense to have offsite backups, but you have to have some sense about those too.
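To make the routine concrete: the only hard part should be the drive itself. A minimal sketch of a weekly full-backup-to-tape job, assuming GNU tar and a SCSI tape drive; the data directories, device, and log path are stand-ins I've invented:

```python
import datetime
import subprocess

# Stand-in paths for illustration; adjust for your environment.
DATA_DIRS = ["/srv/billing", "/srv/mail", "/etc"]  # what the business can't lose
TAPE_DEVICE = "/dev/st0"                           # first SCSI tape drive
LOG = "/var/log/offsite-backup.log"

def run_backup():
    """Write a full backup to tape and record which tape set was used.

    A simple weekly rotation: tapes are labeled by ISO week number, so
    last week's set is always the one sitting at the offsite location.
    """
    week = datetime.date.today().isocalendar()[1]
    label = f"offsite-week-{week:02d}"
    subprocess.run(["tar", "-cf", TAPE_DEVICE] + DATA_DIRS, check=True)
    with open(LOG, "a") as log:
        log.write(f"{datetime.date.today()} wrote set {label}\n")
    print(f"Backup written. Label the tape '{label}' and drive it offsite.")

if __name__ == "__main__":
    run_backup()
```

The labeling is the part people skip; an offsite tape nobody can identify is not much better than no tape at all.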
Re:72 hours, that's pretty bad (Score:3, Insightful)
Re:Amazing is an inappropriate adjective (Score:2, Insightful)
Wow! This is exactly the reason that systems administrators generally dislike most members of their development group. Your attitude does not do very much to endear you to us 'cable monkeys' and 'PHBs'.
"IT people", who give a shit about logs and backups and think plugging a PC and monitor into a powerbar is "computer science"
If you think this is all that is involved in running a remotely large and reliable network, you are sadly mistaken, my friend. A lot of thought, planning, and testing goes into most corporate network infrastructures... kinda like software development.
"Computer Science" is a very broad term that encompasses much more than just 'programming'.
make sure off-site is far enough away (Score:4, Insightful)
Re:Compare and contrast... (Score:3, Insightful)
Re:Amazing is an inappropriate adjective (Score:3, Insightful)
What takes an hour is that the technician has to take care of the other 20 people who can't be bothered to plug a cable back into the wall on their own.
Oh, and, of course, the tech also has to take care of real work - like fixing the programmer's machine after he installs the latest Webshots and Gator software.
Me: "It took our technican an hour to get all of the malware off of Stratjakt's computer that he downloaded from the Internet."
CTO: "Didn't he read the email that I sent out every month for the last six months telling the employees not to install non-work-related software?"
Me: "Well, I asked him about that...he said that he was a programmer and just doesn't care."
CTO: "He's fired."
Oh, and, incidentally, when your self-administering software becomes proficient enough to keep your big foot from wrapping around the network cable and yanking it out of the wall, then I'd say you'd really have something worthwhile. At this point, though, I have my doubts.
Re: so... (Score:2, Insightful)
Re:Amazing is an inappropriate adjective (Score:3, Insightful)
Keep up the good work.
sloth jr
Re:Screw remote backup.... (Score:1, Insightful)
I'm in California, and as such, we design buildings to take a certain scale of earthquake or less; not because clients are cheap, but because above a certain point all bets are off, no matter what kind of building you've built! At some point the force of Nature you're dealing with is so staggering that no amount of preparation or work can give you a guaranteed resistance.
I doubt many buildings could take a direct hit from a tornado; and even if they could, that's no guarantee that everything that's not the building (i.e., all that fancy computer equipment and those nice people inside) wouldn't be sucked out and sent to Oz in a minute...
What about practicing your disaster recovery? (Score:3, Insightful)
In the article the writer was talking about how much work it was to migrate the T1 connections, and how they hadn't foreseen that. That is exactly the sort of thing that a practice disaster recovery uncovers (see the sketch after the list below).
If you want the model from the place I work it is simple enough:
1. Run the disaster recovery drill during a 24-hour period.
2. Pat yourself on the back for what worked.
3. Ignore what doesn't work.
4. Repeat next year.
Of course next year gets a new step:
3.5 Act surprised that stuff didn't work.
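If anyone actually wanted to fix step 3, the cheapest fix is bookkeeping: write down every finding and flag the ones that recur. A minimal sketch, with the file name and example finding invented for illustration (the finding echoes the article's unrehearsed T1 migration):

```python
import json
import os
from datetime import date

# Invented filename for illustration; any shared location works.
FINDINGS_FILE = "dr-drill-findings.json"

def load_findings():
    if os.path.exists(FINDINGS_FILE):
        with open(FINDINGS_FILE) as f:
            return json.load(f)
    return []

def record_drill(new_findings):
    """Append this year's unresolved findings and flag repeat offenders.

    Anything that shows up two drills in a row is exactly the stuff
    step 3 ('ignore what doesn't work') would have buried.
    """
    findings = load_findings()
    seen_before = {f["issue"] for f in findings}
    for issue in new_findings:
        if issue in seen_before:
            print(f"REPEAT FAILURE (acted surprised last year too): {issue}")
        findings.append({"year": date.today().year, "issue": issue})
    with open(FINDINGS_FILE, "w") as f:
        json.dump(findings, f, indent=2)

if __name__ == "__main__":
    # Example finding: the T1 circuit migration was never rehearsed.
    record_drill(["migrating T1 circuits took far longer than planned"])
```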
Re:Poor tech support (Score:4, Insightful)
It's amazing how accurate you are in regard to the customer's viewpoint on downtime.
After having done it myself, I actually have MUCH more respect for technical support engineers/supervisors, because within reason most "downtime" is fixed even before the customer knows about it (i.e., small blips in service).
And the majority of people who purchase an ISP's services have absolutely no idea what it takes to respond to an outage.
Not good enough (Score:3, Insightful)
When you go to a DRP seminar, they make the claim that the majority of businesses that are knocked out for longer than 48 hours go out of business within 1 year.
Re:Amazing is an inappropriate adjective (Score:3, Insightful)
This is really sad, and the company could have fired him for being incompetent. He basically destroyed their intellectual property through negligence, wasting all the money they invested in his project, which was almost certainly more than just his salary for that time period.
If a truck driver gets a load and forgets to check his own tie-downs, and as a result loses the load before reaching his destination, whose fault is it?
Besides, as the supreme programmer, he should be motivated to work from home in the middle of the night sometimes, and to keep backups there.
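Keeping that off-site copy of the source is trivial to automate, too. A minimal sketch using rsync over ssh from a nightly cron job; the host and paths are invented for illustration:

```python
import subprocess

# Invented host/paths for illustration.
SOURCE_TREE = "/home/dev/projects/"      # trailing slash: copy contents
REMOTE = "dev@home-box.example.net:backups/projects/"

def push_offsite():
    """Mirror the working tree to a machine outside the office.

    rsync only transfers changed files, so running this from cron
    every night costs almost nothing after the first pass.
    """
    subprocess.run(
        ["rsync", "-az", "--delete", SOURCE_TREE, REMOTE],
        check=True,
    )

if __name__ == "__main__":
    push_offsite()
```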