ISP Recovers in 72 Hours After Leveling by Tornado
aldheorte writes "Amazing story of how an ISP in Jackson, TN, whose main facility was completely leveled by a tornado, recovered in 72 hours. The story is a great recounting of how they executed their disaster recovery plan, what they found they had left out of that plan, data recovery from destroyed hard drives, and perhaps the best argument ever for offsite backups. (Not affiliated with the ISP in question)"
Heh (Score:4, Funny)
Re:Heh (Score:2)
But... (Score:5, Funny)
The slashdot effect differs from a tornado in a few subtle ways:
1) You can't see it coming (unless you pay money to be a subscriber)
2) It doesn't hurt anything, except for webservers, the occasional OC line lit up like New Year's Eve, spammers, and the odd *IAA executive.
3) A tornado doesn't typically smell like armpits, cheetos, empty 64oz soda cups, burning plastic, your parents' basement and/or too much cologne for that first date.
4) It travels at the speed of light, a lot quicker than a tornado.
5) Does not require specific atmospheric conditions to be present...just a link on the front page.
Anything else?
Re:But... (Score:5, Interesting)
Re:But... (Score:5, Funny)
Re:But... (Score:3, Funny)
Re:Heh (Score:2)
I mean, maybe it'll come up next week on Trailer Park Boys.
It will be a damn shame... (Score:3, Funny)
That's fricking awesome (Score:5, Funny)
"99.18% for our service, and 96.2% for our building."
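For anyone wondering where a figure like 99.18% comes from: it matches 72 hours of downtime spread over one non-leap year. A minimal sketch of that arithmetic, assuming a 365-day measurement period (the thread does not say how either percentage was actually derived, and the function names are made up for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def availability_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Uptime as a percentage of the measurement period."""
    return 100.0 * (1 - downtime_hours / period_hours)


def downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime implied by an availability percentage."""
    return period_hours * (1 - availability_pct / 100.0)


# 72 hours down in a year rounds to the quoted service figure.
print(f"{availability_percent(72):.2f}%")  # prints 99.18%
```

Run the other way, `downtime_hours(96.2)` works out to roughly 333 hours, or about two weeks, under the same one-year assumption.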
Poor tech support (Score:5, Funny)
"Are you guys down again? You're down more than you're up! I'm going to find another service... etc..."
"Ma'am our facilities have been entirely leveled by a tornado, we'll be back up in 72 hours."
"72 HOURS?! I have photos of my grandchildren I have to mail! Worst ISP ever! Let me speak to your supervisor!"
"Ma'am our supervisor was also leveled by the tornado."
*click*
Not that I work tech support for an ISP and am bitter...
Re:Poor tech support (Score:4, Insightful)
It's amazing how accurate you are in regard to the customer viewpoint on downtime.
After having done it myself, I actually have MUCH more respect for technical support engineers/supervisors because, within reason, most "downtime" is fixed before the customer even knows about it (i.e. small blips in service).
And the majority of people who purchase an ISP's services have absolutely no idea what it takes to respond to an outage.
Users need their porn! (Score:2, Insightful)
Times of crisis and how one deals with them are the mark of successful businesses/employees/people. I don't think that we could recover so quickly should a disaster of that size hit my job, but it'd be fun to try.
Nice work! (Score:5, Insightful)
Re:Nice work! (Score:3, Insightful)
Re:Nice work! (Score:2)
Re:Nice work! (Score:3, Interesting)
Re:Nice work indeed! (Score:2)
Wow (Score:2)
Elephant Insurance (Score:5, Funny)
Another thing to add to the plan... (Score:2)
"Move somewhere where the wind don't blow quite that much" =)
However, it's amazing how soon after a 'total disaster' a system can be up and running again. I distinctly recall seeing a lot about just that in the paper (the one made from dead wood) after 9/11. Kudos, I say!
one comment and it's gone . . mirror (Score:2, Redundant)
SEPTEMBER 03, 2003 ( CIO ) - The evening of Sunday, May 4, 2003, at Aeneas Internet and Telephone began as any previous Sunday evening had. The Jackson, Tenn.-based company that serves about 10,000 Internet and 2,500 telephone customers was closed for the weekend, awaiting the return of its 17 employees the next morning. Just before midnight, however, all hell broke loose. An F-4 category twister touched down just outside of town, then tore through Jackson's downtown are
Re:one comment and it's gone . . mirror (Score:2)
But there have been too many trolls lately that repost the article with certain, er, modifications. Like the one about a rocket launch that slipped in several references to other... "rocket-shaped personal entertainment devices".
Besides, there's not much point in Karma Whoring anymore. Who *doesn't* have Excellent karma these days?
Fire... (Score:5, Insightful)
Re:Fire... (Score:2)
Re:Fire... (Score:3, Insightful)
Re:Fire... (Score:5, Insightful)
Er, for how much data? For your personal computer, maybe (but the tape drive will cost you considerably more than that $100), but I don't think you're going to back up a few hundred gigs of business data on ~$100 of tapes. And I suspect you meant 100... although if the latter then you're almost certainly correct!
It's not very hard (drive tapes to site). It's not difficult to get the backups if you need them (drive to site with tapes)
If your offsite backup is within convenient driving distance then odds are it's not far enough offsite. A flood, tornado, hurricane, earthquake, or other large scale natural disaster could conceivably destroy both your onsite and offsite backups if they're within a few miles. The flip side is that the further the distance, the more the inconvenience on an ongoing basis and the more likely you are to stop doing backups.
There's far more to be considered here, but I'm not the DR expert (my wife is... seriously). It does make sense to have offsite backups, but you have to have some sense about those too.
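The "is it far enough offsite" question above can at least be turned into a quick sanity check. This is only a sketch: the great-circle (haversine) formula is standard, but the 50 km threshold is an arbitrary placeholder rather than a recommendation, and `offsite_far_enough` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def offsite_far_enough(primary, offsite, min_km=50.0):
    """True if the offsite location is at least min_km from the primary.

    min_km is an assumed placeholder; pick it from the disasters you
    actually expect (flood plain, tornado track, quake zone).
    """
    return haversine_km(*primary, *offsite) >= min_km
```

Distance is only half of the trade-off described above. A site that passes this check but is too inconvenient to reach regularly still fails in practice.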
Re:Fire... (Score:2)
Re:Fire... (Score:2)
Re:Fire... (Score:3, Interesting)
If the vault is destroyed, then you're probably right. But it doesn't take that to render the data unusable -- if the bank gets hit, the vault may survive but the keys may be destroyed (yeah, I'm sure they can get more made or have a locksmith come in, but that will take time). Or the vault is inaccessible for some amount of time due to damage. Even if the data is good, having it unavailable does you
Re:Fire... (Score:2)
make sure off-site is far enough away (Score:4, Insightful)
Re:make sure off-site is far enough away (Score:4, Interesting)
And then gets slashdotted (Score:2, Funny)
Re:And then gets slashdotted (Score:4, Interesting)
Re:And then gets slashdotted (Score:3, Informative)
I think a lot of sites already have contingency plans for sudden traffic increases, and if not, they begin to think about them very seriously once they get a large spike in traffic that causes disruption of service. Even with traffic spike contingency plans, the level you establish as the maximum amount of traffic that you need to be able to sustain, and what amount of latency or down time is acceptable to business, can be and often is debated ad nauseam. It costs a lot of money to maintain readiness for
Re:And then gets slashdotted (Score:5, Informative)
the isp is here [aeneas.net]
picture of the aftermath here [aeneas.net]
Because of a tornado... (Score:4, Funny)
Well that's what you casemodders get for installing twenty overpowered cooling fans in every one of your 1000 servers!
Tornad'oh! (Score:5, Funny)
"Bring me the router of the wicked switch of the Qwest!"
Although, I am starting to wonder. Has anyone checked to see if this ISP has a record of resisting RIAA subpoenas? Perhaps the RIAA levelled it after acquiring cloudbuster [geocities.com] equipment.
Re:Tornad'oh! (Score:2)
Re:Tornad'oh! (Score:2)
Re:Tornad'oh! (Score:2)
Compare and contrast... (Score:5, Interesting)
Re:Compare and contrast... (Score:3, Insightful)
Re:Compare and contrast... (Score:2)
Err, burned or they got what they paid for?
If your friends really cared about their data, they would still have it. Period.
Who are their customers going to blame? Not the ISP. ISPs are a commodity item that can be hosted just about anywhere, and I'm sure that some of them provide backups/offsite backups as part
72 Hours to recover from tornado obliteration . . (Score:2, Funny)
Before someone else says it... (Score:5, Funny)
Re:Before someone else says it... (Score:5, Funny)
No, it runs
-3 Stupid.
Re:However... (Score:5, Funny)
Don't forget the grits! (Score:2)
Unfortunately (?), I wasn't an active Slashdotter when Natalie Portman and Grits were associated in the minds of the troll community [wikipedia.org], so I can't come up with anything myself. Maybe that's a Good Thing.
How about (Score:3, Interesting)
2) ??? (aka mad-scramble to initiate plan)
3) Profit (or at least don't go under)
This must have been a pretty in depth recovery plan though. I mean, even with backups and a redundant connection elsewhere... I think that for myself processing the fact that my office had just been bowled over by wind-on-steroids would faze me for a little while (office...tornado...holy...shit...must...recover. .
Now they're up and running, but what of their old office? It mu
Screw remote backup.... (Score:2, Funny)
so... (Score:3, Insightful)
Re: so... (Score:3, Informative)
Re: so... (Score:2, Insightful)
Cool, but could be better (Score:5, Insightful)
Then I've seen the other end of the spectrum - a 6 Billion dollar corporation's world HQ IT center... wow. They have disaster recovery sessions and planning like I never would have imagined. Very cool facility, but it has to be like that. Some day if they get burned, it's all over.
Truly stunning (Score:5, Insightful)
What amazes me isn't that these people were able to restore service to their customers in 72 hours. They used standard systems administration techniques. BGP was specifically mentioned.
No, what amazes me is that this is news. The IT industry is so full of idiots and morons and MCSEs that taking basic precautions earns you a six-figure salary and news coverage. These folks didn't even have off-site backups, it was luck that they were able to resume business operations (ie: billing) so soon.
Moral of the story? When automobile manufacturers start getting press coverage for doing a great job because unlike their competition, they install brakes in their vehicles, you know that the top-tier IT managers and executives have switched industries.
Re:Truly stunning (Score:5, Interesting)
I agree, although maybe not so vehemently. For the IT managers who need a clue, the article is evidence that a sound disaster recovery plan works. Obviously, in the case of the ISP, the plan wasn't completely sound, but the other, possibly more important, point of the article is that the ISP's management recognized that their recovery plan was incomplete. Based on the lessons they learned, they made changes.
I work for a large (~20,000 employees) company, with about 10,000 employees at one site. The IT department (actually the entire company as well) has a disaster recovery plan in place. But beyond having a plan, we also have drills. As an example, we are in the flight path of the local airport (possibly not the best place in the world for a manufacturing site). What happens if a plane crashes smack in the middle of the plant? Hopefully we'll never know for sure, but the drills that we've run showed strong and weak points of the disaster plan. The strong points were emphasized, the weak points were revised and the disaster plan continues as a work in progress.
Specifics aside, and maybe this is just stating the obvious, but considering a disaster recovery plan to be a continuously evolving procedure could be one of its strongest points.
-h-
Planning and Funding (Score:2)
72 hours thats pretty bad (Score:2, Insightful)
Re:72 hours thats pretty bad (Score:3, Informative)
Details like it not being one box or even one rack that went down, but ALL RACKS, ALL WIRES, ALL ELECTRICITY, ALL WALLS, FLOORS, AND CEILINGS.
Also too busy to bother with details like punctuation or a proper paragraph from the look of it...
Re:72 hours thats pretty bad (Score:3, Insightful)
5 minutes is bad if your servers still exist. (Score:2)
Re:72 hours thats pretty bad (Score:3, Funny)
New BOFH Excuse... (Score:3, Funny)
Off topic, I know..... (Score:2, Interesting)
Before we dole out all the praise... (Score:2)
They had to recover the drives from the rubble and after numerous failed attempts, finally found a data extraction company that could retrieve the data.
While their recovery and foresight are impressive, I don't think we should raise them up as the example when they omitted something as simple as carrying a backup home every once in a while. They got lucky, with regar
Re:Before we dole out all the praise... (Score:2)
This isn't a debate on this backup method or that one -- just the fact that you NEED ONE. I personally gave up on tape and went to live hard drives for pure ease and speed, while at the same time cutting costs drastically. All servers, RAID-5, dump their data/configurations to a local RAID-1 IDE based system (encrypted of course).
Daily it's running 35-40G currently. Dump that data to a portable dri
I live in Jackson.... (Score:5, Interesting)
Unless Aeneas has made some major changes they are quite certainly the worst ISP I have ever worked with. Aeneas has contracts with the Jackson-Madison County School System to provide internet service district wide. The quality of such service is, bar none, the worst I have experienced.
I did some volunteer work at a local elementary school helping teachers work out any lingering computing problems they had (viruses, printer drivers, misconfigured IP settings, file transfer to a new computer, etc.). The internet service I experienced while I was there led me to believe I was on a 128k ISDN line. Not until I went to the server room did I realize that I was, in fact, on a T1. Now this was during the middle of summer; maybe four other people were in the building, three of whom were in the same room as me. The service was also intermittent, having several dead periods while I was working. Needless to say, I remained unimpressed by said experience.
When I was an Aeneas dialup customer, in 1998, the service provided by Aeneas was also subpar. The dialup speeds were averaging 21.6kbps, whereas when I switched to U.S. Internet (now owned by Earthlink) my dialup speeds were always above 26.4kbps (except on Mother's Day). There were frequent disconnections, and they had a limit of 150hrs/month.
I'm not surprised how easy it is to restore subpar service. All they had to do was tie together the strings that are their backbone.
Re:I live in Jackson.... (Score:3, Informative)
Have you never learned what line quality means? Not just from you to your local POP, but beyond the local loop, on the trunks that go across town (or further) to
An ISP in tornado country (Score:2)
Point is, if I had more than just a few thousand dollars worth of equipment, especially if I had a million's worth, I'd want to keep it safe. This is earthquake country (California) so he
I think I'd be pretty worn out... (Score:2)
What about practicing your disaster recovery? (Score:3, Insightful)
In the article the writer was talking about how much work it was to migrate the T1 connections, and how they hadn't foreseen that. That is exactly the sort of thing that a practice disaster recovery uncovers.
If you want the model from the place I work it is simple enough:
1. Run the disaster recovery during a 24 hour period
2. Pat yourself on the back for what worked.
3. Ignore what doesn't work.
4. Repeat next year.
Of course next year gets a new step:
3.5 Act surprised that stuff didn't work.
72 Hours is a little long.... (Score:3, Interesting)
The company (ISP/consulting/services hosting) I used to work for had a DR plan to be executed in 24 hours with 75% functionality. Offsite servers and backups of course...
More impressive to me is the World Trade Center folks like American Express and other companies that had DR plans situated across the river. A lot of datacenters and information services were functional again within 18-24 hours. That's PPP PPP (prior planning prevents piss-poor performance).
I write good sigs on my bathroom wall...but this is not a real sig.
Re:72 Hours is a little long.... (Score:3, Interesting)
Prior to 2000, they built an entirely new system and ran it in parallel with the current one, for six months. Every transaction went through bo
offsite backups (Score:2)
okay, it's my house, but it counts.
if my house burns down, it's unlikely the office will suffer the same fate, and vice versa - it's a 20 m
tape backups? (Score:3, Interesting)
How about talking about disaster recovery for a REAL company with tens to hundreds of terabytes of data sitting on disk? The kind of data that you cannot lose and must have back on-line asap?
This article is like congratulating them for putting up detour signs when a road is destroyed, or rerouting power when a power line goes down.
Just about everything that was destroyed was not-unique, manufactured items that could be recreated and repurchased. The only exception was the user data, which was pulled off of a nearly destroyed drive by a data recovery company. (Lucky for them!)
I would like to hear more about companies that lose tons of difficult to replace, unique items, such as TBs of user data, prototype designs, business records, etc.
I would bet that if a company were to permanently lose these types of things, they would nearly go out of business.
Been there, done that, Northridge Quake (Score:5, Interesting)
That was on a Monday. The next Monday was the Northridge quake.
They came into the next meeting a couple of weeks after the quake with a whole new perspective on disaster planning and training:
Re:Been there, done that, Northridge Quake (Score:4, Interesting)
But one thing with disaster recovery is you need to figure out what is and is not a disaster you should worry about. I live in Jerusalem; terrorism is something very real here but mostly hits soft targets. On the other hand, major blizzards are a non-issue. In Boston we worried about nor'easters and occasionally a hurricane. If you live in Utica, NY you probably don't have to worry too much about terrorism. Fire can happen anywhere.
I don't know how you figure out what is or is not a probable event in your location. I suppose you talk to the insurance folks; they have spent a lot of time figuring this out.
The other question is how much recovery can you afford? If your disaster recovery plan puts your company into Chapter 11, it was not a very good plan.
I like saying "Utica"
Comment removed (Score:3)
Hmmm... (Score:3, Funny)
Then we could've kicked a dog while it was down.
Not good enough (Score:3, Insightful)
When you go to a DRP seminar, they make the claim that the majority of businesses that are knocked out for longer than 48 hours go out of business within 1 year.
From the article... (Score:3, Funny)
This was from a magazine for managers, after all. Now there's some good news that pointy-haired bosses can understand!
Nice name for data recovery company... (Score:3, Funny)
Wait, did anyone else even read the article?
Oh, never mind.
Re:Well... (Score:3, Informative)
Perhaps having the sites mirrored on two colos in two locations, and routing to the other one when the first goes offline.
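The two-colo idea above boils down to "use the first mirror that still answers". A rough sketch of that selection logic, assuming a plain TCP connect is an acceptable health check; the hostnames are placeholders and `pick_site` is a hypothetical helper. In practice the actual rerouting would more likely be handled at the DNS or BGP level than in a script like this.

```python
import socket


def is_reachable(host, port=80, timeout=2.0):
    """Crude health check: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_site(sites, check=is_reachable):
    """Return the first site in priority order that passes the check, or None."""
    for host in sites:
        if check(host):
            return host
    return None


# Placeholder hostnames, primary first:
# active = pick_site(["colo-a.example.net", "colo-b.example.net"])
```

Passing the check in as a parameter keeps the selection logic testable without touching the network.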
Re:Welcome ! (Score:3, Funny)
+1 Informative?!?
Does that mean that some moderator actually believes that we have, indeed, been conquered by twisters?
Re:Amazing is an innapropriate adjective (Score:5, Funny)
But, as a programmer, I just dont care.
When I was a sophomore, working on my electrical engineering degree, I worked for a small, network-centric company that employed what seemed to be an abnormal number of snooty programmers and technical writers. Maybe it wasn't so abnormal.
Me: "Hi, IT support."
Stratjakt: "Hey, I know you're just a high-school educated 'IT person', but you need to get one of your cable monkeys up here and find out why I can't see the network!"
Me: "OK, but let's check a couple of things quickly before I dispatch a technician. It may save some time."
Stratjakt: "Hey, I'm a programmer! I just don't care!"
Me: "I understand...I realize that my mundane existence doesn't have the exhilaration and excitedness of the thrilling, edge-of-your-seat world of a computer programmer, but there are just a few simple things that we could do to resolve this problem that will be faster than you waiting for a technician."
Stratjakt: "I just don't care."
Me: "No problem, I'll dispatch a technician."
An hour later...
Technician: "Stratjakt is all fixed up. I plugged his network cable back into the jack."
Re:Amazing is an innapropriate adjective (Score:2)
Also, a programmer that cannot diagnose problems at multiple levels is a bad programmer. This is why I think tools like GUI IDEs can cause more harm than good, because they trick programmers into thinking everything is dandy and cool. However, when those tools fail, I've seen programmers waste days on what should be trivial to fix (it turns out that nifty tool is quite inflexible, indeed).
Re:Amazing is an innapropriate adjective (Score:3, Funny)
Re:Amazing is an innapropriate adjective (Score:3, Insightful)
What takes an hour is that the technician has to take care of the
Re:Amazing is an innapropriate adjective (Score:2)
My point exactly. No serious programmer would allow that. Of course, no serious programmer would install that malware on their system to begin with, would they?
-h-
Re:Amazing is an innapropriate adjective (Score:3, Interesting)
Yep, thats the way it works. I dont crawl around on the floor plugging shit in and getting dirty.
...
They're just added beurocracy for the computer world, and I work to replace them each and every day with more sophisticated self-administrating softwares.
If you don't know how to crawl around on the floor plugging shit in and getting dirty, you do not have the perspective necessary to write software to replace the people who do. The best programmers are not arrogantly disconnected from the people in
Re:Amazing is an innapropriate adjective (Score:4, Interesting)
Let me start with this line:
"I realize that slashdot is mostly populated by high-school educated "IT people", who give a shit about logs and backups"
You claim to be a programmer. I have been a programmer and am now a Sys Admin, and in both roles the BEST way to troubleshoot was from the logs. Unless you are the supreme programmer whose code never needs debugging and whose users never mispunch something and cause an error, a log file will let you see and know what has happened.
Now for this line:
"and restoring backup tapes is exhillirating and exciting."
I have restored from tape backup. We had a "programmer" (BS from Virginia Tech, Masters from UMass) who was certain he knew exactly what he was doing when he blew away an entire production database. (Actually he was a really good guy who just made a simple mistake.) Fortunately we had tapes to restore from. But if ANYONE thinks that a restore is "exhillirating" (yes, I left your typo/mistake in there) then they are just strange. That was one of the most tedious and boring things I have had to do. But we had been tedious in backing EVERYTHING up, so production was not severely impacted.
Now for where you directly insult everyone:
"I fully expect the PHBs and army of cable monkeys to get the network up and running in our new location."
So as a systems admin do I become a cable monkey? Or am I a PHB? Either way I would be VERY needed if a disaster strikes, just as I am needed every day. As for the elitist attitude and your lack of knowledge and concern for the backend of systems, I am glad you do not work anywhere near me, as I hate IT personnel that have to call me to run Windows Update on their system when the latest worm comes around or to show them how to NOT click ignore when Norton tells them they have a virus.
In short, Please show some respect for your coworkers and realize that these guys were prepared and did what their plan stated they could do.
If not don't be alarmed if somehow your account gets disabled and everything blown away and surprisingly they won't have backups, cause you "just don't care" for them.
Re:Amazing is an innapropriate adjective (Score:2, Insightful)
Wow! This is exactly the reason that systems administrators generally dislike most members of their development group. Your attitude does not do very much to endear us 'cable monkeys' and 'PHBs' to you.
"IT people", who give a shit about logs and backups and think plugging a PC and monitor into a powerbar is "computer science"
If you think this is all that is involved in running a remotely large and reliable network, you are sadly mistaken my friend. A lot of thought, planning and testing goes into mo
Re:Amazing is an innapropriate adjective (Score:3, Insightful)
Keep up the good work.
sloth jr
Re:Amazing is an innapropriate adjective (Score:2, Funny)
Re:Amazing is an innapropriate adjective (Score:3, Insightful)
This is really sad, and the company could have fired him for being incompetent. He basically destroyed their intellectual property through negligence, wasting all the money they invested in his project, which was almost cert
Re:Amazing is an innapropriate adjective (Score:2)
It's a funny joke.
But anyway, I would be careful with your generalizations. I, as a programmer, have an enormous amount of respect for the admins because well, they know their shit and are pretty cool to boot... and you can't just say a programmer can be replaced by folks in India or Russia... as someone who has as much experience as I do, and not just hac
Of course it is wireless! (Score:2)
Come to think of it, the Bronze Age could be called "wireless" as well. Makes them sure look advanced?
Re:My ISP's disaster recovery plan (Score:3, Interesting)
Re:My ISP's disaster recovery plan (Score:4, Interesting)