Security

ISP Recovers in 72 Hours After Leveling by Tornado 258

aldheorte writes "Amazing story of how an ISP in Jackson, TN, whose main facility was completely leveled by a tornado, recovered in 72 hours. The story is a great recounting of how they executed their disaster recovery plan, what they found they had left out of that plan, how they recovered data from destroyed hard drives, and perhaps the best argument ever for offsite backups. (Not affiliated with the ISP in question)"

Comments Filter:
  • Re:Well... (Score:3, Informative)

    by stratjakt ( 596332 ) on Thursday September 04, 2003 @02:29PM (#6871228) Journal
    Those businesses should realize they need a backup/disaster plan as well if they absolutely cannot withstand a day of downtime.

    Perhaps have the sites mirrored on colos in two locations, and route to the other one when the first goes offline (a rough sketch of that idea follows below).
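
    A minimal sketch of that mirrored-colo idea, with placeholder hostnames and a plain TCP health check (nothing here is anything Aeneas actually ran); a real deployment would more likely use DNS failover or BGP than a client-side probe:

        # Probe each mirror and send traffic to the first one that answers.
        # Hostnames and port are made-up placeholders.
        import socket

        MIRRORS = ["www1.colo-east.example.net", "www2.colo-west.example.net"]

        def reachable(host, port=80, timeout=3):
            """True if a TCP connection to host:port succeeds within timeout."""
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    return True
            except OSError:
                return False

        def pick_mirror():
            """Return the first mirror that answers, falling back to the last."""
            for host in MIRRORS:
                if reachable(host):
                    return host
            return MIRRORS[-1]

        if __name__ == "__main__":
            print("Routing traffic to:", pick_mirror())

    The same check could just as easily drive a DNS update or a load-balancer reconfiguration instead of a print statement.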

  • by MachineShedFred ( 621896 ) on Thursday September 04, 2003 @02:54PM (#6871468) Journal
    Yup... definitely a manager concerned about the minutes, rather than the details.

    Details like it not being one box or even one rack that went down, but ALL RACKS, ALL WIRES, ALL ELECTRICITY, ALL WALLS, FLOORS, AND CEILINGS.

    Also too busy to bother with details like punctuation or a proper paragraph from the look of it...
  • Re: so... (Score:3, Informative)

    by LostCluster ( 625375 ) on Thursday September 04, 2003 @03:02PM (#6871549)
    This ISP was also a dialtone provider...
  • by Fishstick ( 150821 ) on Thursday September 04, 2003 @03:03PM (#6871559) Journal
    That's Computerworld receiving the /.ing.

    The ISP is here [aeneas.net]

    A picture of the aftermath is here [aeneas.net]
  • by sexylicious ( 679192 ) on Thursday September 04, 2003 @03:57PM (#6872199)
    Some places in tornado country can't have basements. This is due to the soil having extra clay, the water table being a couple feet below the surface, or annual flooding.
  • by ceije ( 662080 ) on Thursday September 04, 2003 @04:44PM (#6872840) Journal

    I think a lot of sites already have contingency plans for sudden traffic increases, and if not, they begin to think about them very seriously once they get a large spike in traffic that disrupts service. Even with traffic-spike contingency plans, the maximum level of traffic you need to be able to sustain, and the amount of latency or downtime that is acceptable to the business, can be and often is debated ad nauseam. It costs a lot of money to maintain readiness for, say, double or triple normal traffic on a large site, and you have to make a business case balancing that cost against the cost of an outage due to increased traffic.

    There are several things you can do to quickly add the capability to handle additional load, and most of them rely on forethought when establishing contracts with your colocation facilities and software/hardware vendors. For instance, most large colo facilities allow you to reserve additional bandwidth. You may pay more for that privilege, but that's part of the cost of preparedness. You may also purchase or lease additional hardware, have it set up and ready to install on short notice, but not use it on a regular basis because of high licensing costs.

    Licensing costs for database software can be enormous, but in the event of a large spike in traffic, turning on an additional 20 or 30 CPUs on a large database server could save the company a lot of money in lost revenue, especially if your database software vendor specifically allows this in your contract. If the contract doesn't allow it, you may end up paying more in licensing fees than you would have made in revenue during the outage. (A back-of-the-envelope version of this trade-off follows at the end of this comment.)

    My main point here is that planning for extra traffic is a big cost-benefit balancing act, and it requires a lot of forethought. Most large software, hardware and service providers allow for emergency clauses in contractual agreements, but it's often up to the customer to specifically call those out.

    But then again, it's like insurance. You hope you don't need it, but you're glad you have it when you do. And you have to pay for it even if you don't need it.

    Also, when you plan for a traffic spike, you need to consider the source of the traffic. Denial-of-service attacks are often easy to mitigate with common network practices, and it's just a matter of preparing for those. But real, human-driven traffic is much different, less predictable, and actually capable of generating revenue.

    Understanding your company's site infrastructure, software architecture, and day-to-day traffic patterns is very important when it comes to handling real traffic spikes. When a real spike happens, network operators, developers, and database admins (among others) will probably need to jump into action, looking for bottlenecks and attempting to mitigate them as they appear. This can be a difficult task, and there's nothing worse than knowing what the problem is and not being able to do anything effective about it in a reasonable amount of time.

    Real traffic doesn't just come from other sites; it can also be driven by other forms of communication, such as television, print, and other media... even word of mouth (although I haven't seen an example of this). A large, syndicated national television news program that runs during primetime can generate a lot more traffic than most web sites, and those spikes seem to grow by orders of magnitude as the duration and repetition of air time increases. A fifteen-minute segment that is marginally compelling might be enough to swamp all but the largest and most prepared sites. The silver lining of the television spike is that it declines very quickly after the segment ends.

    A spike from multiple media sources, for instance print, web, and television, could be very difficult to handle, both in magnitude and duration. Although, duration isn't often a problem, because even the most prepared sites will succumb under a huge spike and
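
    The licensing trade-off described above, as a back-of-the-envelope calculation. Every figure here is an invented placeholder, not real database pricing or revenue data:

        # Hypothetical numbers only; the point is the shape of the comparison,
        # not the figures themselves.
        extra_cpus           = 20       # CPUs enabled only during the spike
        license_cost_per_cpu = 40_000   # assumed per-CPU license fee ($)
        revenue_per_hour     = 75_000   # assumed revenue earned per hour ($)
        outage_hours_avoided = 36       # downtime the extra capacity prevents

        license_cost  = extra_cpus * license_cost_per_cpu
        revenue_saved = revenue_per_hour * outage_hours_avoided

        print(f"Extra licensing cost: ${license_cost:,}")
        print(f"Revenue preserved:    ${revenue_saved:,}")
        print("Worth it" if revenue_saved > license_cost else "Not worth it")

    With these invented numbers the extra licenses cost $800,000 and preserve $2,700,000 in revenue, which is exactly the kind of case the comment says has to be made to the business.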
  • by Artifex ( 18308 ) on Thursday September 04, 2003 @07:28PM (#6874513) Journal
    "When I was an Aeneas dialup customer, in 1998, the service provided by Aeneas was also subpar. The dialup speeds averaged 21.6 kbps, whereas when I switched to U.S. Internet (now owned by Earthlink) my dialup speeds were always above 26.4 kbps (except on Mother's Day). There were frequent disconnections, and they had a limit of 150 hours/month."


    Have you never learned what line quality means? Not just from you to your local POP, but beyond the local loop, on the trunks that go across town (or further) to the ISP's POP?

    21.6 is an interim speed that, when seen in conjunction with v.34, v.90, or the other modern standards that go beyond 28.8, every technician knows means "well, it's trying, but the lines are crap." Connections that lousy are also prone to disconnects. Nobody deliberately locks their modems down to that speed to be a jerk or to save bandwidth; if they were that cheap, they'd put you on old-style 28.8 modems, which are practically free. The U.S. Internet POP was probably in a different part of town from the Aeneas POP, so you went over different trunks to get there. (Since Earthlink took over, they have probably dumped their local lines as well, and it's probably Sprint's dialup network serving you locally.)

    Anyone getting weird speeds like that should be bitching to the local telco, not just to the local ISP, though the ISP should have worked with you to isolate the problem to bad trunks. The fact that you're not getting a 33.6 or better connection right now means the local part of the loop is still crappy.

    Oh, I do agree with your complaint about being limited to 150 hours a month. Still, that's not a lousy-service issue; that was a contractual agreement you signed up for, right?

    As far as the school's T1 being slow goes, did you attempt any troubleshooting, or just blame it on the ISP? What did the router logs say? Were all channels 1-24 up? Were you getting frequent bounces? What were the CRC errors like? Did you arrange for a circuit test? What were the BERT results? More importantly... did you test throughput while directly connected to the router (a rough throughput-test sketch follows after this comment)? A lot of schools have really pathetic wiring systems because they're installed by volunteers who don't design and install networks for a living.

    - No, I never worked for Aeneas, but I've done everything from dialup to customer network engineering for a global Tier-1 provider, and I have learned from hard, hard experience to be cynical about complaints without supporting evidence.
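
    One crude way to run the "test throughput while directly connected to the router" experiment suggested above, assuming a machine can be placed at each end; the default hostname and port are placeholders, and a dedicated tool such as iperf would do the same job more rigorously. Start it with --serve on a box next to the router, then run the client from a classroom drop and compare the numbers:

        # Crude TCP throughput test. The server blasts TOTAL bytes at the first
        # client that connects; the client times how long the download takes.
        # The default host below is a placeholder, not real school equipment.
        import argparse, socket, time

        CHUNK = 64 * 1024          # 64 KB per send/recv
        TOTAL = 16 * 1024 * 1024   # move 16 MB per test

        def serve(port):
            # Wait for one client, then send TOTAL bytes of zeros to it.
            with socket.create_server(("", port)) as srv:
                conn, _ = srv.accept()
                with conn:
                    sent = 0
                    while sent < TOTAL:
                        conn.sendall(b"\x00" * CHUNK)
                        sent += CHUNK

        def measure(host, port):
            # Download TOTAL bytes and report the effective throughput.
            start = time.monotonic()
            received = 0
            with socket.create_connection((host, port)) as sock:
                while received < TOTAL:
                    data = sock.recv(CHUNK)
                    if not data:
                        break
                    received += len(data)
            elapsed = time.monotonic() - start
            print(f"{received * 8 / elapsed / 1e6:.2f} Mbit/s over {elapsed:.1f} s")

        if __name__ == "__main__":
            p = argparse.ArgumentParser()
            p.add_argument("--serve", action="store_true")
            p.add_argument("--host", default="testbox.example.edu")
            p.add_argument("--port", type=int, default=5001)
            args = p.parse_args()
            serve(args.port) if args.serve else measure(args.host, args.port)

    If the number measured next to the router is close to T1 speed (about 1.5 Mbit/s) but drops off at the classroom jacks, the problem is the building wiring rather than the circuit or the ISP.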
