


Software Glitch Caused 911 Outage For 11 Million People
HughPickens.com writes: Brian Fung reports at the Washington Post that earlier this year emergency services went dark for over six hours for more than 11 million people across seven states. "The outage may have gone unnoticed by some, but for the more than 6,000 people trying to reach help, April 9 may well have been the scariest time of their lives." In a 40-page report (PDF), the FCC found that an entirely preventable software error was responsible for causing 911 service to drop. "It could have been prevented. But it was not," the FCC's report reads. "The causes of this outage highlight vulnerabilities of networks as they transition from the long-familiar methods of reaching 911 to [Internet Protocol]-supported technologies."
On April 9, the software responsible for assigning the identifying code to each incoming 911 call maxed out at a pre-set limit; the counter literally stopped counting at 40 million calls. As a result, the routing system stopped accepting new calls, leading to a bottleneck and a series of cascading failures elsewhere in the 911 infrastructure. Adm. David Simpson, the FCC's chief of public safety and homeland security, says having a single backup does not provide the kind of reliability that is ideal for 911. "Miami is kind of prone to hurricanes. Had a hurricane come at the same time [as the multi-state outage], we would not have had that failover, perhaps. So I think there needs to be more [distribution of 911 capabilities]."
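The mechanism in the summary can be sketched in C. This is not the vendor's code, which the summary does not reproduce; `call_id_allocator`, `assign_call_id`, and the 90% warning threshold are hypothetical. It shows a counter with a hard cap that rejects new calls once the cap is reached, plus a warning flag that trips well before that point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits: the 40 million cap comes from the FCC report;
   the 90% warning threshold is an assumption for illustration. */
#define CALL_ID_LIMIT   40000000u
#define CALL_ID_WARN_AT (CALL_ID_LIMIT / 10u * 9u)   /* 36,000,000 */

typedef struct {
    uint32_t next_id;   /* next identifier to hand out */
    bool     warned;    /* set once the warning threshold is crossed */
} call_id_allocator;

/* Assigns an ID to an incoming call. Returns false once the cap is reached,
   the point at which new calls can no longer be routed. */
bool assign_call_id(call_id_allocator *a, uint32_t *out_id)
{
    if (a->next_id >= CALL_ID_LIMIT)
        return false;                      /* cap reached: reject the call */
    if (!a->warned && a->next_id >= CALL_ID_WARN_AT)
        a->warned = true;                  /* an operator alarm would fire here */
    *out_id = a->next_id++;
    return true;
}
```

Once `next_id` reaches 40,000,000, every further call is rejected, which is the behavior the summary describes; the `warned` flag marks where an operator alarm could hook in long before that happens.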
backup for 911 (Score:5, Informative)
Have your local police and fire phone numbers in your cell phone and posted next to your landline.
Re: (Score:2, Funny)
What are you going to do? Shoot the cardiac arrest?
Re: (Score:1)
No, you shoot AGAINST the wind, not with it.
Re: (Score:1)
This is a recording of Archangel calling 911:
Operator: What can I do for you?
Archangel: My friend ... he ... he has a cardiac arrest. I think he is dead.
Operator: Calm down! Go back to your friend and check that he is dead.
Archangel: Okay.
... [BANG] ...
Archangel: I confirm! He is dead now.
Re: (Score:2)
Sounds like a variant of a famous joke.
Operator: 911. What is your emergency?
Hunter: My hunting partner just had a heart attack. I think he's dead.
Operator: Go make sure.
[sound of a gunshot]
Hunter: Okay. Now what?
Re: (Score:2)
Sounds like a variant of a famous joke.
Operator: 911. What is your emergency?
Hunter: My hunting partner just had a heart attack. I think he's dead.
Operator: Go make sure.
[sound of a gunshot]
Hunter: Okay. Now what?
Yep, this is the right way to phrase it; it makes more sense this way.
Re: (Score:2)
Wow, do tell how your gun delivers babies, provides oxygen to the elderly, or puts out fires. For that matter, it won't even stop a crime at your vacationing neighbor's house without likely sending you to prison.
I love firearms but brainless people with guns are a big problem.
Re: (Score:2)
This is my rifle (raise weapon),
this is my GUN (grab crotch),
this is for fighting,
THIS is for fun
Re: (Score:2)
And the brain size seems inversely proportional to the number of guns. At least it seems that way viewing from a safe distance (Australia).
Re: (Score:2)
Wrong. In the USA there is very high per-capita ownership of guns in nice areas with no crime. It takes a lot of money to own and maintain firearms.
Fear the inner-city punk with just one. Most of the gun crime (and rape, armed robbery, etc.) is committed in inner cities by a couple of subcultures with no respect for life and property.
Re: (Score:1)
And the brain size seems inversely proportional to the number of guns. At least it seems that way viewing from a safe distance (Australia).
That ratio is usually more closely related to an organ located a bit closer to the middle of the body. (The same organ whose size is inversely proportional to the size of one's SUV or to the speed and price of one's sports car.)
Re: (Score:2)
For the idiots who don't comprehend because their rose-tinted liberal glasses don't work, here is the relevant portion highlighted:
Have a gun. In a real emergency, the police are too busy to help everyone.
You see, you're deliberately changing the parameters to make fun of me. I get it, you're too stupid to have a valid argument against what I actually said, in context, so you change the context. The reality is, it makes you look stupid.
But okay, let's go with the hypothetical "cardiac arrest" mentioned below. Okay, your cardiac arrest is because your shop is being overrun by a
Re: (Score:2)
Which is kind of ok as long as your problem happens in that area, and there is no actual natural or man-made disaster in play.
A 911 dispatch center has much better resources than a local police or fire department's main desk. If you are lucky, that desk can operate at 10% of the capacity of a proper dispatch center.
The system should be more robust. It has improved dramatically since 9/11 with absurd amounts of cash poured into many facilities, but it can't do everything. An alternative solution is to not expect
Re: (Score:2)
They also list their dispatch numbers. That's what you call when you want to report a non-emergency. For example, I called it to report a completely wasted drunk lady causing a disturbance. It wasn't quite enough to call 911, but I wanted to report her before she wandered into traffic.
Re: (Score:3)
What landline?
If you care enough about 911 and emergency situations to be reading this article, and you don't have a landline, then that's on you for being irresponsible. People spend more on texting than it costs to have a landline. No excuses.
The monthly cost of a landline is cheap insurance in the event of an emergency. Cell towers go down, fail, become over-congested, and cell phone batteries die.
Re: (Score:1)
Right, so I can pay $30+ per month for something I might use once in my lifetime, or $20/mo for unlimited text that I WILL use every day. You failed elementary school math, didn't you?
(Also factor in the number of LECs (read: Verizon and AT&T) doing everything they can to do away with POTS, and it's even more of a waste of money. FYI, if you have FiOS, your copper loop was removed and is no longer an option.)
Re: (Score:2)
In FIOS areas, it's no longer possible to get a POTS landline. You can get a phone service over FIOS, but it's subject to wall-power being available, and you're using the same E-911 system as normal VoIP or cell phone services, anyhow. It's the FCC that's to blame for me not having a landline.
Also, there's no reason cellular 911 service shouldn't be ultra-reliable. There are 4 different nationwide carriers in the US. What are the odds that all 4 of them will have ALL their overlapping cell towers in an
Re: (Score:2)
Also, there's no reason cellular 911 service shouldn't be ultra-reliable.
http://www.mychamplainvalley.c... [mychamplainvalley.com]
What are the odds that all 4 of them will have ALL their overlapping cell towers in an area knocked-out?
What are the odds your family isn't all on a single cellular carrier, making you unable to take advantage of such redundancy?
Re: (Score:2)
Verizon and Sprint are compatible, while AT&T and T-Mobile are compatible. And with them all switching to LTE, it's likely they will all be mutually compatible in a few more years, when manufacturers start selling multi-band LTE phones.
Almost every post-paid cellular plan includes voice roaming. Even if you're not paying for roaming normally, when you dial 911, all restrictions ar
Re: (Score:2)
In FIOS areas, it's no longer possible to get a POTS landline.
Hmmm. My home, built in 2001, originally had a Verizon landline. FIOS has been available here for several years, but I never signed up for that (we use cable for everything else). When our power drops out, the old landline still works just fine. Are you saying that just new installations are affected?
Re: (Score:3)
Have you ever actually called those?
They say "If this is an emergency, please hang up and dial 911".
Re: (Score:2)
Use your brain; you have no point.
Re: (Score:2)
I think I specifically have a point that they're not a substitute for emergency calls.
Re: (Score:1)
What's a landline? Also, you can just Google the numbers or ask Siri.
Re: (Score:1)
What's a landline?
If you care enough about 911 and emergency situations to be reading this article, and you don't have a landline, then that's on you for being irresponsible. People spend more on texting than it costs to have a landline. No excuses.
The monthly cost of a landline is cheap insurance in the event of an emergency. Cell towers go down, fail, become over-congested, and cell phone batteries die.
Re: (Score:1)
30 years ago, MAYBE. Today, "0" could be answered by a call center anywhere. Your call a) might not be answered by a human at all, or b) might not be answered by someone on the same continent.
Re: (Score:1)
What is this land line you speak of?
If you care enough about 911 and emergency situations to be reading this article, and you don't have a landline, then that's on you for being irresponsible. People spend more on texting than it costs to have a landline. No excuses.
The monthly cost of a landline is cheap insurance in the event of an emergency. Cell towers go down, fail, become over-congested, and cell phone batteries die.
Re: (Score:2)
Not around here. I'm paying about $40 per month for a nearly bare-bones land line (only Caller ID). Even if I were on a $0.35 per text plan, I'd spend more money on that land line every month than I would on texting for ten years. Cheap, it ain't.
Re: (Score:1)
If you care enough about 911 and emergency situations to be reading this article, and you don't have a landline, then that's on you for being irresponsible. People spend more on texting than it costs to have a landline.
Every line in the US is required by law to be able to dial 911, even if you aren't paying for any service at all. This applies to landlines and cellphones. I often keep the police scanner on as background noise, and I can say at least a quarter of the 911 calls in Memphis originate from disconnected cell phones. If your landline gets a dial tone or your cellphone is charged and has a signal, you can dial 911 whether you have a phone plan or not.
Of course that doesn't help in the event of a 911 outage but th
Re: (Score:2)
have your local police and fire phone numbers in your cell phone and posted next to your land line.
That is a great idea.
But, I used to handle 911 outages. Most 911 outages are due to cable cuts, which would often leave those facilities unreachable as well.
I'd say that if your phone works and you can't call 911 or the local hospital, you should assume the trunk leading to those services (foolishly, they are usually all located next to each other) is cut or damaged. So your next best bet would be to call a NON-LOCAL ER, i.e., call the next town over. Just because downtown is broken doesn't mean the trunk leading to
Re: (Score:2)
What if one travels a lot in the USA? 911 handles the local ones. Can 411 go to 911?
Why 40 million? (Score:1)
40 million doesn't seem to cross any boundary?
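One quick check is to see where 40,000,000 falls among powers of two: 2^25 = 33,554,432 and 2^26 = 67,108,864, so it needs 26 bits and sits far below the 32-bit signed maximum of 2,147,483,647. A tiny helper (`bits_needed` is a hypothetical name) makes the check repeatable for any suspicious limit:

```c
#include <stdint.h>

/* Number of bits needed to represent n, i.e. the smallest k with n < 2^k. */
unsigned bits_needed(uint64_t n)
{
    unsigned k = 0;
    while (n != 0) {       /* drop one bit per iteration */
        n >>= 1;
        k++;
    }
    return k;
}
```

`bits_needed(40000000)` returns 26. A limit that is neither 2^k nor 2^k - 1 for any k, like this one, looks more like a configured threshold than a natural integer overflow, which fits the report's description of a pre-set limit.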
Re: (Score:2, Insightful)
Probably crosses a licensing boundary. Beats me!
"We're sorry, your 911 call centre didn't pay their software licensing fee this month. Please call 1-800-RU-LEGIT and report this instance."
Re: (Score:2)
The word "architecture" is bandied about a lot, partly because it sounds so important. But if architecture means anything, it should include scoping out ALL limits embedded in the software or adjustable through a UI. At the very least, the limits should be documented so that those responsible for managing and maintaining the system are fully aware of them at all times. They are just as important as the speed at which your car will come off the road when you drive round a tight bend.
Ideall
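One way to act on this is to keep every hard limit in a single registry that monitoring can read. The sketch below is hypothetical (the `documented_limit` type, the limit names, and the percentage threshold are all assumptions) and flags any limit whose usage has crossed a warning percentage.

```c
#include <stddef.h>
#include <stdint.h>

/* A documented capacity limit. Names and numbers are hypothetical. */
typedef struct {
    const char *name;     /* human-readable name shown to operators */
    uint64_t    limit;    /* hard limit enforced by the software */
    uint64_t    current;  /* current usage, updated by the system */
} documented_limit;

/* Counts limits whose usage has reached warn_pct percent of the hard limit.
   A real system would page someone for each one; here we only count. */
size_t limits_near_exhaustion(const documented_limit *limits, size_t n,
                              unsigned warn_pct)
{
    size_t flagged = 0;
    for (size_t i = 0; i < n; i++) {
        /* compare usage*100 with limit*pct to stay in integer arithmetic */
        if (limits[i].current * 100u >= limits[i].limit * warn_pct)
            flagged++;
    }
    return flagged;
}
```

With a call-ID counter at 39 million of a 40 million limit, a 90% threshold flags it while a lightly used trunk group stays quiet.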
Re: (Score:2)
I think I found the offending code:
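#include <stdint.h>
// ought to be enough
static uint16_t counter[640];
// Returns 0 once the call is counted, -1 when all 640 slots are full.
int countcall(void) {
    int i;
    // skip the ones we filled up
    for (i = 0; i < 640 && counter[i] == UINT16_MAX; i++) {}
    if (i == 640) return -1; // every slot saturated: 640 x 65535 = 41,942,400 calls
    counter[i]++;
    return 0;
}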
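Taken literally, the joke's scheme (640 slots that each saturate at 65535) tops out at 640 × 65535 = 41,942,400 calls, within about 5% of the reported 40 million limit. The self-contained simulation below, with hypothetical names `record_call` and `calls_until_refused`, counts how many calls such a scheme accepts before refusing; it remembers the first non-full slot so the run stays fast.

```c
#include <stdint.h>
#include <string.h>

#define SLOTS 640                  /* "ought to be enough" */

static uint16_t slots[SLOTS];     /* each slot saturates at UINT16_MAX (65535) */
static size_t   cur;              /* first slot that may still have room */

/* Counts one call; returns 0 on success, -1 once every slot is full. */
static int record_call(void)
{
    while (cur < SLOTS && slots[cur] == UINT16_MAX)
        cur++;                     /* skip the ones we filled up */
    if (cur == SLOTS)
        return -1;                 /* all slots saturated: call refused */
    slots[cur]++;
    return 0;
}

/* Clears the slots, then counts how many calls are accepted before refusal. */
long calls_until_refused(void)
{
    long accepted = 0;
    memset(slots, 0, sizeof slots);
    cur = 0;
    while (record_call() == 0)
        accepted++;
    return accepted;
}
```

Under these assumptions, `calls_until_refused()` returns 41,942,400.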
killbots have a preset kill limit (Score:1)
You see, killbots have a preset kill limit. Knowing their weakness, I sent wave after wave of my own men at them until they reached their limit
Re: (Score:2)
Wait, what?
OMG, we ran out of numbers (Score:2)
Sounds like some dummy read a best-practices, conserve-the-resources book and made a column an integer data type instead of a big integer (or whatever the corresponding names are for Oracle or other non-MSSQL databases). Some automatic process or identity column creates the keys, and it reached the maximum value. And it wasn't set up to use negative numbers either.
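The commenter's theory can be made concrete. In SQL Server, for example, `INT` is a 32-bit signed type and `BIGINT` is 64-bit. The sketch below models a hypothetical 32-bit identity generator (`id32_generator`, `next_key`, and `keys_available` are illustrative names) and shows how much of the range a seed of 1 leaves unused. Even a seed of 1 allows about 2.1 billion keys, so this alone would not explain a cap at 40 million.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key generator modelling a 32-bit signed identity column.
   Starting at 1 (a common default) leaves the whole negative half unused. */
typedef struct {
    int32_t next;      /* next key to issue */
    bool    exhausted; /* set once INT32_MAX has been issued */
} id32_generator;

/* Issues the next key; returns false once the range is used up. */
bool next_key(id32_generator *g, int32_t *out)
{
    if (g->exhausted)
        return false;              /* no keys left: inserts would fail */
    *out = g->next;
    if (g->next == INT32_MAX)
        g->exhausted = true;       /* avoid signed overflow on increment */
    else
        g->next++;
    return true;
}

/* Number of keys available from a given seed up to INT32_MAX inclusive. */
int64_t keys_available(int32_t seed)
{
    return (int64_t)INT32_MAX - seed + 1;
}
```

`keys_available(1)` is 2,147,483,647, while seeding at `INT32_MIN` gives 4,294,967,296 (2^32); a 64-bit column pushes the limit out to about 9.2 × 10^18.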
possible solution, fallback to voicemail (Score:1)
Thank you for calling Springfield RescuePhone!
if you know the name of the crime being committed, press one!
To choose from a list of felonies press two!
If you are being murdered or are calling from a rotary phone please stay on the line!
Entirely preventable software error (Score:2)
An entirely preventable software error was responsible for causing 911 service to drop. "It could have been prevented. But it was not,"
So, let us be clear. The error was not simply preventable but absolutely and completely preventable in all cases. There was no impediment to prevent it. Its prevention was not only possible but also within the reach of any error prevention effort or action. It could have been prevented.
The preventability of the error was absolute. No situation, fictive or factual, in this or any other world, would allow a situation in which this error was not preventable.
Finally, it's important to note that the eventual series
Re: (Score:2)
All kidding aside, when I read that line I wondered what type of software error isn't preventable. There are things that are easily preventable and should be thought of, but ANYTHING is "entirely preventable".
Re: (Score:2)
Bad wording. Perhaps "obvious and preventable" would have been better (and why does the counter not go to at least max int, e.g. 2^31-1 = 2,147,483,647).
preventable software error (Score:2)
the FCC found that an entirely preventable software error was responsible for causing 911 service to drop
As opposed to what? An entirely unpreventable software error? Sounds more like a configuration issue than a software error anyway.
Failure started at the Administrative level.... (Score:3)
It's the fault of the administrators to begin with. I am friends with one of the technical advisors for the Midwest EOC, and the problem is that the administrators don't know their ass from a hole in the ground; they ignore their tech guys and listen to the vendors.
He has been screaming for all call centers to have analog failover, but the administrators refuse to hear it.
So who is to blame for the failures? That top moron at Homeland Security. It would have been in place if he would realize that he is not an expert and actually LISTEN to the experts in the field.
Re: (Score:1)
I'm not convinced you need an analogue failover, but you do need fully duplicated systems right down to the power subsystems and cables, which you periodically switch between. There is no point having a backup if you don't use it on a regular schedule to be sure it is working properly.
The solutions are not all technical: you have to be monitoring them properly with the right people, who are motivated and properly trained. You also need the proper organisational processes.
I've seen NOCs on emergency service n
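The advice to switch between duplicated systems on a schedule can be sketched as a selection rule. This is a minimal, hypothetical example (`choose_active_site`, the two-site layout, and the weekly rotation are assumptions, and health checking is reduced to two booleans): the scheduled site carries traffic in alternate weeks, and the other site takes over only when the scheduled one is unhealthy.

```c
#include <stdbool.h>

/* Two fully duplicated sites; which one is primary alternates weekly so
   the standby path carries live traffic regularly. All names hypothetical. */
typedef enum { SITE_NONE = -1, SITE_A = 0, SITE_B = 1 } site_id;

/* Chooses the site that should carry traffic in the given week.
   The scheduled site is used when healthy; otherwise the other one. */
site_id choose_active_site(unsigned week, bool a_healthy, bool b_healthy)
{
    site_id scheduled = (week % 2u == 0u) ? SITE_A : SITE_B;
    site_id other     = (scheduled == SITE_A) ? SITE_B : SITE_A;
    bool healthy[2]   = { a_healthy, b_healthy };

    if (healthy[scheduled])
        return scheduled;
    if (healthy[other])
        return other;          /* failover to the site that was primary last week */
    return SITE_NONE;          /* both down: escalate to manual procedures */
}
```

Because each site is primary every other week, a failover lands on equipment that carried live traffic within the past week rather than on a standby nobody has exercised.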
Re: (Score:2)
I've been specifying emergency service systems for over 10 years, duplication, monitoring, management and processes are always at the top of the list.
Re: (Score:2)
10-1 to that good buddy, got me a smoky on my six, what's your 20 come back?
Typical government waste and inefficiency. (Score:2)
If a private railroad owns rolling stock that would occupy, say 10 miles of
Re: (Score:2)
This is how over built and inefficient government services are.
That was one of the stupidest, most nonsensical posts I have ever seen here. You calculated the "load factor" based on each of 11,000,000 people instead of on the number of 911 operators.
And of course that's not even counting the fact that 911 services pretty much need to be provisioned to handle *peak* loads, not average (nor even median).
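The point about provisioning for peaks can be quantified with the Erlang B formula, a standard telephony model for the probability that a call finds every line busy. The commenter does not cite it; it is a swapped-in illustration that assumes blocked calls are simply lost (real 911 centers also queue calls, which Erlang C models better), and the traffic figures below are hypothetical.

```c
/* Erlang B blocking probability: the chance a new call finds all n servers
   (trunks or call-takers) busy when offered `erlangs` of traffic. Uses the
   standard recursion B(0) = 1, B(k) = A*B(k-1) / (k + A*B(k-1)). */
double erlang_b(double erlangs, unsigned servers)
{
    double b = 1.0;
    for (unsigned k = 1; k <= servers; k++)
        b = erlangs * b / (k + erlangs * b);
    return b;
}

/* Smallest number of servers that keeps blocking at or below target (> 0). */
unsigned servers_for_blocking(double erlangs, double target)
{
    unsigned n = 0;
    while (erlang_b(erlangs, n) > target)
        n++;
    return n;
}
```

At 10 Erlangs of offered traffic, provisioning exactly 10 lines gives `erlang_b(10.0, 10)` of about 0.21, so roughly one call in five is blocked, while `servers_for_blocking(10.0, 0.01)` returns 18. Sizing should use the busiest-hour load, not the daily average.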
Re: (Score:2)
I have failed miserably looks like. Even adding the bit about railway rolling stock did not help. Well, that is the problem when you speak with a tongue in the cheek. You end up chewing your own tongue.
That was meant to be tongue in cheek? Oh, OK then ;-)
Problem is, it's election season, and what you said there was really not much different than some of the bullshit that we're inundated with nightly on TV commercials, and flyers in the mail. My favorite so far is the one accusing a Democrat of attempting to "replace Medicare with a completely government-run system". Uhhhmmm, excuse me???
This will have impacted the outcome of incidents (Score:1)
If calls are lost then help is delayed. This impacts the outcome of incidents.
I'm not saying that people died because of this but I'm absolutely certain that there were some who suffered worse injury and losses because of the delays. Loss of 6,000 calls will result in a lot of hurt.
Like so many other issues, it wasn't a single fault but a chain of events. In this case there was a software failure but the fault monitoring systems and support services failed to immediately note that there were no calls going
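The missing piece described here is an alarm on traffic that stops. A minimal sketch, assuming a per-window baseline of expected calls is available (the `call_window` type, `traffic_drop_alarm`, and both thresholds are hypothetical): alert when completed calls fall far below what the window normally carries, but ignore windows too quiet to judge.

```c
#include <stdbool.h>

/* Hypothetical traffic window: a baseline plus what actually happened. */
typedef struct {
    double   expected_calls;  /* typical completed calls for this window */
    unsigned completed;       /* calls actually completed in the window */
} call_window;

/* Alarms when completions drop below min_ratio of the baseline, but only
   if the baseline is at least min_expected calls. */
bool traffic_drop_alarm(const call_window *w, double min_expected,
                        double min_ratio)
{
    if (w->expected_calls < min_expected)
        return false;                         /* too little traffic to judge */
    return (double)w->completed < w->expected_calls * min_ratio;
}
```

With a baseline of 120 calls, a window with zero completions raises the alarm, a window with 95 does not, and a rural overnight window expecting 2 calls is ignored.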
I used to handle this... (Score:3)
I used to work in the NOC for a large telco, and we'd handle 911 outages. Usually 911 goes down because the entire network's down: the switch failed, or the trunk from one area that leads to the area the 911 center is in got cut. Most of this stuff is in a ring, so there's usually an alternate route, but in some areas that's not physically possible. For example, a remote mountain town with a single road in would likely have its only trunk running along that same road, and it'd get cut all the time as the road constantly needed repair. Choose where you live wisely.
We'd handle this in different ways depending on the situation. For example, if we had 4 trunks that could handle 4X the number of calls, and 3 got cut so it could only handle 1X, we could actually prioritize certain numbers so 911 and emergency services would get priority. If the trunk leading to the 911 center failed, we could do something like re-route the calls to the local police dispatcher, who literally had no warning and would suddenly have their phone ringing off the hook. You may say "you should warn them!" but our policy was "Get it done," because who's dying while you're arguing with the dispatcher about how her day's going to suck?
The most important skill you can have in any NOC is your ability to triage problems. That term comes from the medical world, but it's just networking equipment... until you get into the situation I was in, where you're making triage decisions that could actually result in death. These were real engineers who really cared and did what they could. But when you have an area ravaged by a hurricane and you tell the tech to put gas in generator 1 instead of 2, because you've been up for 30 hours straight... and a remote goes down so they can't call 911? I just couldn't detach myself from that. I took a pay cut to leave. A lot of people floated through that job; it wasn't just me. It takes a special kind of person who can detach themselves from the consequences of their decisions.
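The trunk prioritization described in this comment resembles what teletraffic engineers call trunk reservation: when capacity drops, a few channels are held back for high-priority calls. The sketch is a hypothetical illustration (`trunk_group`, `admit_call`, and the reserve of one channel are assumptions), not the carrier's actual mechanism.

```c
#include <stdbool.h>

/* Hypothetical degraded trunk group with channels held back for emergencies. */
typedef struct {
    unsigned capacity;   /* channels still working after the cuts */
    unsigned in_use;     /* channels currently carrying calls */
    unsigned reserved;   /* channels kept free for emergency calls */
} trunk_group;

/* Admits a call if a channel is available for its priority class. */
bool admit_call(trunk_group *t, bool is_emergency)
{
    unsigned usable = is_emergency
        ? t->capacity                                   /* may use the reserve */
        : (t->capacity > t->reserved ? t->capacity - t->reserved : 0u);
    if (t->in_use >= usable)
        return false;    /* blocked for this priority class */
    t->in_use++;
    return true;
}
```

With four working channels and one reserved, three ordinary calls fill the group up to the reserve, a fourth ordinary call is refused, and an emergency call still gets through.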
Single Point Failure (Score:2)
For 911 services in 7 states? Set aside issues about the backup system for the moment (which may be a second server in the same data center): why do all the 911 calls have to funnel through a single system? Emergency services are largely local. Not many people have to make 911 calls across a large region. So why isn't emergency call routing handled by local systems, with calls routed to local service centers?
Even if there was a common software glitch in all the handling systems, I doubt everyone would h
Re: (Score:2)
Actually there are mirrored data centers
OK. That fixes an event that can take one data center out*. But why mirrored? Why aren't these systems distributed and colocated with the municipalities that they serve? And why a globally unique call ID that overflows at 4E7 records?
* I helped develop a system that achieved its reliability through the use of distributed servers. And then had the IT people put all the servers in one rack, in one building. For cost savings, of course. The building sits right on top of the Seattle fault [wikipedia.org]. So you'll have to ex
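One answer to the question about a globally unique call ID is to avoid a shared counter entirely. A minimal sketch, assuming a 64-bit ID whose top 16 bits name the local center and whose low 48 bits are that center's own sequence (the layout and the function names are hypothetical): each center numbers its calls independently, and 2^48 (about 2.8 × 10^14) IDs per center is far beyond any plausible call volume.

```c
#include <stdint.h>

/* Hypothetical layout: top 16 bits = center, low 48 bits = local sequence. */
#define SEQ_BITS 48u
#define SEQ_MASK ((UINT64_C(1) << SEQ_BITS) - 1u)

/* Builds an ID; the caller must keep seq below 2^48 (higher bits are masked). */
uint64_t make_call_id(uint16_t center, uint64_t seq)
{
    return ((uint64_t)center << SEQ_BITS) | (seq & SEQ_MASK);
}

uint16_t call_id_center(uint64_t id) { return (uint16_t)(id >> SEQ_BITS); }
uint64_t call_id_seq(uint64_t id)    { return id & SEQ_MASK; }
```

For example, `make_call_id(7, 40000001)` decodes back to center 7 and sequence 40,000,001, with no coordination between centers needed.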
Okay, I've been doing this for 30-something years (Score:3)
And the number "40,000,000" doesn't come up on my list of "potential overflows to watch out for". What's special about 40 million?
Re: (Score:2)
As near as I can tell from the TFA somebody just put an arbitrary limit of 40 million in the code somewhere. Would be nice to have more technical details.
I see a suspicious bag full of ebola-laced bombs (Score:1)
At least there are no current / recent worries what might make someone want to call 911 ...
Actually, it's an interesting question -- just what is the threshold? Suspected ebola vomit? Suspicious bag on suburban street? Seems like it would be a very easy system to game, or even to unintentionally render useless. Takes a lot of goodwill and good behavior, all around.
I took a CPR class last night, and the instructor (a firefighter in his dayjob) basically encouraged people to use 911 more, even for things abou
Re: (Score:2)
That's just about the most ridiculous thing I've heard here in a while.
Re: (Score:2)
Flexibility is proportional to complexity, and inversely proportional to reliability/stability. Dedicated hardware device
Re: (Score:2)
Don't do critical things in hastily-written, poorly designed software. Instead, take sufficient time and make the design and implementation robust. Tried and tested methods exist for all of this. (Consider avionics, for example).