Comair Done In by 16-Bit Counter
Gogo Dodo writes "According to the Cincinnati Post, the Comair system crash was caused by an overflowed 16-bit counter. Perhaps Comair should have paid for the software upgrade to MaestroCrew." You heard it here first...
Re:Signed or unsigned (Score:5, Informative)
Tom Carter, a computer consultant with Clover Link Systems of Los Angeles, said the application has a hard limit of 32,000 changes in a single month.
"This probably seemed like plenty to the designers, but when the storms hit last week, they caused many, many crew reassignments, and the value of 32,000 was exceeded," he said.
So it sounds like a signed int.
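The "signed" inference comes down to arithmetic: a signed 16-bit integer tops out at 32,767, just above the reported 32,000 limit, while an unsigned one would have allowed 65,535. A minimal Python sketch of the two ranges, assuming two's-complement signed integers (nothing here comes from the SBS code itself):

```python
def int_range(bits, signed):
    """Return the (min, max) values of a `bits`-wide integer.

    Signed integers are assumed to be two's complement, so half of the
    bit patterns are spent on negative numbers.
    """
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

REPORTED_LIMIT = 32_000  # the figure quoted in the Cincinnati Post article

for signed in (True, False):
    lo, hi = int_range(16, signed)
    kind = "signed" if signed else "unsigned"
    # Headroom left above the reported limit before the type runs out.
    print(f"{kind:8} 16-bit: {lo}..{hi}, headroom above 32,000: {hi - REPORTED_LIMIT}")
```

The signed type leaves only 767 values of headroom above 32,000, against 33,535 for the unsigned type, which is why a limit quoted as "32,000" points to a signed counter.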
Bugtraq covered this as well.. (Score:5, Informative)
Hi,
On Christmas Day last Saturday, Comair Airlines had to completely stop
flying all of its planes due to computer problems. Comair blamed the computer
problems on their pilot scheduling software being overloaded after bad
weather earlier in the week forced many flights to be rescheduled. Comair
now hopes to have all of its 1,100 daily flights restored by tomorrow.
An article which was published today at the Cincinnati Post Web site
provides some interesting details of a software failure in Comair's pilot
scheduling software:
How it happened
http://www.cincypost.com/2004/12/28/comp12-28-200
According to the article, Comair is running a 15-year-old scheduling
software package from SBS International (www.sbsint.com). The software has
a hard limit of 32,000 schedule changes per month. With all of the bad
weather last week, Comair apparently hit this limit and then was unable to
assign pilots to planes.
It sounds like 16-bit integers are being used in the SBS International
scheduling software to identify transactions. Given that the software is 15
years old, this design decision perhaps was made to save on memory usage.
In retrospect, 16-bit integers were probably not a good choice.
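The memory trade-off the post speculates about can be made concrete with Python's standard `struct` module: storing IDs as 16-bit values halves the space compared with 32-bit values, but any ID above 32,767 simply cannot be stored. A sketch under the assumption of a hypothetical month of 32,000 sequentially numbered transaction IDs:

```python
import struct

# Hypothetical: one month's worth of transaction IDs, numbered from 1.
ids = list(range(1, 32_001))

# '<h' is a little-endian signed 16-bit integer, '<i' a signed 32-bit one;
# the '<' prefix selects standard sizes, so the result is platform-independent.
packed16 = struct.pack(f"<{len(ids)}h", *ids)
packed32 = struct.pack(f"<{len(ids)}i", *ids)
print(len(packed16), len(packed32))   # 64000 128000

# The saving comes with a hard ceiling: an ID past 32,767 is rejected outright.
try:
    struct.pack("<h", 32_768)
except struct.error as exc:
    print("overflow:", exc)
```

On 1989-era hardware the halved footprint may well have mattered; the cost is that the ceiling is fixed by the type rather than by any business rule.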
An anonymous message posted to Slashdot the day after Christmas first
described the software failure at Comair:
http://slashdot.org/comments.pl?sid=134005&cid=11
Earlier this year, an overflow of a 32-bit counter in Windows shut down air
traffic control over southern California for 3 hours:
Microsoft server crash nearly causes 800-plane pile-up
http://www.techworld.com/opsys/news/index.cfm?New
This problem occurred because of a known design flaw in older versions of
Windows:
http://tinyurl.com/5n9gc
Richard M. Smith
http://www.ComputerBytesMan.com
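The post does not say what the Windows counter measured. Assuming it was a count of milliseconds (an assumption here, though it matches the well-known 49.7-day wraparound of 32-bit millisecond tick counters), the time to overflow is simple arithmetic. A short Python sketch, where `days_until_wrap` is a hypothetical helper:

```python
MS_PER_DAY = 1000 * 60 * 60 * 24  # 86,400,000 milliseconds in a day

def days_until_wrap(counter_bits, tick_ms=1):
    """Days until a counter with `counter_bits` bits of usable range,
    advancing one step every `tick_ms` milliseconds, exhausts that range.
    Assumes the counter starts at zero."""
    return (2 ** counter_bits) * tick_ms / MS_PER_DAY

print(f"{days_until_wrap(32):.1f} days")  # 49.7 days: unsigned 32-bit ms counter
print(f"{days_until_wrap(31):.1f} days")  # 24.9 days: signed 32-bit, positive half only
```

Unlike Comair's per-month transaction limit, a time-based counter like this fails on a predictable schedule regardless of load, which is why periodic reboots work as a (fragile) workaround.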
Once did IT support for Comair (Score:3, Informative)
Re:Maybe it had "worked just fine" for them? (Score:5, Informative)
RTFA RTFA RTFA - The new system goes live in January. Good god, it's like herding cats around here.
Gotta love
Maestro sucks. (Score:4, Informative)
This stuff could be handled by a team of a dozen web based programmers (Java? C? ASP? LAMP? You pick.) in a few months. It's not difficult.
Re:Let's not be too hard.. (Score:3, Informative)
Re:Forget Y2k... (Score:4, Informative)
It was a signed integer. The problem occurred at 2^15 (32,768), although the article reported it as 32,000.
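What "occurred at 2^15" means in practice can be simulated with Python's `ctypes.c_int16`, which stores values without overflow checking. The sketch assumes a hypothetical change counter that is incremented once per schedule change and treated as broken once it goes negative; how the real software detected the failure is not known:

```python
import ctypes

def record_changes(n):
    """Increment a hypothetical signed 16-bit change counter n times and
    return the number of the first change whose counter value went
    negative, or None if the counter never wrapped."""
    counter = ctypes.c_int16(0)
    for change in range(1, n + 1):
        counter.value += 1          # ctypes truncates to 16 bits silently
        if counter.value < 0:       # 32767 + 1 has wrapped to -32768
            return change
    return None

print(record_changes(32_000))  # None
print(record_changes(40_000))  # 32768
```

Change number 32,767 is still fine; change number 32,768 is the first to wrap, which is the 2^15 boundary described above.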
Re:Maybe it had "worked just fine" for them? (Score:4, Informative)
Re:Bugtraq covered this as well.. (Score:5, Informative)
Re:Damn you 2s Complement! (Score:2, Informative)
Re:Comair? (Score:2, Informative)
As one poster already noted, the wholly owned carriers fly different equipment and are staffed by pilots on a separate seniority list.
Moreover, typically the crew tracking system is integrated with the flight operations/dispatch system, and the maintenance control system, and the route planning system, and the trip optimization system. You wouldn't want to try to integrate all those functions into the parent carriers system unless you *had* to.
Finally, 14 CFR Part 121 says that each certificated carrier has to have its own dispatchers on staff. Comair et al. are technically independent carriers: they have their own certificate (a DOT license to run an airline) and therefore have to staff their own flight operations (dispatch) office.
Therefore, Comair cannot integrate its staff with Delta's, even if it wanted to. Of course, that doesn't mean it couldn't still use Delta's operations software, but it shows how separately the airlines must actually operate, which makes the advantage of merging systems specious at best.
FAA-licensed aircraft dispatcher
unreleased tidbit of information (Score:1, Informative)
Maybe USAir doesn't want anyone to know how much they suck as an employer, aside from how they suck as a corporation. Their employees never seem to smile; they probably have no reason to.
As a side note, the JetBlue CEO has been known to fly on his airline's flights, handing out cookies, talking to passengers, handling baggage, and cleaning up trash after the flight. Every single employee looks happy...
Re:SIGNED 16-bit!! (Score:3, Informative)
This might be viewed as laziness, depending on the circumstances. It seems wasteful to give up half of the integer space just so you can return -1 on error, but if you need to report many different error conditions, using negative numbers for them makes things a little easier: you can detect any error by checking whether the return value is negative, instead of comparing it one by one against each possible error code.
However, the entire problem goes away if you switch to a slightly different calling convention. If a function returns a value that is always non-negative but you still want to report error conditions, don't use the return value for the result. Instead, pass a variable by pointer or reference, store the result there, and return 0 on success or -1 on failure.
Unfortunately, many programmers regard this as ugly, so we're stuck with silly crap like wasting half the integer space just to report errors.
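The two calling conventions described above can be compared in a short sketch. Python has no pointer parameters, so a caller-supplied `ctypes.c_uint16` object stands in for the pointer/reference argument; the function names and status codes are hypothetical, chosen only for illustration:

```python
import ctypes

# Hypothetical status codes, for illustration only.
E_OK = 0
E_BAD_INPUT = -1

def count_changes_sentinel(changes):
    """Convention 1: result and error share one signed 16-bit return value.

    Negative values are reserved for errors, so only 0..32767 are usable
    results; a larger count wraps and becomes indistinguishable from an error.
    """
    if changes is None:
        return E_BAD_INPUT
    # ctypes truncates to 16 bits without overflow checking.
    return ctypes.c_int16(len(changes)).value

def count_changes_out(changes, result):
    """Convention 2: the return value is only a status code.

    The count is written into a caller-supplied unsigned 16-bit object,
    which plays the role of the pointer/reference argument, so the full
    0..65535 range is available for results.
    """
    if changes is None:
        return E_BAD_INPUT
    result.value = len(changes)
    return E_OK

changes = [None] * 40_000

print(count_changes_sentinel(changes))     # -25536, which a caller would read as an error

slot = ctypes.c_uint16()
status = count_changes_out(changes, slot)
print(status, slot.value)                  # 0 40000
```

Separating the error channel doubles the usable range for the same 16-bit storage, but the out-parameter is still only 16 bits wide: a count above 65,535 would wrap just as silently, so the convention buys headroom rather than eliminating the limit.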