Comair Done In by 16-Bit Counter (441 comments)

Gogo Dodo writes "According to the Cincinnati Post, the Comair system crash was caused by an overflowed 16-bit counter. Perhaps Comair should have paid for the software upgrade to MaestroCrew." You heard it here first...
This discussion has been archived. No new comments can be posted.

  • by Vengeance ( 46019 ) on Thursday December 30, 2004 @10:00AM (#11218222)
    I believe this will answer your question:

    Tom Carter, a computer consultant with Clover Link Systems of Los Angeles, said the application has a hard limit of 32,000 changes in a single month.

    "This probably seemed like plenty to the designers, but when the storms hit last week, they caused many, many crew reassignments, and the value of 32,000 was exceeded," he said.


    So it sounds like a signed int.
  • by EvilStein ( 414640 ) <spamNO@SPAMpbp.net> on Thursday December 30, 2004 @10:02AM (#11218238)
    Here's the original post [neohapsis.com]:

    Hi,

    On Christmas Day last Saturday, Comair Airlines had to completely stop
    flying all of its planes due to computer problems. Comair blamed the
    computer problems on their pilot scheduling software being overloaded
    after bad weather earlier in the week forced many flights to be
    rescheduled. Comair now hopes to have all of its 1,100 daily flights
    restored by tomorrow.

    An article which was published today at the Cincinnati Post Web site
    provides some interesting details of a software failure in Comair's pilot
    scheduling software:

    How it happened
    http://www.cincypost.com/2004/12/28/comp12-28-2004.html

    According to the article, Comair is running a 15-year old scheduling
    software package from SBS International (www.sbsint.com). The software has
    a hard limit of 32,000 schedule changes per month. With all of the bad
    weather last week, Comair apparently hit this limit and then was unable to
    assign pilots to planes.

    It sounds like 16-bit integers are being used in the SBS International
    scheduling software to identify transactions. Given that the software is 15
    years old, this design decision perhaps was made to save on memory usage.
    In retrospect, 16-bit integers were probably not a good choice.

    An anonymous message posted to Slashdot the day after Christmas first
    described the software failure at Comair:

    http://slashdot.org/comments.pl?sid=134005&cid=11185556

    Earlier this year, an overflow of a 32-bit counter in Windows shut down air
    traffic control over southern California for 3 hours:

    Microsoft server crash nearly causes 800-plane pile-up
    http://www.techworld.com/opsys/news/index.cfm?NewsID=2275

    This problem occurred because of a known design flaw in older versions of
    Windows:

    http://tinyurl.com/5n9gc

    Richard M. Smith
    http://www.ComputerBytesMan.com

  • by Anonymous Coward on Thursday December 30, 2004 @10:18AM (#11218335)
    Having once done tech support for the Maestro program used by Comair (and other scheduling software for other airlines as well), I think the software is junk. The employees undoubtedly said "I told you so!" when it broke, because they hated it as much as the support team did. IMO the airline didn't bother upgrading because they didn't think the old version was broken enough or outdated enough to warrant it.
  • by Remlik ( 654872 ) on Thursday December 30, 2004 @10:22AM (#11218371) Homepage
    bet *now* they'll upgrade, but until this particularly hairy situation arose, they didn't really see a need to upgrade a computer scheduling system that had been working great for them.

    RTFA RTFA RTFA - The new system goes live in January. Good god, it's like herding cats around here.

    Gotta love /. when you can get modded +5 Insightful without RTFA AND posting verbal vomit....
  • Maestro sucks. (Score:4, Informative)

    by Anonymous Coward on Thursday December 30, 2004 @10:23AM (#11218373)
    Maybe Maestro should just die. My friend is a flight attendant for Southwest and has to use Maestro to plan her schedule. To use it she has to citrix into their main server and wait for an open client (I assume they have either a license or horrible programming restriction on concurrent users). On the very day that the new schedules are posted, it can take hours to log in. It's a joke.

    This stuff could be handled by a team of a dozen web based programmers (Java? C? ASP? LAMP? You pick.) in a few months. It's not difficult.
  • by afidel ( 530433 ) on Thursday December 30, 2004 @10:38AM (#11218482)
    They HAD outgrown their current system, and they knew it. That's why the new system was scheduled to go online in the next couple of months. Unfortunately, they met with a perfect storm of problems at just the wrong time. If you've ever worked in retail, you know that NOTHING gets changed from mid-November to early January unless god and the CEO both say it has to be so. I imagine airlines are pretty much the same. Heck, airlines probably have an even larger freeze window, since few people book flights at the last minute for holiday travel.
  • Re:Forget Y2k... (Score:4, Informative)

    by stupidfoo ( 836212 ) on Thursday December 30, 2004 @10:39AM (#11218483)
    RTFA

    It was a signed integer. The problem occurred at 2^15 (32768), although the article reported it as 32,000.
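To make the arithmetic concrete, here is a minimal C sketch of a signed 16-bit counter running out of room. It is an illustration only: the names `change_counter_t`, `record_change`, and `unchecked_increment` are hypothetical, and nothing here is taken from the actual scheduling software.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monthly change counter stored in a signed 16-bit integer. */
typedef struct {
    int16_t count;
} change_counter_t;

/* Try to record one more schedule change.
 * Returns true on success, false when the counter is already at
 * INT16_MAX (32767) and cannot be incremented without overflowing. */
bool record_change(change_counter_t *c)
{
    if (c->count == INT16_MAX) {
        return false; /* the 32768th change has nowhere to go */
    }
    c->count++;
    return true;
}

/* What an unchecked increment stores instead. value + 1 is computed as
 * int (no overflow there) and then converted back to int16_t. For 32767
 * that conversion is implementation-defined in C; GCC and Clang wrap it
 * around to -32768. */
int16_t unchecked_increment(int16_t value)
{
    return (int16_t)(value + 1);
}
```

A counter that checks its own limit can at least fail loudly, while the unchecked version quietly hands back a negative count on common compilers.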
  • by aoty ( 533561 ) <aotyNO@SPAMyahoo.com> on Thursday December 30, 2004 @10:53AM (#11218582)
    My wife works for Comair here in Cincinnati. The computer system under discussion was in the process of being upgraded prior to the crash. Comair's IT recognized weaknesses in the current system some time ago. The upgrade just happened to be taking a little longer than anticipated. Timing is a bitch, isn't it?
  • by imsabbel ( 611519 ) on Thursday December 30, 2004 @10:55AM (#11218596)
    $200 for 4MB? That's more 1994 than 1989...
  • by rat_love_cat ( 844761 ) on Thursday December 30, 2004 @11:19AM (#11218807)
    Having fixed point numbers default to unsigned is not a good idea because, at least with C's unsigned rules, it's easy to end up with a huge number if a negative number is ever generated. This has bitten me enough times in C that I avoid unsigned unless I'm damn sure that it can never go negative (and even then I check all subtractions real carefully).
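The unsigned pitfall described above can be sketched in a few lines of C. This is a hedged illustration with hypothetical function names (`remaining_unchecked`, `remaining_checked`); it assumes a platform where `int` is 32 bits, so `uint32_t` arithmetic is not promoted to a signed type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unchecked: if b > a, the result wraps modulo 2^32 instead of going
 * negative, so 3 - 5 becomes 4294967294. */
uint32_t remaining_unchecked(uint32_t a, uint32_t b)
{
    return a - b;
}

/* Checked: refuse the subtraction when it would go below zero.
 * On success the difference is written to *out and true is returned;
 * on failure *out is left untouched. */
bool remaining_checked(uint32_t a, uint32_t b, uint32_t *out)
{
    if (b > a) {
        return false;
    }
    *out = a - b;
    return true;
}
```

Checking before every subtraction is the "real carefully" part: the wrapped result is a perfectly valid unsigned value, so nothing downstream will flag it.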
  • Re:Comair? (Score:2, Informative)

    by AceyMan ( 199978 ) on Thursday December 30, 2004 @12:55PM (#11219708)
    mod parent down.

    As one poster already noted, the wholly owned carriers fly different equipment and are staffed by pilots who are members of a different seniority force.

    Moreover, typically the crew tracking system is integrated with the flight operations/dispatch system, and the maintenance control system, and the route planning system, and the trip optimization system. You wouldn't want to try to integrate all those functions into the parent carrier's system unless you *had* to.

    Finally, 14 CFR Part 121 says that each certificated carrier has to have its own dispatchers on staff. Comair, et al., are technically independent carriers -- they have their own certificate (DOT license to run an airline), and therefore have to staff their own flight operations (dispatch) office.

    Therefore, Comair cannot integrate their staff with Delta's, even if they wanted to. Of course, that doesn't mean they couldn't still use Delta's operations software, but it just shows how separately the airlines actually must operate -- making the advantage of merging systems specious at best.

    \FAA licensed aircraft dispatcher
  • by Anonymous Coward on Thursday December 30, 2004 @02:47PM (#11220799)
    I don't think many people know, but 1/3 of Pittsburgh flight attendants called in sick on Christmas Day. That alone is a cataclysmic problem since that is a main USAir hub.

    Maybe USAir doesn't want anyone to know how much they suck as an employer, aside from how they suck as a corporation. Their employees don't seem to ever smile - they probably have no reason to.

    As a side note, the JetBlue CEO has been known to frequent flights, handing out cookies, talking to passengers, handling baggage, and cleaning up trash after the flight. Every single employee looks happy...
  • Re:SIGNED 16-bit!! (Score:3, Informative)

    by pclminion ( 145572 ) on Thursday December 30, 2004 @02:50PM (#11220830)
    Using a signed integer allows you to distinguish between error and non-error conditions. In UNIX for example, system calls return a negative value on error, so these calls often are declared to return a signed int even though the number they return in a non-error condition will always be positive.

    This might be viewed as laziness depending on the circumstances. Obviously, it seems weird to waste half of the integer space just so you can return -1 on error, but if you need to report many different error conditions, using negative numbers to do so makes things a little easier, because you can just check whether the return value was negative in order to detect an error (instead of comparing the return value one by one against each possible error code).

    However, the entire problem is mitigated if you just switch to a slightly different calling convention. If you need a function to return some value which is always positive, but you still want to indicate possible error conditions, forget about using the return value to return the result. Instead pass a variable by pointer or reference, stick the result in there, and return 0 on success or -1 on failure.

    Unfortunately many programmers regard this as ugly, so we're stuck with silly crap like wasting half the integer space just in order to report errors.
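The alternative calling convention described above might look like the following in C. This is only a sketch under assumptions: `get_change_count` and its parameters are hypothetical names made up for the example, not part of any real API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup of the number of changes recorded for a month.
 * The result travels through *out, so the full unsigned 16-bit range
 * (0..65535) stays usable. The return value is only a status code:
 * 0 on success, -1 on failure. */
int get_change_count(const uint16_t *table, size_t len, size_t month,
                     uint16_t *out)
{
    if (table == NULL || out == NULL || month >= len) {
        return -1; /* error: nothing is written to *out */
    }
    *out = table[month];
    return 0;
}
```

A caller checks the status first and reads the value only on success, for example `if (get_change_count(t, n, m, &v) == 0) { /* use v */ }`. The cost is one extra pointer argument; the benefit is that no part of the value range is reserved for error reporting.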
