Comair System Crashes; Passengers Stranded

Posted by timothy on Sunday December 26, 2004 @07:08AM from the for-very-high-values-of-too-many dept.

Broerman writes "30,000 people have had their flights cancelled by Comair this weekend thanks to a computer system shutdown. It appears that, due to weather and other problems, flights began to be cancelled on Thursday and the backlog choked the system. 1,100 flights have been cancelled so far, including all flights through 12/26. Does anyone know what platform their system was based on? What kind of system just totally crashes? The official statement is that 'There was a cumulative effect with the canceled flights and trying to get crew assigned that caused the system to be overwhelmed.' It seems highly improbable that a system would crash because it had too many reservations. The system should only be able to hold as many reservations as it has flights/seats. It would seem that it's more likely that the system was overloaded with use and that caused a meltdown. When you add in the problems experienced by US Airways, this hasn't been a Merry Christmas for many."

This discussion has been archived. No new comments can be posted.


Fire away! (Score:5, Funny)

by weeksie ( 634500 ) writes: on Sunday December 26, 2004 @07:11AM (#11184223)

Anybody know what they were running? I'd like to see this flamewar get started as soon as possible.

- It doesn't matter... (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  They're a bunch of incompetent boobs. The news keeps reporting on a "computer glitch" or a "computer malfunction". That's bullshit. This happened because some human(s) fucked up.
- Re:Fire away! (Score:5, Insightful)
  
  by mirko ( 198274 ) writes: on Sunday December 26, 2004 @07:16AM (#11184237) Journal
  
  There recently was a big card problem here, in Europe.
  It did not come from a particular OS but just because a partition got filled by index tablespace extents.
  So, it could just be that they ran out of space and it froze the whole application.
  
  - Re:Fire away! (Score:4, Interesting)
    
    by Deviate_X ( 578495 ) writes: on Sunday December 26, 2004 @10:22AM (#11184616)
    
    Interesting...
    Job postings might give some insight into what they are using: Comair, Inc. jobs [yahoo.com].
    
    - Re:Fire away! (Score:3, Insightful)
      
      by theonetruekeebler ( 60888 ) writes:
      
      Based on those postings, I'm guessing the application is based on either Oracle or Sybase on HP-UX.
      My preliminary diagnosis: blown rollback segment. With too many flights being cancelled, the simultaneous rescheduling of all those crew resulted in a SQL transaction that exceeded the size of what the DBMS could undo. So an uncommitted statement failed and the application code either was not prepared for such a possibility or could only handle it by timing out. Scheduling tasks could no longer move forwa
      - Re:Fire away! (Score:3, Interesting)
        
        by pVoid ( 607584 ) writes:
        
        I don't think they keep a SQL transaction running for as long as the flight hasn't taken off.
        SQL transactions generally last seconds and involve operations like "open tr, is there space in this flight?, reserve space, close tr". Not "open tr, wait for flight to fill up, close tr". Rescheduling or canceling flights probably isn't accomplished using transactions: it's application level logic.
        My personal diagnosis: I think it has nothing to do with the backlog, and that the system just melted under high s
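
The "open tr, is there space in this flight?, reserve space, close tr" pattern described above can be sketched roughly as follows. This is only an illustration of a short transaction, using Python's built-in sqlite3 module; the `flights` table, the `seats_left` column, and the flight number `OH123` are all made up for the example and have nothing to do with whatever Comair actually ran.

```python
import sqlite3


def reserve_seat(conn: sqlite3.Connection, flight: str) -> bool:
    """Reserve one seat in a single short transaction; return False if the flight is full."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute(
            "UPDATE flights SET seats_left = seats_left - 1 "
            "WHERE flight = ? AND seats_left > 0",
            (flight,),
        )
        # rowcount is 1 when a seat was available and taken, 0 otherwise
        return cur.rowcount == 1


# Hypothetical schema and data, purely for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight TEXT PRIMARY KEY, seats_left INTEGER)")
conn.execute("INSERT INTO flights VALUES ('OH123', 1)")
conn.commit()

first_attempt = reserve_seat(conn, "OH123")   # takes the last seat
second_attempt = reserve_seat(conn, "OH123")  # flight is now full
```

The point of the sketch is that the check and the update happen in one statement inside one brief transaction, so nothing stays open while waiting for a flight to fill up.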
        
        Re:Fire away! (Score:3, Insightful)
        
        by pVoid ( 607584 ) writes:
        
        Do you not know what a rollback segment is? It's what makes you run out of disk space while updating a table 1300 times larger than you thought it would be
        Yes, but you pretty much spelled out what my point was in that the n^2 complexity issue is unrelated to transactional operations. That is, a transaction is a transaction, it is scalable, so it doesn't matter whether the actual operation for computing stuff is O(n^2), the transaction is still a fixed cost. On a side point: I don't agree that because the
- Re:Fire away! (Score:5, Informative)
  
  by [Xorian] ( 112258 ) writes: on Sunday December 26, 2004 @01:56PM (#11185556)
  
  Someone from Comair (who shall remain anonymous) provided me with some details which people here would be interested in:
  
  The computer system in question runs AIX. The box itself is still up and running just fine; this is purely an application error. This application was not written in-house at Comair, but by another large aerospace company -- SBS (http://www.sbsint.com/ [sbsint.com], owned by Boeing.) This bit of software does not use an external database, it tracks everything itself. It is a dedicated system responsible only for flight crew assignments. (The blather in the original submission about passenger reservations is way off-base. Those functions are handled by a completely different system.)
  
  The great majority of Comair's traffic flows through the midwest, and the central base of operations is in Cincinnati. The midwest was hit by a major snowstorm this week, causing many, many crew reassignments. It appears right now that the application in question has a hard limit of 32,000 changes per month (ouch). Consider that Comair runs 1,100 flights a day and there are usually 3 crew members on each aircraft. A big storm like this can cause problems for days after the snow stops falling. That's a whole lot of crew changes.
  
  In Comair's defense, this has never happened before and is unlikely to happen again. The crew system was already on the chopping block long before this incident, with its replacement scheduled to go live in January. If this freak storm had happened a month later, this likely never would have occurred.
  
  - Re:Fire away! (Score:5, Informative)
    
    by [Xorian] ( 112258 ) writes: on Sunday December 26, 2004 @02:21PM (#11185663)
    
    Just to be absolutely clear: I've only ever communicated with this person on-line, and I can't verify who they are in real life or that they actually work for Comair. It seemed credible though, and it seemed worth posting to de-bunk the slashdot knee-jerk reaction of blaming Microsoft. To me, an application using a 16-bit integer for something seems like a very likely explanation.
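
To make the 16-bit guess concrete, here is a rough sketch in Python. It assumes (and this is only the poster's speculation, not a confirmed detail) that the monthly change count lived in a signed 16-bit integer, whose maximum is 32,767. The flight and crew figures come from the post above; treating every crew member on every flight as one change per day is a deliberately pessimistic assumption for the arithmetic.

```python
import ctypes

FLIGHTS_PER_DAY = 1_100   # figure quoted in the thread
CREW_PER_FLIGHT = 3       # "usually 3 crew members on each aircraft"
INT16_MAX = 2**15 - 1     # 32,767, the largest signed 16-bit value


def days_until_limit(changes_per_day: int, limit: int = INT16_MAX) -> float:
    """Days of sustained changes before a monthly counter reaches `limit`."""
    return limit / changes_per_day


def bump(counter: ctypes.c_int16) -> int:
    """Increment a signed 16-bit counter with no overflow check, as unchecked code would."""
    counter.value += 1  # ctypes integer types wrap silently instead of raising
    return counter.value


# Assumed worst case: every crew member on every flight is reassigned once a day.
worst_case_days = days_until_limit(FLIGHTS_PER_DAY * CREW_PER_FLIGHT)

counter = ctypes.c_int16(INT16_MAX)
after_one_more = bump(counter)  # wraps around to the most negative value
```

Under those assumptions the counter would last a bit under ten days of total disruption, and the next change after 32,767 would turn the count negative, which is the kind of thing an old application might not handle gracefully.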
    
  - Yep, you are right! (Score:5, Informative)
    
    by Anonymous Coward writes: on Sunday December 26, 2004 @02:41PM (#11185757)
    
    Your statements are accurate.
    
    I was a unix sys admin there, but left for greener pastures during the dot-com craze. The non-redundant hardware at the time ran AIX, and had a great support contract from IBM. The SBS application however, always had monthly issues, at least at that airline. They were looking for a replacement then, and I'm not surprised they still haven't replaced it.
    
  - Re:Fire away! (Score:5, Informative)
    
    by Anonymous Coward writes: on Sunday December 26, 2004 @02:48PM (#11185790)
    If it was the crew scheduling system, and it was SBS's Maestro Crew scheduling system, I can fill in some details.
    Maestro is delivered on AIX, uses a rather old version of Informix for its database, and is tied together using the TUXEDO TP monitor from BEA.
    The business logic is written in C, and abstracted away using Tuxedo.
    In the case of a major schedule disruption, this program isn't responsible for "solving" the problem, but is responsible as being the system of record for holding the new crew schedule.
    My guess is that the changes to the crew schedule were large enough that some piece of the system was overwhelmed. ( For example, a transaction that was too large and overran the rollback buffers in Informix ).
    Without the system of record in place, a manual process would be very difficult. You would have to figure out:
    
    Which crews were in which locations
    
    What aircraft each crew member was qualified on.
    
    How long they had flown already that day. ( Legalities about how much time you can fly before you need mandatory rest )
    
    Which routes to send those crews on
    
    How to get the crews back to a specific city to run the next day's schedule
    
    Of course, any mistakes you made doing this manually would overflow into other systems. For example, you might send an aircraft that's due maintenance to a city with no maintenance facilities.
    Also, for those that were critical of the system not being highly available...this doesn't sound like the kind of problem that HACMP and replicated databases would have helped. The hot standby would have choked at the exact same point.
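
The list of things a manual process would have to track (location, aircraft qualification, hours already flown) maps onto a simple record-and-filter structure. The sketch below is only a toy model of that idea: the `CrewMember` fields, the 8-hour cap, the crew names, and the aircraft type labels are all assumptions for illustration, not how any real crew system stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class CrewMember:
    name: str
    location: str                                        # airport where the person currently is
    qualified_on: set[str] = field(default_factory=set)  # aircraft types they may fly
    hours_flown_today: float = 0.0


MAX_HOURS_BETWEEN_REST = 8.0  # assumed cap for this sketch only


def eligible_crew(crew: list[CrewMember], origin: str,
                  aircraft_type: str, flight_hours: float) -> list[str]:
    """Names of crew who are at `origin`, qualified on the type, and within the hour cap."""
    return [
        c.name
        for c in crew
        if c.location == origin
        and aircraft_type in c.qualified_on
        and c.hours_flown_today + flight_hours <= MAX_HOURS_BETWEEN_REST
    ]


# Hypothetical roster.
roster = [
    CrewMember("alice", "CVG", {"CRJ"}, 2.0),
    CrewMember("bob", "CVG", {"CRJ"}, 7.5),    # would exceed the cap
    CrewMember("carol", "ATL", {"CRJ"}, 0.0),  # wrong airport
    CrewMember("dave", "CVG", {"ATR"}, 0.0),   # not qualified on the type
]
candidates = eligible_crew(roster, "CVG", "CRJ", 1.5)
```

Even this toy version shows why losing the system of record hurts: every filter depends on current data for every crew member, and the real problem adds routing and next-day positioning on top.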
    - Re:Fire away! (Score:5, Informative)
      
      by Anonymous Coward writes: on Sunday December 26, 2004 @04:24PM (#11186184)
      
      No. It is the version of SBS that pre-dated Maestro. It was brought into Comair in the early 1980's. It's written in FORTRAN and uses whatever record management system that came with the compiler.
      As such it used some very interesting data representations. For example, it tracked time using julian minutes. There are 44640 minutes in a 31 day month. That's small enough to fit in a 16-bit unsigned variable. This approach, nearly taboo by modern standards, was a God-send during Y2K. The system never needed to know what year it was. It became the running wisecrack, "You can't have a Y2K problem if you don't have a 'Y'".
      The Aircraft to Flight assignments is another system [sita.aero], but the two share information.
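
The "julian minutes" arithmetic above is easy to check. The sketch below assumes the term means minutes elapsed since the start of the month (the post does not spell out the exact encoding), and confirms that a 31-day month fits in an unsigned 16-bit value but not in a signed one.

```python
MINUTES_PER_DAY = 24 * 60
UINT16_MAX = 2**16 - 1   # 65,535
INT16_MAX = 2**15 - 1    # 32,767


def minute_of_month(day: int, hour: int, minute: int) -> int:
    """Minutes elapsed since 00:00 on day 1 of the month (day is 1-based)."""
    if not (1 <= day <= 31 and 0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("day/hour/minute out of range")
    return (day - 1) * MINUTES_PER_DAY + hour * 60 + minute


minutes_in_31_days = 31 * MINUTES_PER_DAY        # 44,640, as stated in the post
last_minute = minute_of_month(31, 23, 59)        # largest value ever stored
fits_unsigned = last_minute <= UINT16_MAX
fits_signed = last_minute <= INT16_MAX
```

So the representation only works if the variable is treated as unsigned, which is consistent with the "16-bit unsigned variable" wording above.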
      
  - Re:Fire away! (Score:5, Informative)
    
    by Daa ( 9883 ) writes: on Sunday December 26, 2004 @03:41PM (#11186021) Homepage
    
    Just to give you an idea, here is the applicable FAA reg for crew scheduling; the pilots' contract may have additional terms that must be met.
    
    121.471 Flight time limitations and rest requirements: All flight crewmembers.
    
    (a) No certificate holder conducting domestic operations may schedule any flight crewmember and no flight crewmember may accept an assignment for flight time in scheduled air transportation or in other commercial flying if that crewmember's total flight time in all commercial flying will exceed--
    
    (1) 1,000 hours in any calendar year;
    
    (2) 100 hours in any calendar month;
    
    (3) 30 hours in any 7 consecutive days;
    
    (4) 8 hours between required rest periods.
    
    (b) Except as provided in paragraph (c) of this section, no certificate holder conducting domestic operations may schedule a flight crewmember and no flight crewmember may accept an assignment for flight time during the 24 consecutive hours preceding the scheduled completion of any flight segment without a scheduled rest period during that 24 hours of at least the following:
    
    (1) 9 consecutive hours of rest for less than 8 hours of scheduled flight time.
    
    (2) 10 consecutive hours of rest for 8 or more but less than 9 hours of scheduled flight time.
    
    (3) 11 consecutive hours of rest for 9 or more hours of scheduled flight time.
    
    (c) A certificate holder may schedule a flight crewmember for less than the rest required in paragraph (b) of this section or may reduce a scheduled rest under the following conditions:
    
    (1) A rest required under paragraph (b)(1) of this section may be scheduled for or reduced to a minimum of 8 hours if the flight crewmember is given a rest period of at least 10 hours that must begin no later than 24 hours after the commencement of the reduced rest period.
    
    (2) A rest required under paragraph (b)(2) of this section may be scheduled for or reduced to a minimum of 8 hours if the flight crewmember is given a rest period of at least 11 hours that must begin no later than 24 hours after the commencement of the reduced rest period.
    
    (3) A rest required under paragraph (b)(3) of this section may be scheduled for or reduced to a minimum of 9 hours if the flight crewmember is given a rest period of at least 12 hours that must begin no later than 24 hours after the commencement of the reduced rest period.
    
    (4) No certificate holder may assign, nor may any flight crewmember perform any flight time with the certificate holder unless the flight crewmember has had at least the minimum rest required under this paragraph.
    
    (d) Each certificate holder conducting domestic operations shall relieve each flight crewmember engaged in scheduled air transportation from all further duty for at least 24 consecutive hours during any 7 consecutive days.
    
    (e) No certificate holder conducting domestic operations may assign any flight crewmember and no flight crewmember may accept assignment to any duty with the air carrier during any required rest period.
    
    (f) Time spent in transportation, not local in character, that a certificate holder requires of a flight crewmember and provides to transport the crewmember to an airport at which he is to serve on a flight as a crewmember, or from an airport at which he was relieved from duty to return to his home station, is not considered part of a rest period.
    
    (g) A flight crewmember is not considered to be scheduled for flight time in excess of flight time limitations if the flights to which he is assigned are scheduled and normally terminate within the limitations, but due to circumstances beyond the control of the certificate holder (such as adverse weather conditions), are not at the time of departure expected to reach their destination within the scheduled time.
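
To give a feel for how rules like these turn into code, here is a partial sketch covering only paragraph (a) (the flight-time caps) and paragraph (b) (the rest table) as quoted above. It deliberately ignores the reduced-rest provisions in (c), the 24-hours-off rule in (d), and the exceptions in (f) and (g), and the function and key names are invented for the example.

```python
# Paragraph (a): caps on total flight time, in hours.
FLIGHT_TIME_LIMITS = {
    "calendar_year": 1000,
    "calendar_month": 100,
    "seven_days": 30,
    "between_rest": 8,
}


def exceeded_limits(totals: dict[str, float]) -> list[str]:
    """Return the names of the paragraph (a) caps that the given totals exceed."""
    return [
        name
        for name, limit in FLIGHT_TIME_LIMITS.items()
        if totals.get(name, 0) > limit
    ]


def required_rest_hours(scheduled_flight_hours: float) -> int:
    """Paragraph (b): consecutive rest hours needed for the scheduled flight time."""
    if scheduled_flight_hours < 8:
        return 9
    if scheduled_flight_hours < 9:
        return 10
    return 11


violations = exceeded_limits({"calendar_month": 101, "seven_days": 30})
rest_for_short_day = required_rest_hours(7.5)
rest_for_long_day = required_rest_hours(9.0)
```

Each reassignment during a disruption means re-running checks like these for every affected crew member, which helps explain why the change count climbs so quickly.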
    
  - - Re:AIX? (Score:3, Funny)
      
      by Zeinfeld ( 263942 ) writes:
      
      While AIX, "Ain't unIX", might be described as Unix and the advert looks like HR drool, I'd still wager that some thing M$ failed something Sybase and that the AIX rumor is someone blowing smoke up your ass. Comparing reputations, AIX vrs. M$, the choice is clear.
      So let's think this one through for a second. The people who work there say the system that failed runs on AIX and that it's the application that's gone whoopsie. So they obviously must be lying since everyone knows that the minute an application
Happens all the time... (Score:5, Interesting)

by Anonymous Coward writes: on Sunday December 26, 2004 @07:12AM (#11184224)

When I lived in Chicago, they would lose their radar system on what seemed like a strong wind. And I got stuck in Denver overnight once because the computer system they use to calculate the weight of departing flights crashed. I have a feeling these kinds of crashes are much more common than most people think.

- Re:Happens all the time... (Score:5, Informative)
  
  by hughk ( 248126 ) writes: on Sunday December 26, 2004 @07:39AM (#11184303) Journal
  
  I have a lot of friends working at a large airline.
  Yes, but it is mostly recoverable. The heavy iron handles things like backend reservations, checkin and cargo. Smaller systems handle things like weight/balance and fuel and PCs are typically used for the front-ends.
  Weight/balance calcs can be done more or less by hand if necessary, however a larger fuel margin is needed. Checkin can be done by hand (you have seen those sticky label systems). However to lose reservations is a major problem.
  
  - Re:Happens all the time... (Score:2)
    
    by dattaway ( 3088 ) writes:
    
    Apparently, this kind of crash is recoverable [dattaway.org], but I wouldn't feel good about it happening.
    - Re:Happens all the time... (Score:3, Funny)
      
      by Rosonowski ( 250492 ) writes:
      
      Wow. I would hate to be the one sitting there when that happened.
      
      There's some...thing on the ... wing
- Re:Happens all the time... (Score:5, Interesting)
  
  by Greyfox ( 87712 ) writes: on Sunday December 26, 2004 @09:07AM (#11184436) Homepage Journal
  
  From looking at the various terminals that the airline people use, I suspect that most of those airline systems are held together with duct tape and library paste and no one really understands how the whole system works anymore. We see that a lot in non-IT industries (And a few IT ones, too.) Of course, the folks using the IBM ones are not ever supposed to go down...
  I moonlighted as an AS/400 operator for a cruise line for a while. We had the system go down once because the janitor turned off the air conditioner in the closet the AS/400 lived in. They didn't dedicate a more secure facility for the computer because the computer wasn't demonstrably central to how the company made money. Turns out they couldn't launch a ship without it. Oops. I suspect that mentality is also prevalent throughout the non-IT industries. They don't know how important their computers are to their business models until those computers die on them.
  
  - Re:Happens all the time... (Score:3, Insightful)
    
    by budgenator ( 254554 ) writes:
    
    Not too hard to imagine, I see a system that's a combination of Fortran 66, COBOL, and C all sort of working together over the years. All parts have had numerous patches and changes applied over the years until no one understands it anymore with each iteration making the system more fragile. Now they are lucky if they have the source code for the current build.
    Each time the industry is making money and IT is flush a project is started, to examine all the code in the system and refactor and rewrite to modern
  - Re:Happens all the time... (Score:5, Funny)
    
    by TopShelf ( 92521 ) writes: on Sunday December 26, 2004 @11:47AM (#11184940) Homepage Journal
    
    I used to work with a guy who at one time was an HP3000 operator back when those things were as big as your average washer/dryer combo. His shop had about a dozen of these things, and one night he and a buddy were playing frizbee with the circular write-protect rings that were used on the reel-to-reel tape drives.
    
    Sure enough, his buddy whipped one at his head, and as he ducked out of the way, he fell back and by accident hit the power switch located on the back of one of the HP3000's. In an instant, all the ticket terminals for one airline (I can't recall which one) at O'Hare airport went down, prompting a frantic call from VP's wondering what disaster had struck. So who knows what could have happened this time around...
    
  - - Re:Happens all the time... (Score:3, Insightful)
      
      by Greyfox ( 87712 ) writes:
      
      Funny how you never really hear about the applications written in COBOL, Fortran and PL/1 crashing. You get the impression that all those applications run for years at a time without so much as a hiccup. It's only with the invasion of GUIs and "modern" design techniques and languages that you start hearing about crashes like this. Granted the newer applications tend to be more ambitious about what they do...
      I'd love to see some uptime numbers for past systems versus the systems we have today. I wonder if t
      - Re:Happens all the time... (Score:3, Interesting)
        
        by HiThere ( 15173 ) * writes:
        
        There were many of them that did, however, crash. But the reason you don't hear about it much is that most of them weren't designed to be running all of the time, but only occasionally. If one crashed (and was a known good program) you'd just re-run it. Frequently that was your only choice, as you might not have anything but the binary. (Sloppy contracts often left consultants with the only copy of the source.)
        
        I did hear of one company that went out of business because their accounting system was writt
Official my arse... (Score:4, Insightful)

by Omicron32 ( 646469 ) writes: on Sunday December 26, 2004 @07:14AM (#11184231)

Sounds like my Mother wrote the official statement. A techy would never report something in that way.

Besides, it's pretty obvious their OS wasn't digitally signed. :p

- Re:Official my arse... (Score:2, Informative)
  
  by Saven Marek ( 739395 ) writes:
  
  You know I think it was. btw the system being used by Comair?
  
  Its one of SCO's last large scale deployments. You know who to blame now.
  
  Online Anime Gallery's [sharkfire.net]
  - MOD PARENT UP (Score:2)
    
    by johannesg ( 664142 ) writes:
    
    At least, if he is speaking the truth about this ;-)
- System Tracked Crew Location, Not Reservations (Score:5, Informative)
  
  by reallocate ( 142797 ) writes: on Sunday December 26, 2004 @08:38AM (#11184379)
  
  Of course, a techie didn't write the PR release. Who in their right mind would let a techie anywhere near a PR release?
  
  BTW, Comair, a Delta feeder headquartered outside Cincinnati, says the system that crashed was used to monitor crew locations and track working hours to ensure no one went over the legal maximum. Comair says the system crashed as a result of massive crew rescheduling following a record snow in their service area on Wednesday. There is no backup.
  
  - Re:System Tracked Crew Location, Not Reservations (Score:2, Funny)
    
    by Impy the Impiuos Imp ( 442658 ) writes:
    
    Gosh, looks lke idiot programmer assumed a 256 length crew relocation array was big enuf fer anybuddy!
  - Re:System Tracked Crew Location, Not Reservations (Score:5, Funny)
    
    by Pharmboy ( 216950 ) writes: on Sunday December 26, 2004 @09:30AM (#11184506) Journal
    
    You know, I have my OWN reservations about flying on an airline when they have no backups and can't keep their computers from crashing. What's to keep their planes in the air?
    
    The last thing I want to hear at 30k feet is that my current flight has been cancelled...
    
    - Re:System Tracked Crew Location, Not Reservations (Score:4, Funny)
      
      by shyster ( 245228 ) writes: <[brackett] [at] [ufl.edu]> on Sunday December 26, 2004 @10:54AM (#11184705) Homepage
      
      You know, I have my OWN reservations about flying on an airline when they have no backups and can't keep their computers from crashing. What's to keep their planes in the air?
      
      The Bernoulli Principle [fiu.edu]. And I don't think computers crashing are going to affect it. This isn't the Matrix, after all.
      
      - Re:System Tracked Crew Location, Not Reservations (Score:3, Interesting)
        
        by Pharmboy ( 216950 ) writes:
        
        I think you are overthinking it. My point is simply that a company that can not be trusted to keep their computers fully functional, can not be trusted to keep their aircraft fully functional. This is based on the premise that it is easier to keep the computers running than the aircraft, which I can easily assume, based upon my own experience.
        
        I also don't eat at diners where the help isn't properly groomed. Same principle: if you can't take care of simple stuff, you probably can't take care of something more impo
        
        Re:System Tracked Crew Location, Not Reservations (Score:3, Insightful)
        
        by logicnazi ( 169418 ) writes:
        
        Do you also refuse to eat at a relatives house if their computer is virus laden or crash prone? After all if they can't be trusted to keep their computer working why should you trust them to make safe, sanitary food.
        
        Perhaps if computer usage/programming had evolved to the level of personal hygiene, namely routine effort anyone could do would prevent computer crashes, your point would be convincing. However, in practice we realize even the best professional programmers make errors even buffer overflows (a
Someone's gotta say it... (Score:3, Insightful)

by mOoZik ( 698544 ) writes: on Sunday December 26, 2004 @07:15AM (#11184233) Homepage

Yep, it was Windows XP. ;)

I don't know. Frankly, it has less to do with the platform than the custom software that runs on it.

- Re: (Score:3, Interesting)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re:Someone's gotta say it... (Score:2)
    
    by mOoZik ( 698544 ) writes:
    
    I agree. Either the system was never thoroughly tested or there was a weak link that went undiscovered. In any event, heads should roll, as 30,000 people were affected and it resulted in a lot of lost revenue for the airline.
  - Re:Someone's gotta say it... (Score:2)
    
    by lachlan76 ( 770870 ) writes:
    
    At any rate, I suspect they'll be looking for a new IT director Real Soon.
    
    At what point did the head sysadmin become responsible for finding bugs in the code?
    - - Re:Someone's gotta say it... (Score:4, Insightful)
        
        by Pharmboy ( 216950 ) writes: on Sunday December 26, 2004 @09:35AM (#11184512) Journal
        
        You would think so. The IT Director is responsible for making sure everything IT works. Not to do it himself, but to make sure it is done and done right. I can't see how someone can argue with that. Even if it IS the janitor unplugging the UPS to plug in a floor buffer.
        
        Whether it is the cooling system for the computers, the operating system, the applications or simple hardware issues, it HAS to be the IT Director's responsibility. I mean, who the hell else?
        
        
        Re:Someone's gotta say it... (Score:5, Insightful)
        
        by Antique Geekmeister ( 740220 ) writes: on Sunday December 26, 2004 @09:56AM (#11184560)
        
        Occasionally, however, the head IT guy gets over-ridden by management or by available finances. I've been there, saying "we need to spend money on this" and having to make do with much less money, or even with a cut in funding. You need to document the problem in advance to cover your ass, and get it in print and saved offsite to protect yourself from that kind of mistake. I've done that, too. It helped protect me from a nasty lawsuit because I demonstrated where I had told a consulting client, in print, when the systems would start failing and the resulting legal liabilities, and gotten it signed by the company notary.
        
Bringing the /. effect to the weary masses. (Score:2, Funny)

by Anonymous Coward writes:

Linking to their home page will surely help the situation..
My theory? (Score:5, Funny)

by Ckwop ( 707653 ) * writes: on Sunday December 26, 2004 @07:18AM (#11184243) Homepage

The janitor pulled out the plug for the mainframe and used it to drive his floor polisher..

Simon.

- Re:My theory? (Score:2)
  
  by Zorilla ( 791636 ) writes:
  
  Nerd (Doug): We need the outlet for our rock tumbler.
  Bart & Lisa: PLUG IT IN! PLUG IT IN!
  Nerd (Doug): What, the rock tumbler or the TV?
  Bart & Lisa: THE TV! THE TV!
  
  (Itchy and Scratchy theme plays, Krusty comes back on)
  
  Krusty: WOW! They'll never let us show that one again... never in a million years!
- Re:My theory? (Score:4, Funny)
  
  by bcmm ( 768152 ) writes: on Sunday December 26, 2004 @07:29AM (#11184275)
  
  BOFH excuse #38: secretary plugged hairdryer into UPS.
  
  That is where you got the idea for that post, right?
  
  - Re:My theory? (Score:5, Funny)
    
    by rlauzon ( 770025 ) writes: on Sunday December 26, 2004 @08:45AM (#11184389)
    
    Probably not. It's an old story (quickly retold):
    
    Army base computer going down every night. So the grunt in charge of it stayed the night to see what was happening. When the computers went down, he heard the hum of the floor buffer.
    
    The janitor had plugged his floor buffer into the same power as the computers and it caused the crashes. It was quickly fixed by telling the janitor to not do that and putting locking covers on the power outlets.
    
    But they dreaded telling the base commander what the issue was. So they told him it was "a buffer problem."
    
- Re:My theory? (Score:5, Funny)
  
  by jridley ( 9305 ) writes: on Sunday December 26, 2004 @12:49PM (#11185269)
  
  A friend was sysadmin at a manufacturing plant, and the janitor kept plugging into the power conditioned sockets with a very large, power-hungry floor polisher. He was actually blowing power supplies. Every one cost several thousand dollars in service calls to replace the power supply and downtime.
  
  My friend put "COMPUTER USE ONLY" stickers OVER the power-conditioned sockets. The janitor ripped them off to plug in, and blew another power supply.
  
  My friend finally confronted the janitor, who was a really obstinate PITA. He stood there and said "Yeah, I did it, and I'm gonna keep doing it, and I don't give a damn about you or your fu*kin' computers."
  
  This was an automotive union shop, very difficult to get people fired.
  
  But, in a show of karma rarely witnessed by mortals, the VP of the division was standing within earshot but out of sight. When the janitor finished saying he didn't give a damn that he was costing the company $10,000 a week because he was too lazy to go get an extension cord, the VP walked around the corner and said hi. I don't know whether the guy ran to his car or the VP kicked his ass right over the top of it.
  
stating the obvious (Score:5, Insightful)

by Anonymous Coward writes: on Sunday December 26, 2004 @07:20AM (#11184248)

"Does anyone know what platform their system was based on? What kind of system just totally crashes?"

A stab in the dark here but I'm assuming a system without foresight and redundancy?

It's obvouis... (Score:3, Funny)

by bcmm ( 768152 ) writes: on Sunday December 26, 2004 @07:20AM (#11184250)

What kind of system just totally crashes?

Oh come on...
That doesn't need answering.

- Re:It's obvouis... (Score:2)
  
  by Zorilla ( 791636 ) writes:
  
  Oh come on...
  That doesn't need answering.
  
  Damn! We warned them to test KDE 3.3 out before upgrading!
  
  (Ok, so just more obnoxious than anywhere near fatal)
It was running on SCO Unix... (Score:5, Funny)

by bani ( 467531 ) writes: on Sunday December 26, 2004 @07:23AM (#11184257)

They obviously didn't take mcbride's "license or we will have you shut down" threats seriously enough.

blaming the system can backfire (Score:5, Insightful)

by ext42fs ( 725734 ) writes: on Sunday December 26, 2004 @07:24AM (#11184259) Homepage

It's not the OS, it's the people behind it who are to blame. Yes, stupidity and MSW often go together but in a few years one will probably occasionally see a massive linux outage due to... similarly stupid people.

Scalability and Twelve Step TrustABLE IT (Score:3, Interesting)

by NZheretic ( 23872 ) writes: on Sunday December 26, 2004 @07:24AM (#11184260) Homepage Journal

Sounds like Comair could have used a little virtualized scalability and third party audited builds.
See Twelve Step TrustABLE IT : VLSBs in VDNZs From TBAs [blogspot.com].
and also The ActiveGrid(TM) Grid Application Server [activegrid.com] and Grid Computing [google.com] in general.

- Re:Scalability and Twelve Step TrustABLE IT (Score:5, Insightful)
  
  by hughk ( 248126 ) writes: on Sunday December 26, 2004 @07:49AM (#11184313) Journal
  
  No, it's more difficult in the airline industry. The system by default tries to keep as many planes in the air earning money as possible. If you have an outage which disrupts this choreography, there is a tremendous knock-on effect as passengers/urgent cargo must be rebooked.
  I have seen the major hub for an airline closed because of snow for just a couple of hours in the early morning, but the resulting chaos of rescheduling/rebooking caused the reservations system to crash after just a few minutes of uptime. The same would keep happening after restarts.
  It is normal to test a system at up to several times normal load, but they were seeing peaks at over 100x. The old, 3270 emulator based system would have slowly got through it but the newer system died.
  
  - Where did the system fail under stress? (Score:2)
    
    by NZheretic ( 23872 ) writes:
    
    Was it the design of the software or the limitations of the hardware? See my post on Scalability on demand and third party servers [slashdot.org].
    - Re:Where did the system fail under stress? (Score:2)
      
      by hughk ( 248126 ) writes:
      
      I do not know the nature of the Comair system, but software design is the major issue with systems that degrade catastrophically rather than gradually. Please remember that major airlines used to run with much slower hardware up to the eighties (indeed, much less processing power than my PDA), however they did have very high I/O throughput and intelligent frontends.
  - Leasing third party servers for stress testing (Score:2)
    
    by NZheretic ( 23872 ) writes:
    
    One more advantage of a Virtualised Standard platform [blogspot.com] would be the ability to do development and stress testing on third party servers. Full-on stress testing is something that most organizations cannot afford to do on the currently deployed hardware.
    - Re:Leasing third party servers for stress testing (Score:2)
      
      by hughk ( 248126 ) writes:
      
      The problem is that the daily schedule of an airline is extremely complicated. One issue is that many airlines have downsized their older and more experienced staff so they lack the ability to run the airline without their extensive IT systems. Even with the knowledge, you still need to be able to reschedule slots with the airports as well as new flight plans (also usually filed by computer).
      It is then an issue as to whether you really want to design IT systems for every scenario. It costs a *lot* of mone
  - Re:Scalability and Twelve Step TrustABLE IT (Score:2)
    
    by gl4ss ( 559668 ) writes:
    
    they aren't building them for normal use so why didn't they test it under the chaos that comes when there is downage?
    
    it's not an excuse to miss research on what the system could be hit with.
  - Re:Scalability and Twelve Step TrustABLE IT (Score:2)
    
    by Rebar ( 110559 ) writes:
    
    The old, 3270 emulator based system would have slowly got through it but the newer system died.
    
    Wow, I didn't know that 3270 emulators were even programmable, and surely wouldn't try to base an airline reservation system on them. Seems far better to use something like a mainframe than a grid of terminal emulators, although there must be a few distributed mips there...
- - Scalability on demand and third party servers (Score:2)
    
    by NZheretic ( 23872 ) writes:
    
    First of all, it all depends on where the bottlenecks are in the processing of the transactions. That is dictated by the combination of the hardware, the network bandwidth and the overall design of the existing software system. The worst cases are bottlenecks in the design of the software, where all transactions have to pass some/all data through a single process/processor. If the problem is just hardware scalability or reliability, then grid/cluster computing can help.
    If you choose a standardize
This is getting a little too common for them. (Score:4, Interesting)

by jhobbs ( 659809 ) writes: on Sunday December 26, 2004 @07:25AM (#11184265)

Back on May 1st of this year Delta's internal traffic monitoring system grounded them worldwide when it was hit by a worm (forget which one). Yours truly was flying that day. I spent 7 hours on a runway in Cleveland. (Talk about adding insult to injury.) Comair is a regional carrier of Delta's. I wonder who handles Delta's IT needs?

- Re:This is getting a little too common for them. (Score:2)
  
  by sacremon ( 244448 ) writes:
  
  When I was working there as internal tech support about six years ago, there was a dedicated division, named Delta Technology, that did all the development work. That included not only the software that ran on the mainframes and Unix boxen but also the hardware like the ticket scanners that they've implemented in that time. Given how well they knew how to deal with their desktop machines (Win2K Pro boxes), the vast majority of the software developers didn't know squat about Windows. Of course, that doesn
- From old information... (Score:5, Informative)
  
  by gminks ( 734161 ) writes: <gminks@@@ginaminks...com> on Sunday December 26, 2004 @09:51AM (#11184547) Homepage Journal
  
  According to this article [written in 1995] [cio.com], Delta and AT&T created a new company called TransQuest Information Solutions.
  
  This article [20minutesfromhome.com] outlines how this joint venture re-vamped Delta's IT systems (again remember, this is 1995):
  
  During 1995 and 1996, TransQuest reengineered Delta's systems to migrate them from Hitachi mainframes running Natural, Adabas, and DB2 to an open systems environment. The new systems are written in C++ and access Sybase databases of reusable and distributed objects. The systems run primarily on Sun, HP and AT&T servers under UNIX with clients running under UNIX, MS-DOS, and Windows. The clients are connected to the servers over high bandwidth TCP/IP frame relay networks.
  
  Job titles for the company's 1,100 computer professionals include Systems Engineer and Software Engineer 1 through 8. Staff members recently developed an aircraft weight balance system that can be accessed by pilots to determine how luggage and fuel have been distributed within the aircraft for balance during a flight. This system was developed in C++ on AT&T and HP UNIX servers and will be available on 40,000 devices to 2,000 users.
  
  The trail runs dry here; job postings stopped around 2001.
  
  Which really raises suspicions that all the code is written and maintained offshore. The question now becomes who is handling this for Delta.
  
  One of Tata's spinoffs, Airline Financial Support Services [airlinefinancial.com], is described as
  
  "an example of an external service provider that handles a wide range of back-office functions for the airlines. AFS handles sales, refund, traffic and cargo; performs fare audits; manages yields and revenues by performing departure and post-departure processing checks; books crews; deals with overbooked flights and wait-lists; administers frequent flyer programs; draws up flight navigation charts, such as landing or route facility charts; and provides customer care." This according to ebstrategy.com [ebstrategy.com]
  
  Wipro handles some of Delta's inbound reservation calls in India and the Philippines. [wipro.com]
  
  In conclusion, it would appear that either Tata's AFS arm or Wipro do the IT for Delta airlines.
  
  - Re:From old information... (Score:2)
    
    by Kalak ( 260968 ) writes:
    
    Since it appears to be crew scheduling that was the issue, I'd look in the direction of Tata. Thanks for the info.
But management saved 13.7% by hiring H1-Visas (Score:2, Interesting)

by Soyobob ( 843580 ) writes:

Too bad the airline will go bust because of this. But then all airlines are losing billions except for Southwest.
- - Re:It's a tragedy when... (Score:2)
    
    by gl4ss ( 559668 ) writes:
    
    actually disaster management strategy IS optional.
    
    this story is the evidence.
    
    though seriously, there's quite a lot of companies out there that instead of hiring incompetent people could be better off buying the services from outside. outsourcing doesn't necessarily mean it's crap, there's a lot of domestic in-house crap and idiots everywhere.
Crew assignment is a hard problem (Score:5, Informative)

by rsilva ( 128737 ) writes: on Sunday December 26, 2004 @07:59AM (#11184332) Homepage

'There was a cumulative effect with the canceled flights and trying to get crew assigned that caused the system to be overwhelmed.'

I am only trying to make sense of the above comment from the official statement.

Crew assignment is a hard problem; it is usually an MILP (Mixed Integer Linear Programming) [anl.gov].

Such problems may be very hard to solve in reasonable time. Maybe (I'm shooting in the dark here) the first delays made the crew assignment problems grow too large to be solved in reasonable time. This would generate a snowball effect, as the assignment problems would keep on growing, making the system "crash".

We may never know what really happened but this would be a nice example for my classes :-)
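
To put rough numbers on that snowball effect, here is a minimal sketch, not anything Comair actually ran: a toy that assigns crews to flights by exhaustive search, where the number of candidate assignments grows factorially with the number of flights waiting for a crew. The cost matrix and function names are made up for illustration; a real scheduler would use an MILP solver with far more constraints (duty-time limits, qualifications, crew bases).

```python
from itertools import permutations
from math import factorial


def best_assignment(cost):
    """Exhaustively find the cheapest one-crew-per-flight assignment.

    cost[c][f] is a hypothetical cost of putting crew c on flight f.
    Returns (total_cost, flights) where flights[c] is the flight given to crew c.
    """
    n = len(cost)
    best = None
    for flights in permutations(range(n)):
        total = sum(cost[c][f] for c, f in enumerate(flights))
        if best is None or total < best[0]:
            best = (total, flights)
    return best


def search_space(n):
    """Number of candidate assignments the brute-force search must examine."""
    return factorial(n)


if __name__ == "__main__":
    toy_cost = [
        [4, 2, 8],
        [4, 3, 7],
        [3, 1, 6],
    ]
    print(best_assignment(toy_cost))
    for backlog in (3, 5, 10, 15):
        print(backlog, "flights ->", search_space(backlog), "candidate assignments")
```

Ten unassigned flights already mean 3,628,800 candidates for this naive search, which is why real systems lean on solvers and heuristics, and why a backlog that keeps growing can outrun them.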

- Re:Crew assignment is a hard problem (Score:2)
  
  by timeOday ( 582209 ) writes:
  
  You can't blame something like this on algorithmic complexity. Finding an optimal solution may require an impractical amount of time, but a workable solution within a few percent of optimal is normally much easier. In the long run, a few percent may mean the difference between life and death for an airline, but you must retain the ability to cope with short-term emergencies, even if it means a lot of scrambling around and some wasted money in the short term. Most businesses can't afford many complete me
- Re:Crew assignment is a hard problem (Score:2)
  
  by sporty ( 27564 ) writes:
  
  Not particularly. You can also use graph theory for this. Once you have a means to convert the data into a graph, and a means to convert something like a vertex colouring back into people assignments, it's as easy as your graph colouring algorithm is. You can do it either by brute force, or via a heuristic, in reasonable time with a good enough computer. Even in a huge, huge graph.
  Then it's a matter of feeding the data at a rate so you don't backlog faster than the amount of data you get in.
  If you ar
  - Re:Crew assignment is a hard problem (Score:3)
    
    by sporty ( 27564 ) writes:
    
    Gah, I said vertex colouring when I meant network flow. My bad. :)
- Re:Crew assignment is a hard problem (Score:5, Interesting)
  
  by coyote-san ( 38515 ) writes: on Sunday December 26, 2004 @11:06AM (#11184731)
  
  It's far harder than that alone since you also have to get the aircraft back to the right city (many are in the wrong city due to airport shutdowns due to the weather). Obviously you want to optimize the number of passengers carried along for those flights, but at the same time you'll be "burning" allowed worktime for the crew.
  
  Even worse the crew and aircraft are independent variables. Obviously you need a crew to operate a flight, but the crew may end up in the "wrong" city for the usual schedule. It may be better to leave a plane on the ground and fly its crew "deadhead" to the "right" city than to have them fly a load of passengers to the "wrong" city.
  
  There are reasonably efficient algorithms to solve these problems, but we spent most of my entire second-semester graduate-level algorithms class studying them (network flows). The algorithms most developers would come up with (including me after a decade of experience and a graduate-level algorithms class) are extremely inefficient and scale horribly.
  
  The bottom line is that it's easy to imagine a system that has no problem with perturbations from the regular schedule but is totally overwhelmed when starting from scratch. I hope the bean counter who saved the company a few bucks by insisting on far more modest hardware gets canned for his costly lack of foresight, but we all know that IT will catch the heat.
  
Can anyone say... (Score:3, Funny)

by carlmenezes ( 204187 ) writes: on Sunday December 26, 2004 @08:17AM (#11184363) Homepage

...slashdotted reservations?

30,000? (Score:5, Funny)

by __aafkqj3628 ( 596165 ) writes: on Sunday December 26, 2004 @08:22AM (#11184370)

30,000 passengers? Getting dangerously close to an integer overflow there.
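
For the record, a minimal sketch of the joke, assuming the passenger count lives in a signed 16-bit integer (nothing in the story says it does): such a counter tops out at 32,767, so 30,000 really is within a few thousand of the limit, and one passenger past the limit wraps it around to a negative number.

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32767, the largest value a signed 16-bit integer can hold


def add_int16(a, b):
    """Add two numbers the way a signed 16-bit register would, wrapping on overflow."""
    return ctypes.c_int16(a + b).value


stranded = 30_000
print(INT16_MAX - stranded)     # 2767 passengers of headroom before the counter wraps
print(add_int16(INT16_MAX, 1))  # -32768: one passenger too many
```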

- Re:30,000? (Score:5, Funny)
  
  by edp ( 171151 ) writes: on Sunday December 26, 2004 @12:47PM (#11185262) Homepage
  
  "30,000 passengers? Getting dangerously close to an integer overflow there."
  That is not a bug but an accurate model of reality. When you strand 32,768 passengers, they will turn negative.
  
I don't know about their internal system... (Score:3, Interesting)

by Glowing Fish ( 155236 ) writes: on Sunday December 26, 2004 @08:47AM (#11184395) Homepage

As a preliminary finding that may or may not give us a clue as to what the internet system was running, Netcraft reports that www.comair.com is running Apache on HP-UX.
So don't assume that the internal system was Windows just yet. Then again, don't assume that it wasn't.

- Re:I don't know about their internal system... (Score:2)
  
  by Zachary Kessin ( 1372 ) writes:
  
  Also don't assume it was the OS that died; it is very possible that the computers were up, just not responding to the client software or otherwise screwed up. I would guess that it was their own custom software that died.
whole story? (Score:5, Informative)

by confusion ( 14388 ) writes: on Sunday December 26, 2004 @08:51AM (#11184402) Homepage

This Comair story is all I'm seeing getting press. I think it's a lot bigger than that.
My sister flew Delta on Dec 23rd from Detroit to Atlanta. Plane was 2 hours late, but no big thing. Waited 5 hours for her luggage, with no dice. By the time we got in line for luggage services, there were at least 600 people in the line already.
Talking to other passengers from 10+ different flights from different cities, no one got their luggage that night. Apparently, it wasn't just Atlanta - the local news in Tampa and Detroit had segments on how the airports had taken over parts of taxiways to sort through seas of bags that didn't make it on to planes.
It's been 2 days, and Delta has no idea where the stuff from that flight is. I'm guessing it isn't just Comair that got hit by some computer problems.
Jerry
http://www.syslog.org/ [syslog.org]

- Re:whole story? (Score:5, Interesting)
  
  by garcia ( 6573 ) * writes: on Sunday December 26, 2004 @09:32AM (#11184510)
  
  Personally I think that Delta was being a bunch of assholes about the whole thing...
  
  Seeing that my 7pm flight was cancelled for the 23rd I spent 20 minutes redialing from two different phones until I got past a busy signal. After 50 minutes on hold I got through to a representative who scheduled me for the 24th's 7pm flight. I spent the rest of the time rearranging time off from work, the dog's time to be spent at the kennel, car rental stuff, and phone calls to my fiance who would meet me at the airport, and to family we were supposed to see.
  
  At 7am on the 24th the flight was already cancelled. At this point I didn't give a shit anymore. Delta was saying I would have to use my tickets by the 15th of January because "it wasn't their fault". I knew it wasn't the fucking weather down there as plenty of people were saying it was fine in the area. So I call again and get through after redialing for 65 minutes. I get through to a rep after 50 more minutes in queue. She tells me she can't do anything but schedule me for the 25th at 7pm so I'd have to get in queue for the reissue desk. Fine...
  
  After 2 hours and 11 minutes in queue (with no hold music or sound for that matter) someone calls on my home line at 5:15pm from Delta to tell me my 7pm flight is cancelled (cute, I would have been at the airport by then). I tell that rep to get me into the reissue queue as I've been on hold with them for 2 hours.
  
  I finally get through and tell them I want my money back. They tell me I need to speak to customer service. After waiting on hold (with the reissue rep) for 25 minutes the reissue rep offers to refund my money.
  
  We can't fly out for New Years as the kennel is booked and I'd feel horrible asking someone to watch our dog in our house for more than 1 night. So basically we have to wait quite some time to fly down there again.
  
  It was a little bit of a pain in the ass to wait on hold and be jerked around for two days for something that was their fault when they continually claimed wasn't. BAD WAY TO TRY AND PLEASE A CUSTOMER.
  
  Thanks for ruining our Christmas.
  
  - Re:whole story? (Score:3, Informative)
    
    by HeghmoH ( 13204 ) writes:
    
    These days, "we hate the customer" seems to be the motto of all of the big airlines.
    
    This summer, I was flying from Paris to Ft. Lauderdale via Philadelphia on USAir. The Paris->Philadelphia leg was handled by the same plane that does USAir's Philadelphia->Paris flight that same day. The incoming flight was about four hours late, so of course our outgoing flight was also four hours late. Sucks, but what can you do.
    
    So we get into Philadelphia at about 9PM instead of 4:30PM and everybody rushes to get
  - - Re:whole story? (Score:3, Informative)
      
      by winwar ( 114053 ) writes:
      
      "I have little sympathy for people that whine about holiday travel when they didn't plan for things like this."
      
      Okay troll, I'll bite. Maybe he had a limited amount of time off. Maybe that was the most convenient time to fly. Whatever. It doesn't matter.
      
      He shouldn't have to plan for weather, high traffic, and/or computer screwups. That is the airline's JOB. You know, the people who took the money and agreed to get him from point A to point B. Bad weather in the winter? From the massive effects it has on th
Not surprising, coming from Comair (Score:5, Interesting)

by Anonymous Coward writes: on Sunday December 26, 2004 @08:54AM (#11184410)

Some of my co-workers are on contract developing Java software for Comair.

Comair are very tied to particular systems, and don't want to change even when the developers have pointed out problems. Case in point: a J2EE-based employee portal, based on Novell exteNd (Novell Portal Service) and a one-way HPUX server. NPS runs in Tomcat, which is servicing requests (via mod_jk) through Apache. No other application shares the machine, and Comair will only consider vertical scaling, not horizontal.

The application creates at least two threads per connection, and when the thread count goes beyond a relatively low threshold (between 300 and 400), Tomcat deadlocks. It's not because they're running out of space in the allocated JVM heap, and they've tuned mod_jk to allow for heavy load. The current solution is to restart Tomcat when the system locks up.

Novell's support has been less than stellar, so the Java contracting group was informally asked what to do. We had all kinds of useful suggestions, from dumping NPS for another portal implementation, to creating custom thread-pools, to using JDK 1.4 new I/O and a minimally-threaded design, and even using round-robin DNS and a group of independent portal servers to share the load. Comair are wedded to particular minimal cost solutions, however, and it shows.
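
The custom thread-pool suggestion, as a minimal sketch in Python rather than the Java the portal actually used: instead of creating threads per connection without limit, requests are handed to a fixed-size pool, so the thread count stays capped no matter how many connections arrive. `handle_request`, the pool size of 8 and the request count are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # hypothetical cap, far below the 300-400 threads where the portal locked up

_lock = threading.Lock()
_active = 0
peak_active = 0


def handle_request(request_id):
    """Stand-in for real request handling; records how many requests run at once."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # simulate a little work
    with _lock:
        _active -= 1
    return request_id


def serve(num_requests):
    """Run all requests through a bounded pool; excess requests wait in the pool's queue."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(handle_request, range(num_requests)))


results = serve(200)
print(len(results), "requests handled, peak concurrency", peak_active)
```

Under load the pool gets slower, since requests queue up, but the number of live threads never exceeds `MAX_WORKERS`.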

At least when the portal crashes, it only impacts employees and not passengers.

I'm surprised (Score:2, Interesting)

by antifoidulus ( 807088 ) writes:

that in the name of sensationalism reporters haven't said, "terrorism is probably not to blame but the Dept. of Homeland Security is looking into it." It seems that after Sep. 11th, the news wants to try to connect everything even remotely bad with terrorism, and of course the Dept. of Homeland Security encourages them by using as vague of language as possible. Are people that easily frightened?
- Re:I'm surprised (Score:4, Insightful)
  
  by HangingChad ( 677530 ) writes: on Sunday December 26, 2004 @10:04AM (#11184575) Homepage
  
  It seems that after Sep. 11th, the news wants to try to connect everything even remotely bad with terrorism
  What else do they have to do? They've got this huge ass budget, all those people watching a lot of honest citizens. It was 8 years between the first attempt on the World Trade Center and the second. We've built and paid for this entire monster agency for an event that might be 10 or 15 years away. What are they going to do in the meantime? Grope women at the airport. They have to do something to justify their existence. Otherwise we'd have to admit we over-reacted to 9-11.
  
It's obvious what happened (Score:2, Funny)

by moartea ( 703940 ) writes:

http://home.hccnet.nl/jaap.kranenburg/fun/xx/images/fun20020415.jpg [hccnet.nl]
No manual process? (Score:2)

by SCHecklerX ( 229973 ) writes:

WTF can't they do it manually? It's just keeping track of seats on planes for fsck's sake. Sure, they may not be able to accommodate everyone right away, but they could certainly do better than "nobody can fly at all because our computer system crashed". If a restaurant loses their computer, they don't stop admitting people. They just go back to paper orders/receipts.
- Re:No manual process? (Score:2, Insightful)
  
  by aggles ( 775392 ) * writes:
  
  Hopefully someone from Comair reads /. and will not be able to resist spilling the beans. This sounds like a lawsuit in the making. It was not weather related - it was someone trying to either save a buck by writing crappy software or having poor operational procedures. This is a Sarbanes-Oxley event - and hopefully, the truth will come out about what happened, and why the backup procedures were either not-in-place or did not work. I don't want to see them go bankrupt, but they should be held accountab
seen it before (Score:2)

by sacrilicious ( 316896 ) writes:

It appears that due to weather and other problems that flights began to be cancelled on Thursday and the backlog choked the system. 1,100 flights have been cancelled so far, including all flights through 12/26. Does anyone know what platform their system was based on? What kind of system just totally crashes?
Sounds like Diebold may have been contracted for the job.
I'd like to know (Score:3, Interesting)

by HangingChad ( 677530 ) writes: on Sunday December 26, 2004 @10:22AM (#11184617) Homepage

Not just the database platform and front end but who built it. This just has E-D-S stamped all over it. Everybody has a system go down once in a while, but it just seems like EDS has had more than their share.
This is a worst case scenario for a system of that nature because of so many dependent calculations and calls to other systems. It takes more than just having a plane and a crew...which is a lot of work all by itself. It has to have a gate and connecting flights. Then multiply all that by 30,000 people, roughly 120 plane loads, and complicate it by some airports being closed. I bet you could actually watch the lights get dimmer in the server room. Still when you know the potential peak demand you have reserve capacity. Slow is okay, stop is unacceptable.
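
"Slow is okay, stop is unacceptable" can be sketched as a bounded work queue. This is a minimal toy, assuming a system that processes a fixed number of rebooking jobs per tick and defers whatever does not fit rather than falling over; the class name, capacity and rates are invented for illustration.

```python
from collections import deque


class BoundedScheduler:
    """Toy job queue that degrades gradually: it defers overflow instead of crashing."""

    def __init__(self, capacity, jobs_per_tick):
        self.capacity = capacity            # max jobs held in memory at once
        self.jobs_per_tick = jobs_per_tick  # throughput per time step
        self.queue = deque()
        self.deferred = 0                   # jobs handed back to the caller to retry later
        self.done = 0

    def submit(self, job):
        """Accept a job if there is room; otherwise tell the caller to retry later."""
        if len(self.queue) >= self.capacity:
            self.deferred += 1
            return False
        self.queue.append(job)
        return True

    def tick(self):
        """Process up to jobs_per_tick queued jobs."""
        for _ in range(min(self.jobs_per_tick, len(self.queue))):
            self.queue.popleft()
            self.done += 1


sched = BoundedScheduler(capacity=100, jobs_per_tick=10)
for tick in range(5):
    for job in range(50):  # a burst of 50 new jobs per tick, 5x the throughput
        sched.submit((tick, job))
    sched.tick()
print(sched.done, len(sched.queue), sched.deferred)  # 50 90 110
```

Even with demand at five times capacity, the queue never grows past its limit and work keeps completing at the normal rate; the excess is pushed back to be retried instead of taking the whole system down.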

Bailout (Score:2)

by parliboy ( 233658 ) writes:

After 9/11, pretty much all of the domestic airlines were bailed out by the government to keep them from going poof (except for Southwest and a couple of others, who didn't have their heads up their asses). So I just want to know how long it will take this Delta affiliate to plead for money. Not only has it screwed over all of those passengers, but the taxpayers will collectively pay for it.
Southwest refuses to drink the Kool-aid (Score:5, Interesting)

by Oswald ( 235719 ) writes: on Sunday December 26, 2004 @11:06AM (#11184730)

This computer problem of Comair's just demonstrates how unworkable the hub-and-spoke system of flight scheduling is. It's a flawed concept, foisted on a naive public by an industry locked in some sort of mass psychosis. In the pursuit of minor economies of scale, the big airlines treat their passengers like packages (hey! it works for Fedex, and their cargo can't even walk itself to the next gate...), treat airport runways and air traffic controllers like unlimited resources, and waste vast amounts of jet fuel. The fact that Southwest Airlines (which does not use a hub-and-spoke scheduling system) is profitable, and the rest of our major airlines are either in, just out of, or about to go into, bankruptcy doesn't seem to dent their thick skulls.
I have watched the operation at Atlanta for over 21 years, and I've seen how cutthroat the competition for a major hub is, but it feels like watching two dogs fight over two bones--you can't tell if they're fighting out of greed or stupidity. Southwest doesn't even fly into Atlanta--they know that only a pyrrhic victory would be possible under those circumstances. Management at the other airlines has been criminally incompetent ever since airline deregulation, but it's the passengers, employees and shareholders who pay the penalty time and again.
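
The economies of scale in question come down to simple counting, sketched here under the simplifying assumptions that every city must be able to reach every other and that a hub network routes everything through a single hub: linking n cities takes n - 1 routes through a hub, but n(n - 1)/2 direct routes point-to-point. The function names are just for illustration.

```python
def hub_routes(n_cities):
    """Routes needed when every other city connects only to a single hub."""
    return n_cities - 1


def point_to_point_routes(n_cities):
    """Routes needed to link every pair of cities directly."""
    return n_cities * (n_cities - 1) // 2


for n in (5, 20, 100):
    print(n, hub_routes(n), point_to_point_routes(n))
```

The flip side of those savings is that every itinerary passes through the hub, so trouble at the hub touches the whole network.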

- Re:Southwest refuses to drink the Kool-aid (Score:3, Interesting)
  
  by HR ( 38332 ) writes:
  
  The problem with your analysis is that point-to-point flying doesn't work when you start talking about international travel. It's just not possible to fly passengers to, say, Germany or Japan from every domestic airport. The way you do it is to accumulate passengers at a major hub on the coast and then fly from there.
- Re:Southwest refuses to drink the Kool-aid (Score:4, Insightful)
  
  by PPGMD ( 679725 ) writes: on Sunday December 26, 2004 @11:57AM (#11184992) Journal
  
  It isn't that easy. For the longest time Southwest was the hardest airline to book a flight on, because it had no web system that could figure out its route system (only 5 years later did they release one). Up until about July of this year, to book a flight on the web you needed a route map and schedule to figure out what cities you had to go through if there was no direct flight option.
  The hub-spoke system is easier to manage, and can be profitable if the airlines realize that they aren't unlimited resources, and decentralize the hubs on a limited basis.
  Anyways Southwest doesn't drink anyone's koolaid, they run all their own in-house designed systems (I am not sure they are even on Sabre anymore), including web apps. It's an interesting concept, but it probably causes their IT managers to pull their hair out.
  
- Re:Southwest refuses to drink the Kool-aid (Score:4, Informative)
  
  by Anonymous Coward writes: on Sunday December 26, 2004 @02:31PM (#11185712)
  Actually, the only thing that makes these sort of problems easier for Southwest is the consolidated fleet types. With nothing but 737's, you don't add complexity to the scheduler for things like pilot and f/a qualifications.
  What happened to Comair here could happen to just about any airline. There is no comprehensive suite of software that handles crew scheduling, aircraft scheduling, reservations, and the myriad of other functions that are needed to run an airline.
  Reservations, for other than tiny airlines, are still managed by large TPF mainframes. TPF is a very "bare bones" operating system that runs on IBM mainframes, and was written specifically to deal with high volume / high transaction rate systems. Personally, I've seen 5 attempts at 3 different airlines to replace it with something modern. ( like Unix with an RDBMS ). Each attempt failed miserably, and the airline went back to TPF. Note that TPF is not MVS, OS/390, or any other more mainstream Mainframe OS. It's purpose built.
  Unfortunately, this means that all of the other applications have to interface with TPF via screen scraping. To further compound the problem, no "suites" exist to handle the following functions, so most airlines have to "sew together" best of breed solutions for these basic functions:
  
  Crew Scheduling - F/A's and pilots bid on slots to fly, this system takes those bids and turns it into a schedule.
  
  Aircraft Scheduling - Tracks which tail numbers are flying which flights for the dispatchers
  
  Optimization - Different optimizers to do things like:
  
  Fuel Tankering - Use the jets as "tankers" so that you buy fuel where it's cheapest for flights later in the day
  
  Crew Optimization - "Traveling Salesman" type solver to incur lowest labor cost, get crews back to home base, etc
  
  Schedule Optimization - Use the aircraft in the most cost efficient way to cover all of your scheduled flights.
  
  Maintenance Optimization - Pull aircraft in for Scheduled Maintenance at the optimum time.
  
  Reaccommodation - When things go wrong (weather, mechanicals, whatever), pull in all of the above variables to crank out a new schedule, crewing, mx schedule, etc.
  
  Booking Engines, for the internet and reservations agents
  
  Point of Sale and Boarding functions for agents, skycaps, and kiosks
  
  Interline functions where other airlines sell your tickets, and transfers for baggage, etc
  
  Anyhow, this list isn't comprehensive, but shows enough of the disparate pieces that you can imagine why these "glitches" happen. Very few of the items from the list above come from the same vendor, or even run on the same platforms.
response from an AA employee (Score:3, Interesting)

by dan_bethe ( 134253 ) writes: <slashdot@smucko[ ]org ['la.' in gap]> on Sunday December 26, 2004 @06:06PM (#11186670)

I sent a summary of these Slashdot comments to my cousin who works at American Airlines hq in Dallas. Here's his response!

---

"ugh... I worked 9pm-1am yesterday (xmas day). I spent the first two
hours of my shift calling people to tell them their flight was
cancelled and reschedule them. Most of them were taking flights out to
Miami and the Caribbean to spend New Years Eve partying on the beach.
Honestly, I had little pity telling them they were going to miss out on
one day of tanning especially since they seem to 'blame' the weather on
us.

"One hour into my shift our reference system went down. No IT people
were willing to come in and fix it. I had the system up for booking
flights and making reservations, but I could not look up any of our
rules and regulations. Ah well, enjoy your xmas off IT guys!! Enjoy
the weather in Cabo San Lucas!! Cheers!!

"Fortunately, we have a backup of all our html files saved as text
files. However each text file can only hold several hundred text
characters. So, when I want to look up our baggage policies the normal
html file is called BAG INFO. In the backup system BAG INFO is
separated into 10 or 20 text files and I have to 'page' through them by
typing BAG INFO P2, BAG INFO P3, BAG INFO P4. The text files are not
indexed and are not searchable. It took me 10 minutes to find and
advise someone how big a bag they can take to Puerto Rico.

"After I started taking incoming calls again, there were people calling
in on Christmas day to book their trips for Spring Break. There were
over 100 calls on hold to talk to us, and there were people sitting on
hold for half an hour to ask me how much it would cost to book a trip
to Fort Lauderdale in March. Couldn't that wait until the day after
Christmas?

"Yes, the airline industry does not prepare for emergencies as well as
it could for the holidays when people want to travel in record numbers.
However, I think the general public could try to have their own backup
plans in place as well and realize that the travel industry in general
does not have the equipment or the staff to handle everyone in the
country wanting to travel all at once in one week. Do people stock
their refrigerators year round with enough food to feed everyone in
their families at one meal like they do at Christmas?

"Even though we try to accommodate everyone as best as we can on the
holidays, we want to have a holiday just as badly as everyone
else. Working in the travel industry should not indenture us
to be your slaves over holidays. The public needs to have a little bit
of compassion and realize how much we give up in our own personal lives
just to help you get where you are going. Frankly, the way most people
treat me on the phones I don't think they deserve our help and
compassion. And don't call on Christmas day to book flights in March.
That phone call is making someone work on a day they shouldn't have to.

"anyways.... heh..... guess i had a bad night at work last night, huh

"MERRY XMAS!"

- Re:The system runs Linux (Score:2)
  
  by carlmenezes ( 204187 ) writes:
  
  Yeah, but the real cause for the crash was an Access backend. So there! :)
- - Re:The system runs Linux (Score:2)
    
    by bcmm ( 768152 ) writes:
    
    No, it means they didn't make a big enough swap partition.
    - Re:The system runs Linux (Score:2)
      
      by Antique Geekmeister ( 740220 ) writes:
      
      Nah, under Linux you can trivially create new files as swap space when needed. It may mean they overflowed available partition space on critical systems, or were unable to administer a heavily loaded system fast enough to add swap before it overflowed.
      
      Knowing nothing else, I'd guess they overflowed a key database partition. A lot of old programmers very foolishly over-partition available disk, trying to outguess the OS about what partition will need how much space and instead of protecting themselves from disast
- Re:Travel tip (Score:5, Informative)
  
  by xlation ( 228159 ) * writes: on Sunday December 26, 2004 @08:42AM (#11184385)
  
  From: http://www.fly.faa.gov/FAQ/faq.html
  
  The term "Rule 240" refers to a rule that existed before airline deregulation. There is no longer an actual Rule 240. The term, as it is now used, refers to each airline's "conditions of carriage" policy. You would need to contact the airlines to obtain this.
  
- Re:Travel tip (Score:2)
  
  by reallocate ( 142797 ) writes:
  
  True, but...It's Christmas, everyone is booked up, and thousands of flights were already cancelled due to weather.
- - Re:Slashdot this (Score:2)
    
    by Zorilla ( 791636 ) writes:
    
    Don't smirk; it'll be the next topic on Ask Slashdot.
- Read the code, Luke (episode II) (Score:5, Funny)
  
  by chiph ( 523845 ) writes: on Sunday December 26, 2004 @08:59AM (#11184421)
  
  Somewhere deep in the code is a comment that says:
  
  // I don't need to check for this condition because
  // my asshole manager Steve Johnson says it'll
  // never happen
  
  {friggin' slash - When I say plain old text, I mean plain old text!}
  
- Re:Simple Solution (Score:4, Insightful)
  
  by the pickle ( 261584 ) writes: on Sunday December 26, 2004 @05:56PM (#11186615) Homepage
  
  Sure, that's eminently practical. I can take 48 hours to get from Detroit to LA, or I can take six (including travel time and check-in time at both airports).
  
  p
  
