Southwest Meltdown Shows Airlines Need Tighter Software Integration (wsj.com) 59

The Southwest Airlines meltdown that stranded thousands of passengers during one of the busiest travel weeks of the year exposed a major industry shortcoming: crew-scheduling technology that was largely built for a bygone era and is due for a major overhaul. From a report: Southwest relies on crew-assignment software called SkySolver, an off-the-shelf application that it has customized and updated, but is nearing the end of its life, according to the airline. The program was developed decades ago and is now owned by General Electric. During the winter storm, amid a huge volume of changes to crew schedules to work through, SkySolver couldn't handle the task of matching crew members and which flights they should work, executives of the Dallas-based carrier said.

Southwest's software wasn't designed to solve problems of that scale, Chief Operating Officer Andrew Watterson said Thursday, forcing the airline to revert to manual scheduling. Unlike some large rivals with hub-and-spoke networks, Southwest planes hopscotch from city to city, which may have been another complicating factor. Many carriers still rely on homegrown solutions, which largely were built on legacy mainframe computers, analysts say. Analysts and industry insiders say the airline industry is overdue for a massive technology overhaul that would take advantage of highly scalable cloud technologies and fully connect disparate sources of real-time data to better coordinate crews with aircraft. The airline sector has been among the slowest to adopt cloud-based and analytics technologies that could help solve complicated transportation network problems, those analysts say.

This discussion has been archived. No new comments can be posted.

  • by funkman ( 13736 ) on Tuesday January 03, 2023 @11:53AM (#63176516)

    If a "rewrite" of software of this complexity is done, they had better hire two journalists to document the entire process as it spectacularly fails in time and cost overruns.

  • by DarkOx ( 621550 ) on Tuesday January 03, 2023 @11:57AM (#63176540) Journal

    I feel like this "it was an 'old' software problem" is likely an excuse.

    Resource scheduling/optimization is an NP-hard problem, but narrowed implementations have been the bread and butter of computerized logistics applications since practically the dawn of commercial electronic computing machinery: shipping lanes, capacities, personnel, etc.

    The airline problem (more rules than trucking/container ships/air cargo/etc.), especially with SWA's non-hub-and-spoke model, is probably one of the more complex ones to deal with, but I find it really difficult to imagine that everything it *needed* to do was not already implemented. It's not like weather has not canceled flights before. It's also not as if the software being slow should matter at this point: given the number of planes, crews, passengers, ground crews, etc. involved, it's not a lot of data in 202[23]. Hardware has gotten pretty fast, so assuming it was at least migrated to contemporary-ish machines, even old software with very naive implementations of algorithms should have been able to push the data around pretty quickly. The only issue I can see is that maybe someone implemented some of those old systems with stupid-low maximums, like only being able to handle N active flight crews or something, where N became too small.

    Otherwise it seems much more likely that the problem was that they did not have enough staff, or staff trained appropriately to use the software, or people were not doing the data entry needed in a timely manner or something... It's just really hard for me to imagine that scheduling software that has been running the airline's day-to-day operations for a decade and survived other unusual events suddenly broke down because of too many schedule updates. Now maybe it's not as good a tool as some of the other carriers have got, but from an under-informed armchair perspective this reeks of a people/process problem more than a 'stuff's old' problem.

    • by Anonymous Coward on Tuesday January 03, 2023 @12:06PM (#63176586)
      Someone I know who works in the industry and is very familiar with scheduling software tells me Southwest has separate flight scheduling and crew scheduling software and the two don't "talk" to each other. So, when a flight is cancelled they have to make manual changes to the crew scheduling software. Everything just snowballs because of this.
      • Even worse, often they were unable to adjust on the fly. For example, if they had the two pilots and three flight attendants scheduled for a flight, but two of the flight attendants couldn't make the flight, it didn't matter that five other flight attendants were deadheading on it to get to another airport so they could staff their own flight; the flight got canceled because no one could update the crew scheduling system fast enough to pull some of the deadheads into the working crew. It's really quite mindboggli
        • Some of this (from what I understand at least) is due to union work rules; dead-heading crew cannot just be called up to work, the reserves get called first because you know they are legal to fly.

          Some airlines have a little bit of dynamic flexibility where a dead-heading crew member can cover a spot, but they are not considered full crew. If you need four flight attendants to be legal on the flight then someone deadheading (and legal) can sit in a jump seat and be counted as required crew, but they don't ha

    • people were not doing the data entry needed in a timely manner or something...

      From what I have heard, that is closest to the truth, but there is a tipping point where the number of cancellations and the corresponding manual rescheduling of crew (data entry) went through a phase change from a local disruption to a task that couldn't be done in time to save the next scheduled flight... and so on... a cascading failure.
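
A toy backlog model can illustrate that tipping point. This is only a sketch under invented assumptions: the function name, the hourly arrival figures, and the fixed data-entry capacity are all hypothetical, and nothing here reflects how Southwest's systems actually work.

```python
def backlog_over_time(arrivals_per_hour, capacity_per_hour):
    """Return the backlog of unprocessed crew changes after each hour.

    arrivals_per_hour: new manual schedule changes arriving in each hour.
    capacity_per_hour: changes the schedulers can enter per hour (assumed fixed).
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_hour:
        # Work left over is whatever exceeds this hour's capacity, never below zero.
        backlog = max(0, backlog + arrivals - capacity_per_hour)
        history.append(backlog)
    return history


# Hypothetical numbers, chosen only to show the shape of the two curves.
local_disruption = backlog_over_time([80, 120, 90, 60, 40, 30], capacity_per_hour=100)
major_storm = backlog_over_time([150, 200, 250, 250, 200, 150], capacity_per_hour=100)

print(local_disruption)  # [0, 20, 10, 0, 0, 0]
print(major_storm)       # [50, 150, 300, 450, 550, 600]
```

While arrivals only briefly exceed capacity, the backlog drains on its own. Once they stay above capacity, it grows every hour. The sketch leaves out the feedback described above, where each unprocessed change can cause further cancellations, which would make the second curve steeper still.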

      As for upgrading, NP hard blah blah has so little to do with it. It's about integrating all the bus

      • Re: (Score:3, Interesting)

        by Anonymous Coward

        NP-hard problems assume "large datasets", and this is *NOT* a large dataset by any stretch of the imagination in relation to the amount of compute resources needed. Checking Wikipedia, they have 779 aircraft and serve 121 destinations. If each aircraft requires six staff members (pilot, co-pilot, four flight attendants), counted as 1 × 1 × 4!, you end up with a measly 779 × 121 × 4! = 2,262,216 combinations.
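
The arithmetic in the paragraph above can be checked with a few lines. This sketch only reproduces the commenter's own multiplication (aircraft × destinations × 4! orderings of the flight attendants). It is not a model of real crew-scheduling complexity, and the variable names are made up for illustration.

```python
import math

# Figures quoted in the comment above (from Wikipedia, per the commenter).
aircraft = 779
destinations = 121

# The comment's per-aircraft factor: 1 pilot * 1 co-pilot * 4! attendant orderings.
crew_factor = 1 * 1 * math.factorial(4)

combinations = aircraft * destinations * crew_factor
print(f"{combinations:,}")  # 2,262,216
```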

        It's not the underlying problem that's "hard" - it's the combination of their business model (point-to-point rather t

        • ^ +1

          The complexity of the IT problem isn't the issue. I imagine on Sunday someone realized that they were going to be crippled until Friday and that there was no way around it. So, they prepped to make sure they would be smooth, and took the hit until then.

          The IT frustration was mainly the fact that employees didn't know what the heck they should do until Tuesday because the airline didn't have a system that could manage things like the hotels that needed to be arranged for stranded crew, or that they were

      • by mspohr ( 589790 )

        In addition, Southwest didn't have a system for crew to check in and report their position and availability for work. Crew had to call (on a telephone) and wait on hold for hours to tell Southwest that they were available to fly.

    • by Ichijo ( 607641 )

      Software is "old" when investing in newer software will save the company money in the long run.

      I use the same reasoning to help me decide when to buy a new car.

    • My guess would be that when the software was written, they figured there will never be more than a thousand changes at once, so "we can get by" with this quick and dirty suboptimal algorithm.

      • And today with a processor 10x or 20x as fast, and RAM cheap enough that the entire working set is in RAM and most of the backing files are cached, they've been getting by just fine most of the time. The single biggest issue was probably the lack of connection between plane and crew scheduling; if that's a manual step then huge time is lost. (Similarly for other comments re: crew having to call in rather than just text or being online.)
    • It is a couple of things happening at once to an airline that has "inherently simple" operating mechanics. The leadup to the meltdown, along with the holidays, is what broke them; the meltdown was the result of not having the proper procedures and systems in place to recover from an issue.

      Traditionally, a weather pattern has very limited impact on Southwest, specifically because they don't have any hubs that exponentially impact operations. The holidays, aside from very high load factors, were also a challen

    • As a friend pointed out, the technology available in 1990 was perfectly adequate to handle Southwest's scheduling problems today. There's not much of an excuse in pointing out that things were simpler in the past and Southwest only had a few flights. There's a lack of scale, true, but designers should always keep that in mind. More likely problems are the lack of maintenance towards making sure all the company's software systems integrate with each other properly, instead a series of bandages over time as requirement

      • "only a few flights" - and today a processor is 10x or 50x as fast and the entire working data set is in RAM (or should be). Scale can be worked around with money.
    • I feel like this "it was an 'old' software problem" is likely an excuse.....

      The real story was Southwest supposedly had over 150 ramp workers walk off the job in Denver, following threats from management that they would summarily fire any workers who were absent without a doctor's note and the record freezing temperatures in Denver. Some of the ramp workers were working over 18-hour shifts outdoors in a snowstorm, and many of the roads around Denver were closed in the snowstorms, making it difficult to get to and from work. This caused flights and baggage to pile up at Denver. The

    • My understanding is that the software was 1970s vintage, and rather than invest in updating it, management kept shoveling money to investors and into their own pockets. It is a tale as old as time… cue musical number
    • From what I understand, I don't think it was fully on the scheduling software... a big part of it had to do with the fact that other systems hadn't been modernized. While other airlines have apps that allow crews to check in and notify HQ their whereabouts, Southwest requires crews to call a person at HQ if they need to provide updates.

      So essentially, their systems just assume crews and pilots are following their schedule for the day, meaning they are at the right place and time to fly the next aircraft. In

    • Very insightful, but everyone is ignoring the REAL story. Most Southwest employees (like everyone else) like to spend their nights with their families. A long-standing Southwest policy allows employees to ride free after their shifts to their home locations. They can also ride free from their homes to their starting flight--policy requires that they don't take the last flight; there must be a backup flight scheduled after the one they pick. Needless to say, this means they arrive early at their first paid f
  • Shows need for competemt editors who are actually proud of their work.
  • by Anonymous Coward

    Better call the COBOL programmers out of retirement (again). I think they'll still be working after the Java and Python kiddies are obsolete.

  • by magzteel ( 5013587 ) on Tuesday January 03, 2023 @12:07PM (#63176588)

    Free link to the article: https://www.wsj.com/articles/s... [wsj.com]

  • by KingFatty ( 770719 ) on Tuesday January 03, 2023 @12:10PM (#63176594)
    Every other carrier besides Southwest was able to handle the storms. This is just a Southwest problem. They cheaped out and knew they needed to fix the problem, months before the meltdown happened. It was just a matter of time, only for Southwest.
  • Bitrot is a bitch in old software

  • It shows that taking "advice" from the putting green isn't such a great way to run a company. Gonna need nuts and bolts people for this. Does the leadership at Southwest have it in them to stomach what needs to be done? Or will a fancy dance occur that results in a half-ass good-enough-for-next-couple-Qs-or-acquisition and some random payouts to shareholders? Yeah, probably that. Then someones, somewhere will start a new airline and perhaps it will desire to fly more than the people at this one do.
    • by mspohr ( 589790 )

      "Leadership" at Southwest paid themselves generous bonuses instead of investing in, like, software and people to run the business.

    • Leadership at SW went from people who came up in the industry, to interchangeable money men. For another example see Boeing, and airliners flying into the ground.
  • Logistics software (Score:5, Insightful)

    by Petersko ( 564140 ) on Tuesday January 03, 2023 @12:24PM (#63176652)

    I spent over twenty years in scaling and maintaining "legacy" logistics software. In my case it was for entities of sufficient size that no off-the-shelf solutions were available. In that time I watched the legacy products march on, while projects as expensive as $200+ million failed to replace them.

    My prediction is that Southwest will be pushed into partnering with a major provider because optically that's the only answer acceptable to politicians. And three years from now the original systems will still be in place, the projects will fail, and the company will be restating massive amounts of capital money to operating expenses. The vendor (or vendors) will be pointing fingers, and the original applications will be "unfrozen" to try to patch the gaps that have occurred since project inception.

    And Oracle/Microsoft/Infosys/whoever will be laughing all the way to the proverbial bank.

    • You forgot IBM

    • When a $200 million software project fails, it's nearly always due to inept management. I've been part of big projects that failed, and big projects that succeeded. The only difference was the quality and intelligence of those managing the project, and the amount of latitude given to them.

      • Absolutely. But also because logistics is HARD. It's one of the most academically studied problem spaces for a reason. Couple that with the fact that whoever makes the sale will do so on some version of "AI / machine learning will save the day for you!" and you vastly increase the cost, and decrease the likelihood of success. They can fail on multiple fronts at once this way.

        • Oh for sure. I've worked in genetics, mortgage, healthcare, all of which have difficult problems to solve. I've yet to see a business problem that couldn't be solved _for technical reasons_.

  • by Behrooz ( 302401 ) on Tuesday January 03, 2023 @12:33PM (#63176712)

    Southwest USED to be a well-run corporation, with stable, steady growth and providing a reliable product, with employees that were generally treated decently and provided with the resources they need to do their jobs. Then Gary Kelly took over as chairman and president in 2008 from the original founder, and decided to play accounting games instead. In the last decade, Southwest repurchased more than eight billion dollars of stock, out of about nineteen billion dollars pre-tax operating income. Kelly bailed out last year with the ~hundred million dollars he personally extracted, and left behind... this.

    The previous generation of self-dealing executives made ridiculous sums of money bleeding the company dry to juice their stock options. Corporate stock buybacks should never have been legalized -- and as long as there's cheap fast money in it, grifters and private equity vultures will be happy to suck the money out of anything that moves and leave failing companies and massive layoffs behind.

    • Then Gary Kelly took over

      Gary was the CFO, not from an operations background. Therein lies the issue, but SWA is a financially stable airline and this incident is really the responsibility of the current CEO and the board. Share buybacks etc. have to be approved by the board, so they acted as businesses do; meanwhile, unlike other airlines, they invested in new aircraft and were also bitten by the 737 MAX issues, canceling tens of thousands of flights because of that grounding alone.

      I'm not defending the board decision but if SWA was clo

      • by Behrooz ( 302401 )

        Sure, SWA could be argued to be still a financially stable airline. But how long does that hold when there are now hundreds of thousands of former customers who will never consider flying SWA again? Or worse, if the scheduling system falls apart again in a month or two before the root causes can be addressed?

        All in all, a particularly egregious example of entirely legal self-dealing, extracting every possible dollar of the value created by others with complete confidence that someone else will end up payi

        • hundreds of thousands of former customers who will never consider flying SWA again?

          You have a fare sale; it's been no secret that passengers pay for the lowest seat price first and everything else second. Otherwise, why would Spirit have even existed? Their PR is damaged but it's not an insurmountable problem and it'll cost them some money, sure. I'm sure the shareholders will let them know how displeased they are at the next shareholder call.

          Years ago American had to ground their whole fleet of MD-83s because of a wiring issue that wasn't an issue but still mandated as "urgent" by the FAA

  • A great narrative to hide their actual problems of mismanagement.

    This has little to do with tech and almost everything to do with how Southwest books and routes flights.

    They're trying to play a game where they offer some nice features few other airlines offer (more direct routes and free ticket rescheduling), but then they don't spend what they need to actually make that work.

    Other airlines aren't routing flights through hubs to annoy their customers. They do it because it helps you quickly address interrup

    • There's no indication that the point-to-point scheme Southwest uses has anything to do with the scheduling problems. In reality, Southwest does have a hub-and-spoke system. I live in Houston, where it has one such hub (Houston Hobby); Dallas (its home base) has another. Whether hub-and-spoke or point-to-point, the technical challenge is the same. People in one place have to be routed to another, one hop at a time. Southwest's software couldn't do this.

    • you do realize they carry more US passengers than any other airline and have about 30% more flights too? They must be doing something right.

      • You do realize the week's airline meltdown was largely down to Southwest being unable to meet their obligations?

        They're obviously doing something very wrong.

        • I like the entitled term "obligations"; you have no idea what you're talking about. From the DOT flyer's rights website. [transportation.gov]

          Airlines don't guarantee their schedules, and you should realize this when planning your trip. There are many things that can-and often do-make it impossible for flights to arrive on time. Some of these problems, like bad weather, air traffic delays, and mechanical issues, are hard to predict and often beyond the airlines' control.

          Those who have flown on "Spirit" know all about that. ... and further in the doc ...

          If your flight is canceled, most airlines will rebook you on their first flight to your destination on which space is available, at no additional charge. If this involves a significant delay, find out if another carrier has space and ask the first airline if they will endorse your ticket to the other carrier. Finding extra seats may be difficult, however, especially over holidays and other peak travel times

          They'll definitely suffer and I'm sure the CEO's credibility with the board is badly tarnished, but they'll hire some big 4 firm to come in, generate a multi-thousand-page document that says, "you fucked up, you didn't anticipate the weather conditions and their effect on operations and here's what we r

          • You don't think companies are obligated to provide the services you pay for? May I interest you in some marketing materials?

            You're corporate America's little bitch, aren't you?

            • Every ticket comes with an implied term of service, SWA publishes theirs [southwest.com] as do other airlines, and before you buy a ticket it's good to know what kind of recourse you have when you fall off the happy path. You can read, can't you?

              Significant Delays or Involuntary Cancellations. If a Passenger’s scheduled transportation
              is significantly disrupted by the Carrier before the Passenger has reached his or her final
              destination as a result of a flight cancellation, Carrier-caused missed connection,
              significant flight delay, significant schedule change, or omission of a scheduled stop
              caused by the Carrier, Carrier may do one of the following:
              (i) Transport the Passenger at no additional charge on another of Southwest Airlines
              flight(s);
              (ii) Refund the fare for the unused transportation in accordance with this Section 4; or
              (iii) Provide a Flight Credit or a Transferable Flight Credit depending on the fare
              purchased for the unused portion of the Customer’s fare in accordance with this
              Section 4. See also Section 9.a

              Reading the above, they'll honor your patronage of purchasing that $39 flight to see grandma by getting you on another flight of theirs or giving you a credit or refund. They also have terms dealing with a Force Majeure event, but I won't go into those specifics.

              Here's a not so well-k

  • So the article claims they're using a customized version of this: https://www.ge.com/digital/app... [ge.com] which describes itself

    The platform consists of 4 cloud-based applications: Network Operations Insights, Network Operations Optimization, Network Crew Optimization, and Network Operations Passenger Protection. Each of these products operate on their own, when used together the maximum value is realized.

    One is left wondering if Southwest got stuck in the past because of customization? Was it a cloud platform capacity problem, or an issue with the algorithms not scaling, such that no matter what your infrastructure capacity was it wouldn't solve that large a problem? Are they using some parts but not others? So many questions, so little actual information available!

  • Disclaimer: I've worked for two airlines,

    if one thing is sure, the fleet's modernization takes priority over the modernization of the logistics and decision support systems; new planes drive revenue, and new software is abstract to the C-suite. Crew scheduling in and of itself is a difficult problem during normal operations vs. interrupted operations (weather, mechanical problems) you may have flight crew with hours who can take a flight but they're not in the right place. Likewise in order to have a flight

  • the airline industry is overdue for a massive technology overhaul that would take advantage of highly scalable cloud technologies

    If the problem is a badly written program, cloud may not be the solution. A weak algorithm will perform badly on a cloud infrastructure as well.
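
A rough sketch of why extra compute does not rescue a weak algorithm: assume, purely for illustration, a brute-force approach that tries all n! orderings of n items. The operation budgets below are hypothetical, and nothing here describes what SkySolver or any airline system actually does.

```python
import math


def largest_brute_force_size(ops_budget):
    """Largest n such that trying all n! orderings fits within ops_budget."""
    n = 0
    while math.factorial(n + 1) <= ops_budget:
        n += 1
    return n


base_budget = 10**12                # hypothetical: operations affordable on the old machine
cloud_budget = 1000 * base_budget   # hypothetical: 1000x more compute after scaling out

print(largest_brute_force_size(base_budget))   # 14
print(largest_brute_force_size(cloud_budget))  # 17
```

Under these assumptions, a thousandfold increase in compute lets the brute-force approach handle only three more items, which is the sense in which a weak algorithm stays weak on cloud infrastructure.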
