Airbus A350 Software Bug Forces Airlines To Turn Planes Off and On Every 149 Hours (theregister.co.uk) 131

Posted by BeauHD on Thursday July 25, 2019 @06:10PM from the concerning-software-bugs dept.

An anonymous reader quotes a report from The Register: Some models of Airbus A350 airliners still need to be hard rebooted after exactly 149 hours, despite warnings from the EU Aviation Safety Agency (EASA) first issued two years ago. In a mandatory airworthiness directive (AD) reissued earlier this week, EASA urged operators to turn their A350s off and on again to prevent "partial or total loss of some avionics systems or functions." The revised AD, effective from tomorrow (26 July), exempts only those new A350-941s which have had modified software pre-loaded on the production line. For all other A350-941s, operators need to completely power the airliner down before it reaches 149 hours of continuous power-on time.

Concerningly, the original 2017 AD was brought about by "in-service events where a loss of communication occurred between some avionics systems and avionics network" (sic). The impact of the failures ranged from "redundancy loss" to "complete loss on a specific function hosted on common remote data concentrator and core processing input/output modules." In layman's English, this means that prior to 2017, at least some A350s flying passengers were suffering unexplained failures of potentially flight-critical digital systems.

This discussion has been archived. No new comments can be posted.

Huh.... (Score:5, Funny)

by Rick Zeman ( 15628 ) writes: on Thursday July 25, 2019 @06:12PM (#58987216)

Who knew Airbus was running Windows 95?

- Re:Huh.... (Score:5, Funny)
  
  by rsilvergun ( 571051 ) writes: on Thursday July 25, 2019 @06:27PM (#58987338)
  
  You have a Windows 95 install with 149 hours of uptime? How?
  
  - Re:Huh.... (Score:5, Funny)
    
    by Anonymous Coward writes: on Thursday July 25, 2019 @06:47PM (#58987464)
    
    I left it sitting at the blue screen for 148 hours.
    
    - Re:Huh.... (Score:5, Funny)
      
      by rsilvergun ( 571051 ) writes: on Thursday July 25, 2019 @07:43PM (#58987754)
      
      Well played.
      
  - Re:Huh.... (Score:5, Insightful)
    
    by arglebargle_xiv ( 2212710 ) writes: on Friday July 26, 2019 @12:15AM (#58988940)
    
    Rebooting, under the name "rejuvenation", is actually a standard technique for maintaining the integrity of high-assurance systems, periodically resetting them to a known-good state. It's commonly used in RTOSes where the code executes out of ROM and IPL is virtually instantaneous, but I could see them doing it in aircraft as well. So from a generic-computer point of view, "needs to be rebooted after X hours" sounds bad, from a high-assurance system point of view "X hour rejuvenation interval" is a standard, and expected, procedure.
    
    - Re: (Score:2)
      
      by i-neo ( 176120 ) writes:
      
      True, it's also something commonly used in all highly resilient platforms.
      You design and develop the best you can, and despite the best quality tests, you will always have some issue, so you integrate failure in your design, ending up with a way to return to a known state after a number of transactions (reboot in that case). And after years in the software industry, that's surely the only way to get the results you expect. And if you design for this reboot, all is fine, it's not even visible to the end user.
    - Re:Huh.... (Score:5, Insightful)
      
      by strikethree ( 811449 ) writes: on Friday July 26, 2019 @10:21AM (#58990826) Journal
      
      Rebooting, under the name "rejuvenation", is actually a standard technique for maintaining the integrity of high-assurance systems
      Using a "reboot" to ensure things are as you think they are is one thing. Being forced to reboot because your shit will break if you don't is another. They are NOT equivalent. One is nice but not necessary, while the other is absolutely necessary or it will be guaranteed to fail.
      Totally different and nowhere NEAR the same thing.
      
      - Re: (Score:2)
        
        by arglebargle_xiv ( 2212710 ) writes:
        
        Actually, they are. Let's say you have a system where an integrated component degrades by 10% every 24 hours, or at least a fixed 8-12% degradation in a 24-hour time period (lots of evaluation and modelling elided). This means you can run it for about a week before you experience a fault. Your fault isolation and recovery process then is to specify a rejuvenation interval of, say, 48 hours to deal with it and you're fine. The risk is mitigated and you move on.
        Rebooting because $warm_fuzzies is voodoo. K
- Re: (Score:2, Offtopic)
  
  by rudy_wayne ( 414635 ) writes:
  
  149 hours is 536,400 seconds. That exceeds the 512k of RAM.
  - Re:Huh.... (Score:5, Informative)
    
    by Joce640k ( 829181 ) writes: on Friday July 26, 2019 @08:41AM (#58990162) Homepage
    
    FWIW: 149 hours is an unsigned 32 bit number counting at 8kHz
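The arithmetic behind that observation can be checked with a short sketch. It assumes, as the comment does, a free-running unsigned counter that wraps to zero. The 8 kHz tick rate is the commenter's inference, not something the airworthiness directive states, and `rollover_hours` is a hypothetical helper name.

```python
from fractions import Fraction


def rollover_hours(bits: int, tick_hz: int) -> Fraction:
    """Hours until an unsigned `bits`-wide counter ticking at `tick_hz` wraps to zero."""
    seconds = Fraction(2 ** bits, tick_hz)
    return seconds / 3600


# Assumption from the comment: a 32-bit counter incremented 8000 times per second.
hours_8khz = rollover_hours(32, 8000)
print(f"32-bit counter at 8 kHz wraps after {float(hours_8khz):.2f} hours")

# The same interval expressed as a millisecond count: 2**32 / 8000 s == 2**29 ms.
assert Fraction(2 ** 32, 8000) * 1000 == 2 ** 29
```

Exact fractions are used so the equality with 2^29 milliseconds holds without floating-point rounding. Passing a different width or tick rate gives the wrap time for other counters, such as a 32-bit millisecond tick.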
    
- it IS windows 95... (Score:1)
  
  by ihaveamo ( 989662 ) writes:
  
  149 is 95 in hex!
- Re:Huh.... (Score:4, Informative)
  
  by That YouTube Guy ( 5905468 ) writes: on Thursday July 25, 2019 @07:00PM (#58987522)
  
  The original Windows 95/98 bug was to crash every 49.7 days [cnet.com] (2^32 milliseconds) of continuous operation. The moral of the crash is never use a desktop OS for a server.
  
  - Re: (Score:3)
    
    by geekymachoman ( 1261484 ) writes:
    
    > The original Windows 95/98 bug was to crash every 49.7 days [cnet.com] (2^32 milliseconds) of continuous operation. The moral of the crash is never use a desktop OS for a server.
    
    Did anybody ever trigger that bug ? My Windows 95/98 was crashing 4 times a day, never mind having it on for 49 days.
    - Re: (Score:2)
      
      by That YouTube Guy ( 5905468 ) writes:
      
      Did anybody ever trigger that bug ? My Windows 95/98 was crashing 4 times a day, never mind having it on for 49 days.
      I ran into it all the time at work since we had ad hoc servers running Windows 95/98.
    - Re: (Score:2)
      
      by hawk ( 1151 ) writes:
      
      >Did anybody ever trigger that bug ?
      Not during testing before it was released . .. .
      Even once out, it took a while before anyone put together that hard upper limit of 50 days for systems that *did* stay up . . .
      hawk
    - Re: (Score:2)
      
      by toddestan ( 632714 ) writes:
      
      I managed to see it in action, on a PC that did nothing but run a scanner that was used only sporadically. When I figured out that it was close to the 49.7 day mark, I made sure I was there to watch it go down in flames. I was expecting to see it bluescreen at the right moment, but alas nothing happened! I then tried the mouse, and as soon as I clicked on something it went BSOD.
      I've also managed to get a Windows Vista system all the way to the 497 day bug that doesn't actually crash the computer, but end
  - Re: (Score:2)
    
    by Solandri ( 704621 ) writes:
    
    Virtually nobody ran into that bug because Windows 95/98 were built on top of DOS. They used cooperative multitasking. The OS would hand control of the CPU over entirely to a running program, and it was up to the program to return control to the OS after using the CPU for a few milliseconds. If the program didn't hand control back or crashed, the OS froze.
    
    I think the longest uptime I ever got from 95/98 was a little over 3 days. I'd turn off the computer at night because if you left it on, it was a 5
    - Re: (Score:2)
      
      by jrumney ( 197329 ) writes:
      
      Windows 95 did not use co-operative multitasking. That was Mac OS 9 and earlier, or Windows 3.1.
    - Re: (Score:2)
      
      by That YouTube Guy ( 5905468 ) writes:
      
      Windows 3.1 ran on top of DOS. For shakes and giggles, I even had Windows 3.1 running on top of DR DOS. Windows 9X had DOS built into the OS. These days I use PowerShell for my CLI stuff.
  - - Re: (Score:2)
      
      by That YouTube Guy ( 5905468 ) writes:
      
      That could explain why Microsoft has published Power Toys [wikipedia.org] for Windows over the years.
LOL (Score:1)

by Narcocide ( 102829 ) writes:

Still running Win95 I see. What a joke.
- Re:Who knew? (Score:4, Funny)
  
  by PPH ( 736903 ) writes: on Thursday July 25, 2019 @07:47PM (#58987786)
  
  If it was running Linux, the pilots would all be bragging about their uptimes.
  
  - Re: (Score:2)
    
    by Highdude702 ( 4456913 ) writes:
    
    While writing a bash script to fly to the destination.
    - ....more geek stuff (Score:2)
      
      by DrYak ( 748999 ) writes:
      
      And writing a self-modifying Perl one-liner to handle the speaker announcement.
      And complaining about their meals not being perfectly cooked and suspecting this is due to the microwave running a systemd-based Linux distro.
      - Re: (Score:2)
        
        by Highdude702 ( 4456913 ) writes:
        
        Damn nerds.
  - Re: (Score:2)
    
    by Agripa ( 139780 ) writes:
    
    If it was running Linux, the pilots would all be bragging about their uptimes.
    737 Max pilots would not be.
Huh (Score:2, Insightful)

by 93 Escort Wagon ( 326346 ) writes:

Maybe they should ground them all until they actually fix the problem.
What is with these airplane manufacturers and their seemingly blasé attitude towards flight safety?
- Re: (Score:1)
  
  by SirAstral ( 1349985 ) writes:
  
  Did you expect regulations and government agencies to protect you?
  - - Re: (Score:1)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:1)
        
        by Anonymous Coward writes:
        
        No plane flight lasts 8 days. Just reboot before takeoff. Add it to the checklist.
        
        Re: (Score:2)
        
        by Kernel Kurtz ( 182424 ) writes:
        
        No plane flight lasts 8 days. Just reboot before takeoff. Add it to the checklist.
        Setting uptime records doesn't seem to be something really positive in this environment anyway. I don't want my plane to be spending any nines in the air anyway.
        
        Re: (Score:1)
        
        by Anonymous Coward writes:
        
        What have jumbo jets got to do with this? Please, stay out of conversations you don't understand.
    - Re: (Score:1)
      
      by SirAstral ( 1349985 ) writes:
      
      I am sure all the families with loved ones buried by the airlines are cheering you on!
      If the bug was recent, perhaps you would have a leg to stand on, but according to the article bugs like these have been around for multiple years.
      There is a difference between protecting someone and warning them to not walk over a cliff. Protection takes action and enforcement, not words, bulletins, warnings, and alerts.
      Even the Max had to crash twice before they did anything. I would say your faith in government is misplaced.
  - Re: (Score:2)
    
    by skam240 ( 789197 ) writes:
    
    Infinitely more so than the companies themselves.
Just a hair before 2^29 milliseconds (Score:1)

by Anonymous Coward writes:

2^29 milliseconds is 149 hours and 470.912 seconds. Perhaps an overflow?
- Re: (Score:2)
  
  by Anonymous Coward writes:
  
  Almost certainly. Chances are device 1 is sending data to device 2, and each packet has a timestamp that must be strictly increasing. Device 1 generates that timestamp by using a counter that overflows, so it starts sending out packets with timestamps around 0, 1, 2, ... etc. again. Device 2 then says, "hey, I've already received packet 4 billion, so I should ignore packet 1" and suddenly device 1 is being ignored by the network.
  Ways around include: Increasing the size of the timestamp (2^64 milliseconds i
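The failure mode hypothesized above can be sketched as follows. This is the commenter's guess at a cause, not a confirmed description of the A350 avionics. The 32-bit width is an assumption, both function names are hypothetical, and the wrap-tolerant comparison is a substitute technique in the style of serial number arithmetic (RFC 1982), which the comment does not mention.

```python
MASK32 = 0xFFFFFFFF  # assumption: timestamps are 32-bit unsigned values
HALF_RANGE = 1 << 31


def naive_is_newer(last: int, new: int) -> bool:
    """Strictly-increasing check: breaks once the sender's counter wraps."""
    return new > last


def serial_is_newer(last: int, new: int) -> bool:
    """Wrap-tolerant check: `new` is newer if it is less than half the range ahead."""
    diff = (new - last) & MASK32
    return 0 < diff < HALF_RANGE


last_seen = MASK32 - 1                 # receiver saw a timestamp just before the wrap
after_wrap = (last_seen + 3) & MASK32  # sender's counter has wrapped; value is now 1

print(naive_is_newer(last_seen, after_wrap))   # False: the packet would be ignored
print(serial_is_newer(last_seen, after_wrap))  # True: the packet is accepted
```

The wrap-tolerant check only works while the two timestamps are less than half the counter range apart, so it suits streams where packets arrive far more often than the counter wraps.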
  - Re: Just a hair before 2^29 milliseconds (Score:2)
    
    by OneSmartFellow ( 716217 ) writes:
    
    No it isn't. The correct approach is to have a protocol with multiple message types, one of which would be a message indicating a serial number reset. That's absolutely standard practice in high volume message systems. So standard that it's unlikely that message out of sequence is even related to the problem.
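A minimal sketch of the approach described here, assuming a two-message protocol. `MsgType`, `Message`, `Receiver` and the `SEQ_RESET` message are hypothetical names for illustration and are not taken from any real avionics or messaging protocol.

```python
from dataclasses import dataclass
from enum import Enum, auto


class MsgType(Enum):
    DATA = auto()
    SEQ_RESET = auto()  # hypothetical message announcing a sequence restart


@dataclass
class Message:
    kind: MsgType
    seq: int = 0


class Receiver:
    def __init__(self) -> None:
        self.last_seq = -1
        self.accepted = []

    def handle(self, msg: Message) -> bool:
        """Return True if the message was accepted."""
        if msg.kind is MsgType.SEQ_RESET:
            self.last_seq = -1  # forget history so low numbers are valid again
            return True
        if msg.seq <= self.last_seq:
            return False  # stale or duplicate sequence number
        self.last_seq = msg.seq
        self.accepted.append(msg.seq)
        return True


rx = Receiver()
rx.handle(Message(MsgType.DATA, 4_000_000_000))
print(rx.handle(Message(MsgType.DATA, 1)))  # False: looks stale without a reset
rx.handle(Message(MsgType.SEQ_RESET))
print(rx.handle(Message(MsgType.DATA, 1)))  # True: accepted after the reset
```

A real protocol would also need to authenticate or otherwise validate the reset message, which this sketch leaves out.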
- Re: Still not as bad (Score:1)
  
  by Anonymous Coward writes:
  
  Well, the 787 Dreamliner, a completely new plane, not a death trap that reuses the old airframe and sticks new nacelles on it like the 737 Max 8, had a similar problem.
  It had to be turned off and back on again every 248 days.
  So it seems FAA is not doing a good job in certifying the planes....
  https://it.slashdot.org/story/15/05/02/1240222/long-uptime-makes-boeing-787-lose-electrical-power
IT Crowd (Score:5, Funny)

by Ukab the Great ( 87152 ) writes: on Thursday July 25, 2019 @06:34PM (#58987396)

Have you tried turning your plane off and on again?

- Re:IT Crowd (Score:5, Interesting)
  
  by Strider- ( 39683 ) writes: on Thursday July 25, 2019 @08:08PM (#58987882)
  
  I was once a pax on a CRJ (aka barbie-jet) and as we were preparing to head onto the runway, the pilot comes onto the intercom and goes "so folks, we're getting something unexpected from our flight computers. We're going to reboot the jet and see what happens." so they power cycled the jet, and off we went. Didn't give a whole lot of confidence.
  
  - Re: (Score:3)
    
    by whoever57 ( 658626 ) writes:
    
    We're going to reboot the jet and see what happens." so they power cycled the jet, and off we went.
    
    Hah! Tesla owners reboot their cars while driving! Press and hold both steering wheel "nipples" and wait for the screen to turn off.
    - Re: (Score:1)
      
      by Anonymous Coward writes:
      
      I have one (Model 3), and have done this (reboot while driving), it's freaky when the screen goes black and all the sounds stop.
      BUT, it's only the entertainment and instruments that go dark. All the driving functions still operate normally; brakes, steering, headlights, brake lights, turn signals, etc.
      If you want to do a full power off reset (yes, it's a thing) you have to be parked.
- Re: (Score:2)
  
  by brokenin2 ( 103006 ) writes:
  
  ...oh, hi Mom... uh.. uh hu.... yeah.. have you tried turning it on and off again?
Needs more (Score:2)

by AHuxley ( 892839 ) writes:

Ada code.
- Re:Needs less (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  code. FTFY
That's Comforting (Score:2)

by Arzaboa ( 2804779 ) writes:

Nice. Now I have to also try not to worry about if they rebooted the plane in the last 149 hours the next time I fly.
--
I believe that tomorrow is another day and I believe in miracles. -- Audrey Hepburn
Well, it's a plane (Score:5, Funny)

by istartedi ( 132515 ) writes: on Thursday July 25, 2019 @07:11PM (#58987594) Journal

You don't really want 100% uptime anyway.

- Re: (Score:3)
  
  by Livius ( 318358 ) writes:
  
  You don't really want 100% uptime anyway.
  Up time... I see what you did.
- Re: (Score:2)
  
  by mobby_6kl ( 668092 ) writes:
  
  On the other hand, you don't want rapid unplanned downtime either...
It's TIME FOR MORE QA with aircraft software (Score:2)

by Joe_Dragon ( 2206452 ) writes:

It's TIME FOR MORE QA with aircraft software.
Maybe even some independent ones mandated by the FAA.
and how bad will the 2038 bug be? (Score:2, Interesting)

by Joe_Dragon ( 2206452 ) writes:

and how bad will the 2038 bug be?
- Re: (Score:2)
  
  by grep -v '.*' * ( 780312 ) writes:
  
  and how bad will the 2038 bug be?
  **WILL**?? What makes you think this isn't an early indication of it?!?
2^29 milliseconds? Boeing still in the 32 bit era? (Score:2)

by thesjaakspoiler ( 4782965 ) writes:

While the rest of us are already in the 64 bit era?
- Re:2^29 milliseconds? Boeing still in the 32 bit er (Score:5, Informative)
  
  by lordlod ( 458156 ) writes: on Thursday July 25, 2019 @09:15PM (#58988220)
  
  1. The plane in question is manufactured by Airbus not Boeing.
  2. Of course they are using 32 bit. It is an embedded system, most processors are 32 bit ARMs or 8 bit chips.
  3. The plane in question started manufacture in 2010 (Wikipedia), the subsystem design would have preceded this by years. Arm didn't release their 64bit architecture until 2011.
  4. A 32bit count of milliseconds corresponds to 49 days, a long way from 149 hours. It does correspond with a 32bit counter and an 8kHz clock though.
  
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
Free Solution: Run this code (Score:4, Funny)

by divide overflow ( 599608 ) writes: on Thursday July 25, 2019 @10:15PM (#58988454)

My solution:

#!/bin/bash
line="echo * */149 * * * /sbin/reboot"
(crontab -u root -l; echo "$line" ) | crontab -u root -

- Re: (Score:1, Insightful)
  
  by Anonymous Coward writes:
  
  My solution:
  #!/bin/bash
  line="echo * */149 * * * /sbin/reboot"
  (crontab -u root -l; echo "$line" ) | crontab -u root -
  That will never be approved by FAA. The software in a plane has to be reliable, which is why it has to meet some rather strict requirements. A plane is only allowed to start programs and allocate or free memory while on the ground. We can't have a plane crash because malloc failed. You use "dangerous" commands like echo, which is actually not allowed.
  If you are puzzled as to why you can't use echo on a plane due to safety, but you can fly with a plane, which crashes after 149 hours, then that's the airline ind
  - Re: (Score:3)
    
    by divide overflow ( 599608 ) writes:
    
    That will never be approved by FAA. The software in a plane has to be reliable, which is why it has to meet some rather strict requirements. A plane is only allowed to start programs and allocate or free memory while on the ground. We can't have a plane crash because malloc failed. You use "dangerous" commands like echo, which is actually not allowed.
    Could it be more obvious that I was joking??
    
    The solution is obvious: Fix the goddamned code.
  - Re: (Score:1)
    
    by BlackOverflow ( 5394496 ) writes:
    
    Whooooooooooooosh!
Before/After (Score:2)

by BoogieChile ( 517082 ) writes:

> Rebooted after exactly 149 hours
> ...rebooted before it reaches 149 hours
I really, really wish people would stop doing that.
goes to show (Score:2)

by sad_ ( 7868 ) writes:

this just goes to show that no matter where software is being used, it's all rubbish.
it's funny because i have written a lot of programs/tools and of course they also contain bugs, mostly corner cases, but that's no excuse because when you hit such a case you've got problems.
for the few times this happens, my boss pulls out this speech about how much better software is developed for cars & airplanes and that we should try to mimic the same quality, etc.
as far as i can see, the quality isn't much better th
Article omits key facts - FUD (Score:3, Interesting)

by EmTeedee ( 948267 ) writes: on Friday July 26, 2019 @08:15AM (#58990068) Journal

The original Airworthiness Directive (linked in the article) references a Service Bulletin which defines the necessary updates.
Basically a patch has been available since 14 August 2018. The Directive no longer applies as soon as an airline installs those patches. Seems like Boeing is trying to spread FUD...
Quotes from the Directive:
Modification of an aeroplane in accordance with the instructions of the SB constitutes terminating action for the repetitive ground power cycles (resets) as required by paragraph (1) of this AD for that aeroplane.
Airbus SB A350-42-P010 original issue dated 14 August 2018.

- Re: (Score:2)
  
  by johnsie ( 1158363 ) writes:
  
  Yeah, it's fake news and only applies to unpatched planes.
