Bug Transportation

Long Uptime Makes Boeing 787 Lose Electrical Power

jones_supa writes: A dangerous software glitch has been found in the Boeing 787 Dreamliner. If the plane is left turned on for 248 days, it will enter a failsafe mode that will lead to the plane losing all of its power, according to a new directive from the US Federal Aviation Administration. If the bug is triggered, all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost. Boeing is working on a software upgrade that will address the problems, the FAA says. The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered on the field.
  • You should see what happens after -2147483648 days of upt-- oh wait.
  • by Anonymous Coward on Saturday May 02, 2015 @08:33AM (#49599833)

    Finally!

    IT support advice that's useful!

    • by rjniland ( 84094 ) on Saturday May 02, 2015 @01:23PM (#49601285)

      Yes, but perform a clean systems shut down BEFORE turning off power.

      I was on an airliner once that crashed at the gate, prior to departure.

      Ground power was disconnected before they had spun up the APU. Lights out. Lights on. ... Several minutes later we get an announcement that we'd have to wait for a backup plane, which took 45 minutes to arrange.

      They were unable to reboot the airliner.
      Robust systems design wasn't a phrase that came to mind.

  • by mikeabbott420 ( 744514 ) on Saturday May 02, 2015 @08:35AM (#49599841) Journal
    "have you tried turning it off and then back on?"
    • by sphealey ( 2855 )

      The first time I was on a new plane where the pilots did that at the gate to "fix a computer glitch" (~1998) I was utterly terrified.

      sPh

      • by jcdr ( 178250 )

        I remember that in 1996 the pilot of an FBW aircraft said that one computer had displayed an error, so they restarted the computer and the error did not come back, so all was nominal and we could go. He didn't detail the error displayed. Maybe it was minor, maybe not. The 6-hour flight went without any problem.

    • "have you tried turning it off and then back on?"

      • Customer: How do I do that?
      • Tech Support: Use the big red switch [osv.io] at the back of the fuselage, just under the elevator. Flip it to 0/Off, count to 10 and flip it back to 1/On.

      True story: Back in the early 1980s, I actually had a long-distance phone call with someone in which I was the "tech support" part of the above conversation. ... Me: "Are you sitting in front of the PC? Lean to your right... See that big red switch at the back of the case? ..."

    • by plopez ( 54068 )

      "Thank you for calling Boeing tech support. Did you know most of your questions can be answered online in our FAQ section? Simply go to www.boeingcares.com/customercare/support/FAQ. If this is a ground problem press 1, if it is a maintenance question please press 2, if this is about the galley hotbox recall press 3, for in-flight problems press 4"

      *beep*
      "You have selected in-flight problem. For engine fires press 1, for structural failure please press 2, for fuel system faults please check 3, for all other in-fl

    • by fuzzyfuzzyfungus ( 1223518 ) on Saturday May 02, 2015 @01:01PM (#49601169) Journal
      NTSB investigators reported the cause of the crash as "Controlled reboot into terrain".
  • by Brandano ( 1192819 ) on Saturday May 02, 2015 @08:36AM (#49599845)
    A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.
    • by hawguy ( 1600213 ) on Saturday May 02, 2015 @08:44AM (#49599897)

      A commercial plane will most probably go through several maintenance events and checks during that sort of time frame, where cycling the power is part of the procedure.

      It's very reassuring to know that it probably won't happen.

      • by confused one ( 671304 ) on Saturday May 02, 2015 @09:13AM (#49600029)
        If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have an overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane that could reach this point, 248 days of continuous operation while missing all of the required maintenance, is not a plane (or an airline, for that matter) that anyone should be flying on.
        • If it ever happened on a plane, then it means that the maintenance was intentionally skipped.

          And that would of course never happen.

        • by hawguy ( 1600213 )

          If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have an overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a tear down inspection and overhaul. Any plane that could reach this point, 248 days of continuous operation while missing all of the required maintenance, is not a plane (or an airline, for that matter) that anyone should be flying on.

          You would think that if this situation was unlikely to ever happen in practice that the FAA wouldn't have deemed it necessary to issue an AD requiring that the GCUs be power cycled at intervals no longer than 120 days. You'd think they'd already be aware of required maintenance intervals that require powercycling the GCUs, and they waived the usual comment period before issuing the AD due to the perceived imminent danger.

      • by Mirar ( 264502 )

        Waiting 248 days on the tarmac before flight... Improbable. I hope.

        • Re: (Score:2, Funny)

          by Anonymous Coward

          You must not fly United.

    • by plopez ( 54068 )

      Is that a cold boot or a warm boot?

    • That's what I was thinking. I didn't look it up, but I'd be pretty sure that the maintenance interval is shorter than 5,952 hours.

  • Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).
    • by fisted ( 2295862 )
      If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.
      • If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.

        What math would you do to determine exactly how high a counter should count?

        Would using a 64-bit long on a millisecond counter be lazy programming?

        • The correct answer is: During pre-flight ground checks, detect all counters at imminent risk of overflowing*, and flag requirement for corrective action at next maintenance. Probably should be checked at all routine services as well.

          * "imminent risk of overflowing" probably means less than four routine maintenance intervals remaining, but consult the requirements document for more detail.

          This is aerospace, not gaming.
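
A minimal sketch of the pre-flight check described above, assuming each counter exposes its current value, its maximum value and its tick rate. The `Counter` record, the function names and the two-week (336 hour) maintenance interval are hypothetical illustrations, not anything taken from Boeing's software or from a requirements document.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Hypothetical description of one monotonic tick counter."""
    name: str
    value: int       # current tick count
    max_value: int   # largest value the counter can hold before overflowing
    tick_hz: float   # ticks per second


def hours_until_overflow(c: Counter) -> float:
    """Hours of continuous running left before the counter overflows."""
    return (c.max_value - c.value) / c.tick_hz / 3600.0


def counters_at_risk(counters, maintenance_interval_hours, min_intervals=4):
    """Names of counters with fewer than min_intervals maintenance intervals left."""
    threshold = maintenance_interval_hours * min_intervals
    return [c.name for c in counters if hours_until_overflow(c) < threshold]


INT32_MAX = 2**31 - 1

counters = [
    # 100 hours of headroom left at an assumed 100 Hz tick rate
    Counter("gcu_uptime", INT32_MAX - 100 * 3600 * 100, INT32_MAX, 100.0),
    # freshly power-cycled unit
    Counter("fresh_unit", 0, INT32_MAX, 100.0),
]

flagged = counters_at_risk(counters, maintenance_interval_hours=336)
```

With these made-up numbers only `gcu_uptime` is flagged, because its 100 hours of headroom is well under four maintenance intervals.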

    • by Megane ( 129182 )
      Also, use the difference of the current time minus the start time, instead of computing the end time and using a simple less than/greater than comparison. This properly handles wraparounds, and only has a problem with differences more than half of the full range. (so don't keep comparing the time after it's ended!)
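
A small sketch of that difference-based comparison, assuming an unsigned 32-bit tick counter emulated with a mask (Python integers do not wrap on their own). The function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # emulate an unsigned 32-bit tick counter


def elapsed_ticks(now: int, start: int) -> int:
    """Ticks elapsed since start; stays correct across one wraparound."""
    return (now - start) & MASK32


def timed_out(now: int, start: int, timeout: int) -> bool:
    """Wrap-safe check: compare the elapsed difference against the timeout."""
    return elapsed_ticks(now, start) >= timeout


def timed_out_naive(now: int, start: int, timeout: int) -> bool:
    """Fragile check: precompute the end time and compare directly."""
    deadline = (start + timeout) & MASK32  # may wrap to a small number
    return now >= deadline


start = 0xFFFFFFF0   # 16 ticks before the counter wraps
timeout = 0x20       # 32 ticks

early = 0xFFFFFFF8   # only 8 ticks after start
late = 0x00000010    # 32 ticks after start, past the wrap
```

Here `timed_out_naive(early, start, timeout)` fires 24 ticks too soon because the deadline wrapped to `0x10`, while `timed_out` gives the right answer on both sides of the wrap. In this unsigned variant the result only becomes ambiguous once the true difference reaches the full 2^32 range; doing the comparison on a signed difference halves that, which is the "half of the full range" limit mentioned above.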
      • Good idea in principle, not always helpful in practice. I diagnosed an application failure on a UNIX system some years back that resulted from using the system's "time since last boot" function as a real-time clock with greater than one-second precision. We discovered that in order to prevent the terrible things that would happen if the 32-bit signed counter of 0.01-second intervals ever overflowed, the UNIX vendor had programmed the reported time to stop changing when it reached 2^31-1. Since the system pr

    • Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).

      As far as I am concerned, there are three valid quantities in programming. Zero, one and unlimited.

      • Good luck using a float as a counter. It won't overflow, but will eventually stop counting.

        The trick is knowing what you are doing. Which means erasing that 'three valid quantities' thinking.
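
A quick illustration of the float point, as a sketch: adding 1 to a floating-point counter stops changing the value once the counter passes the width of the mantissa. Python floats are 64-bit doubles, so the single-precision case is emulated here by round-tripping through `struct`; the helper names are invented for the example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f64_counter_stalls(value: float) -> bool:
    """True if incrementing a double-precision counter leaves it unchanged."""
    return value + 1.0 == value


def f32_counter_stalls(value: float) -> bool:
    """True if incrementing a single-precision counter leaves it unchanged."""
    return to_f32(to_f32(value) + 1.0) == to_f32(value)


F32_LIMIT = float(2**24)  # 16,777,216
F64_LIMIT = float(2**53)
```

`f32_counter_stalls(F32_LIMIT)` and `f64_counter_stalls(F64_LIMIT)` are both true, while one tick below either limit the counter still advances. At an assumed 100 Hz tick rate a single-precision counter would stall after roughly 46.6 hours, with no overflow and no error.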

  • by photonic ( 584757 ) on Saturday May 02, 2015 @08:44AM (#49599895)
    I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.
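
The arithmetic above is easy to check. A minimal sketch, assuming (as the comment guesses, not as Boeing or the FAA has confirmed) a signed 32-bit counter ticking at 100 Hz; the function name and parameters are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 3600


def days_until_overflow(bits: int, tick_hz: float, signed: bool = True) -> float:
    """Days of continuous counting before a counter of the given width overflows."""
    max_ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return max_ticks / tick_hz / SECONDS_PER_DAY


# Guessed 787 GCU case: signed 32-bit counter, 10 ms tick.
gcu_days = days_until_overflow(32, 100)

# Same formula for an unsigned 32-bit counter with a 1 ms tick.
ms_tick_days = days_until_overflow(32, 1000, signed=False)

# A signed 64-bit millisecond counter, expressed in years.
ms64_years = days_until_overflow(64, 1000) / 365.25
```

This gives about 248.55 days for the 100 Hz signed case and about 49.7 days for a 1 ms unsigned counter. A signed 64-bit millisecond counter would last on the order of 290 million years.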
    • by Megane ( 129182 )
      At least that's better than Windows 98 crashing after 7 weeks! [microsoft.com] (because 1ms instead of 10ms)
      • Re: (Score:2, Funny)

        by Anonymous Coward

        I call BS. No Windows 98 machine could possibly stay up for 7 weeks, so this was a non-issue.

    • by bosef1 ( 208943 )

      That makes a lot of sense. A lot of aviation power systems run with 400 Hz AC current (the higher frequency lets them use smaller transformers). They could be dividing down the power signal to 100 Hz, and using that to increment a counter.

      The other option is that many operating systems use 10 ms = 100 Hz for their internal interrupt timers. So it could just be a counter that is being incremented every interrupt cycle, and doesn't care what frequency of electricity is being used.
      (cf. the jiffy http://en.w [wikipedia.org]

    • by TheRealHocusLocus ( 2319802 ) on Saturday May 02, 2015 @04:57PM (#49602389)

      I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.

      Yes, the moment the big bird would shut down was correctly prognosticated by the Connecticut Yankee in King Arthur's Court [gutenberg.org]. While testing a crowbar circuit [wikipedia.org] he ran out of time and came to while munching on phattened feasant at Medieval Times, in a daze of King Arthur. He noticed an unused carrion bit, and realized that birds of prayer who managed the King's affairs were hard-sinewed to pluck quills for signing and always discarded the carrion bit. He caught the underflow was heralded by the people and befriended by the King, who set him to work hacking the Code of Chivalry and cracking the Y1K problem. In that time there were only punch cards and knights on horseback only had a resolution of 1 bit, so tournaments were long the fields were full of snakes, to avoid spooking the horses the knights would dismount and cleave them with sword, leaving half-adders strewn about. It was Pendragon who had built the famous Round Table with 12 seats, two complete I Chings, where Arthur and the knights would drop in and punch out binary sums in a rudimentary form of patty-cake, which inspired the mechanical circular adder [google.com] of later years. The Yankee's refinement was a 13th chair left unoccupied to mark the betrayal of Judas, and also to serve as a carrion bit.

      There is a great deal more about gum-powder and 99 cent gamut of Steampunk-driven micro commerce, a Debian release called 'Guinevere' and a whole lotta Lancelot, but time is fun when you're having flies.

  • How is losing power in an airplane a safe mode?
    • It's a failsafe mode for the controller and generator. There are four (4) of them. There is more than enough redundancy.
      • ... not when they would all have nearly the exact same runtime - they would all hit the failsafe at around the same time.

        Not that this should ever happen in the air - as others have said, if the thing manages to run for this long, someone hasn't been doing maintenance.
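
The common-mode point can be shown with a toy model, assuming each unit keeps its own uptime counter that overflows after the same number of ticks. The constant and function names are hypothetical, and the 2^31 tick limit is the guess from elsewhere in this thread rather than a documented value.

```python
OVERFLOW_TICKS = 2**31  # assumed: signed 32-bit uptime counter


def overflow_times(power_on_ticks):
    """Absolute tick at which each unit's uptime counter overflows."""
    return [t0 + OVERFLOW_TICKS for t0 in power_on_ticks]


def worst_case_simultaneous(power_on_ticks, window_ticks=0):
    """Largest number of units that overflow within window_ticks of each other."""
    times = sorted(overflow_times(power_on_ticks))
    best = 0
    for i, t in enumerate(times):
        count = sum(1 for u in times[i:] if u - t <= window_ticks)
        best = max(best, count)
    return best


DAY_TICKS = 100 * 3600 * 24  # one day at an assumed 100 Hz

same_moment = worst_case_simultaneous([0, 0, 0, 0])
staggered = worst_case_simultaneous([0, DAY_TICKS, 2 * DAY_TICKS, 3 * DAY_TICKS])
nearly_same = worst_case_simultaneous([0, 10, 20, 30], window_ticks=6000)
```

Four units powered up together all trip on the same tick, and units powered up a fraction of a second apart still trip within the same minute. Staggering power-on by a day each means at most one unit is in failsafe at a time, which is why identical software on redundant hardware offers little protection against this kind of bug.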

  • by Bomarc ( 306716 ) on Saturday May 02, 2015 @08:47AM (#49599917) Homepage
    For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: In years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search results will return developers that are to write their own tests against the spec. Developers are not Testers.... and I'll ask: How many more such bugs are out there?

    I know of two other software "bugs" ... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing?
    Disclosure: Yes, I'm a software QA / Test professional.
    • Because these people are normally called TEST PILOTS. ;)

    • Re: (Score:2, Informative)

      by Anonymous Coward

      The Primary Flight Computer software for the 777 was written in England by GEC. Indeed the hardware for the PFC was designed and built by GEC.

      I was on the software QA team for the PFC code. There were tens of us working three shifts 24 hours per day devising tests of the PFC against its requirement spec. There were even more doing unit tests on all the Ada code.

      That is perhaps why you don't see Boeing advertising for QA engineers. They outsource the hardware and software.

    • by fisted ( 2295862 )
      I suspect now would be the best time to apply at Boeing :-)
    • For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: In years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Any such search results will return developers that are to write their own tests against the spec. Developers are not Testers.... and I'll ask: How many more such bugs are out there? I know of two other software "bugs" ... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing? Disclosure: Yes, I'm a software QA / Test professional.

      The worst part is that when the Software bugs are finally discovered they are not fixed because it takes too much time and is too expensive to do (even though the physical update process is essentially no different to re-flashing/updating the firmware/software in a consumer grade digital device). I'd argue that you could cut the red security tape, reduce costs and install updates quicker if you massively increase the software QA work being done. Apparently Boeing disagrees, I dunno about Airbus, they might

      • the maps and other info gets updated quite a bit.

      • Remember that time back in the 90s when a Marine Corps plane on maneuvers knocked down a Cable Car in the Italian Alps, killing 20 people? That was partly because when they planned the maneuver they used charts that were 6 months old, and the cable car line was less than 6 months old. If they'd had iPads, or any other electronic chart equipment that automatically updated itself, it wouldn't have happened.

        Civil aviation changes just as frequently. Which approach each airport wants you to use (to avoid collid

    • by Required Snark ( 1702878 ) on Saturday May 02, 2015 @01:55PM (#49601435)
      You have no idea what you are talking about. All FAA certified aircraft software has to conform to the DO-178B [wikipedia.org] / DO-178C [wikipedia.org] standard. The standard imposes design, testing, process and documentation standards that are extremely demanding.

      QC isn't just a department or a step in the release process, it is built into the full life cycle of the software. Safety is the goal, and the requirement for good practice starts at the beginning of the process, with the requirement documents.

      For example, there are five levels of error severity defined from A to E. E has no impact on safety and A is catastrophic, where a crash could occur. The level of software test and validation depends on the severity level.

      The number of objectives to be satisfied (eventually with independence) is determined by the software level A-E. The phrase "with independence" refers to a separation of responsibilities where the objectivity of the verification and validation processes is ensured by virtue of their "independence" from the software development team. For objectives that must be satisfied with independence, the person verifying the item (such as a requirement or source code) may not be the person who authored the item and this separation must be clearly documented. In some cases, an automated tool may be equivalent to independence. However, the tool itself must then be qualified if it substitutes for human review.

      Your inability to find a "QC" position is because you don't know the structure of aerospace software development and have no idea of the job titles or terminology used to describe the standards used. You are projecting your lack of knowledge into an inconceivable lapse of competence on the part of Boeing and the FAA. In what universe would there be no software safety requirements for the civilian aircraft industry? All you have shown is that you are ignorant and have a basic lack of common sense.

  • by thisisauniqueid ( 825395 ) on Saturday May 02, 2015 @09:07AM (#49600003)
    The plane's control systems should have several levels of degraded-mode operation, so if one system stops working, the plane still hobbles along the best it can without the non-working system. Google's self-driving cars have something like 7 layers of nested failure modes, each with slightly degraded functions relative to the next higher level. It's almost impossible to trigger enough failures to completely shut the system down, which is a good thing if you're traveling at highway speeds. It's very concerning that a company like Boeing didn't catch this before product release, but even more concerning that they didn't design the system to be resilient against this sort of failure.
    • Indeed, they would need some mechanism like this, which is implemented using several heterogeneous processes. Triple hardware redundancy is useless if they all have a common mode software bug. Same thing happened to the first flight of Ariane 5 [around.com], where all 3 controllers crashed within milliseconds.
    • It is designed such that this would never be an issue. Why? Because you have to skip several critical maintenance periods to hit it. Imagine if you, somehow, kept your car engine running for two years. Ignoring the logistics of this, doing so means you cannot have changed your oil etc.

      Now, if it was on the order of 11 hours, that would be more of a concern.

  • ... "failsafe".

  • by 140Mandak262Jamuna ( 970587 ) on Saturday May 02, 2015 @09:12AM (#49600025) Journal

    The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered on the field.

    The spokesman continued, "The battery would have caught fire long before that integer overflow."

  • So first they say "left turned on for 248 days, it will enter a failsafe mode" then they say "all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost."

    That is NOT fail "SAFE". That is fail "EVERYBODY DEAD".

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      If you actually read the AD it will say "We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane."

      COULD lose control, not WILL. The 787 has at least 3 additional backup systems against this sort of failure, the APU, DC battery backup, and Ram Air Turbine.

    • Pilots lose control of planes all the time, just as drivers do. The question is a) how long are you out of control and how hard is it to get control back, and b) how likely is the scenario in the first place?

      In this case the answer seems to be a) not very long and not very hard as there are backup generators allowing you to reset the computers, and b) not bloody likely because you have to skip quite a few (as in dozens) routine maintenance cycles on these generators to get to 248 days without a restart.

      Ther

  • What a profound demonstration of the Halting Problem [wikipedia.org].

  • Enough of this (Score:5, Informative)

    by confused one ( 671304 ) on Saturday May 02, 2015 @09:43AM (#49600157)

    This story is being way overblown. Yes, it's a bug. Yes, it should be fixed. However...

    248 days of continuous operation is well past the scheduled major maintenance for the aircraft. By this point, a 787 would have to go through many minor maintenance cycles which would have required shutting down the electrical system. In addition, loss of all 4 generators would not result in a loss of vehicle because there are batteries, an APU (a backup generator) and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start. To have to rely on any of these would not make for a good day for the pilots; but, they would certainly provide the necessary power to safely land the aircraft at the nearest airport. They might even be able to continue on and finish their flight if they successfully reset the generators.

    This is not the OMG Planes Are Going to Fall From The Sky! event the media is making it out to be.

    • by PPH ( 736903 )

      This is not the OMG Planes Are Going to Fall From The Sky!

      No. This is a "What the f* were you goofballs thinking when you wrote this code? And if this is all the better you can do, what other gotchas are hiding in there?"

      • Dude, this is a for-profit company, not a research university. It's not written by people whose entire job is to prove to the world they write the most robustest code ever designed with zero bugs. If it doesn't kill people or delay flights it doesn't cost them money and nobody, except computer geeks, gives a shit.

        In this case the Dreamliner's designed to have all the relevant systems turned off for routine maintenance once every two weeks. Which means if they go more than 248 days without being restarted th

    • Comment removed based on user account deletion
    • Even though this bug isn't a direct threat, it could interact with other future software changes. If it is a counter overflow there is a risk that the counter would run at a higher rate in some future version where more functionality is needed. If 248 days went to 2.48 days, it might not be caught in testing, but could (rarely) happen in real life.

      • Bingo.

        If this was only spotted recently in "lab testing" (and why was it being tested now, and not before flight... what prompted the testing...) then it was known / not documented that overflow of this counter would cause shutdown. Some future revision could easily be to increase the precision, at the expense of range, or persist the counter across reboots, and that might not be considered a problem because the system was thought to handle the counter overflowing because no one documented that it didn't.

        T

  • I kind of did a double take when I saw the title. The book/series starts out with a brand new Boeing 777 losing power on the runway. :)

  • Sounds very safe.
    • by PPH ( 736903 )

      Don't worry. The 787 can always fail over to battery power ...... Umm, oh, oh.

  • ...does it display the Windows 95 splash screen?

  • How many times do I have to tell you to shut the plane the f off before you go to bed? What do you think, I'm made of money?!?

  • failsafe: I don't think that means what you think it means.
  • ...some witty remark about airplanes and downtime.

  • Would this ever happen in normal operation? I would think that every few hundred hours of flight time the plane would be pulled out of service for maintenance where everything would be shut down for a couple of days.

    • by Z00L00K ( 682162 )

      I agree here - the maintenance is probably interrupting the uptime of the system. Any airline that has an uptime of 248 days on its aircraft is likely to suffer other problems as well with its vessels, not only software glitches but also general wear issues.
