Long Uptime Makes Boeing 787 Lose Electrical Power
jones_supa writes: A dangerous software glitch has been found in the Boeing 787 Dreamliner. If the plane is left turned on for 248 days, it will enter a failsafe mode that will lead to the plane losing all of its power, according to a new directive from the US Federal Aviation Administration. If the bug is triggered, all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost. Boeing is working on a software upgrade that will address the problems, the FAA says. The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered in the field.
Oh come on. (Score:2)
Re:Oh come on. (Score:5, Interesting)
Re: (Score:3)
Oh, so they can make it fine for 497.10 days by changing the type to unsigned!
Re:Oh come on. (Score:4, Informative)
Which is apparently what Windows does:
https://www.ctm-it.com/it-supp... [ctm-it.com]
You'd think they would have learned, since Windows 95/98 did the same thing.
https://support.microsoft.com/... [microsoft.com]
But hey, at least it goes 10 times as long now.
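The figures traded in this subthread all come from one formula: a counter that starts at zero and ticks `hz` times per second runs out of range after 2^bits / hz seconds, or 2^(bits-1) / hz seconds if it is signed. A minimal Python sketch follows. It assumes, as the thread does but the FAA directive does not state, that the 787 fault is a 32-bit tick counter running at 100 Hz; the 1 kHz row corresponds to the millisecond tick behind the Windows 95/98 49.7-day hang.

```python
SECONDS_PER_DAY = 24 * 3600

def days_until_overflow(bits: int, hz: float, signed: bool) -> float:
    """Days until a counter of `bits` bits, starting at zero and ticking at `hz`, overflows."""
    # A signed counter spends one bit on the sign, so it runs out after 2**(bits-1) ticks.
    max_ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return max_ticks / hz / SECONDS_PER_DAY

# (bits, tick rate in Hz, signed?) for the figures quoted in the thread.
cases = {
    "32-bit signed at 100 Hz": (32, 100, True),
    "32-bit unsigned at 100 Hz": (32, 100, False),
    "32-bit unsigned at 1 kHz": (32, 1000, False),
}

for label, (bits, hz, signed) in cases.items():
    print(f"{label}: {days_until_overflow(bits, hz, signed):.2f} days")
```

This prints 248.55, 497.10 and 49.71 days respectively, which is where the "248 days", "497.10 days" and Windows 9x "49.7 days" numbers come from.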
Re: (Score:3)
Re:Oh come on. (Score:4, Informative)
Of course, either is often undesired, but the latter at least doesn't allow basically anything to happen.
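In C, unsigned arithmetic is defined to wrap modulo 2^N, while signed overflow is undefined behavior, which is presumably the distinction being drawn above. Defined wraparound is what makes the usual defensive idiom work: compute elapsed time as (now - start) modulo 2^32 instead of comparing raw counter values. Python integers never overflow, so this sketch emulates a 32-bit unsigned counter with a bit mask; `tick` and `elapsed` are illustrative names, not any particular API.

```python
MASK32 = 0xFFFF_FFFF  # emulate a 32-bit unsigned register

def tick(counter: int, n: int = 1) -> int:
    """Advance a 32-bit unsigned counter by n ticks, wrapping modulo 2**32."""
    return (counter + n) & MASK32

def elapsed(now: int, start: int) -> int:
    """Ticks from start to now via modular subtraction; correct across one wrap."""
    return (now - start) & MASK32

start = MASK32 - 5           # six ticks before the counter wraps
now = tick(start, 10)        # the counter wraps and now reads 4
print(now, elapsed(now, start))   # 4 10
print(now - start)                # negative: the bug you get comparing raw values
```

The idiom is only correct while the true interval is shorter than one full wrap (2^32 ticks); it cannot tell that the counter has gone all the way around.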
Re:Oh come on. (Score:5, Funny)
And this is why C should never be used for mission critical software.
Re: (Score:3)
Re: (Score:3)
It doesn't matter what country programmers come from; in my experience, too many programmers have no clue about reality outside of their cubes. They are building software for things they do not understand. I am going to rant about this in another thread, so I will leave it at that for now.
Have you tried turning it off and on again? (Score:5, Funny)
Finally!
IT support advice that's useful!
Re:Have you tried turning it off and on again? (Score:4, Interesting)
Yes, but perform a clean system shutdown BEFORE turning off the power.
I was on an airliner once that crashed at the gate, prior to departure.
Ground power was disconnected before they had spun up the APU. Lights out. Lights on. ... Several minutes later we got an announcement that we'd have to wait for a backup plane, which took 45 minutes to arrange.
They were unable to reboot the airliner.
Robust systems design wasn't a phrase that came to mind.
This is Boeing Tech Support (Score:5, Funny)
Re: (Score:2)
The first time I was on a new plane where the pilots did that at the gate to "fix a computer glitch" (~1998), I was utterly terrified.
sPh
Re: (Score:2)
I remember that in 1996 the pilot of an FBW aircraft said that one computer had displayed an error, that they had restarted the computer and the error had not come back, so all was nominal and we could go. He didn't say what the error was. Maybe it was minor, maybe not. The 6-hour flight went without any problem.
Re: (Score:2)
"have you tried turning it off and then back on?"
True story: Back in the early 1980s, I actually had a long-distance phone call with someone in which I was the "tech support" part of the above conversation. ... Me: "Are you sitting in front of the PC? Lean to your right... See that big red switch at the back of the case? ..."
Re: (Score:2)
"Thank you for calling Boeing tech support. Did you know most of your questions can be answered online in our FAQ section? Simply go to www.boeingcares.com/customercare/support/FAQ. If this is a ground problem press 1, if it is a maintenance question please press 2, if this is about the galley hotbox recall press 3, for in-flight problems press 4"
*beep*
"You have selected in-flight problem. For engine fires press 1, for structural failure please press 2, for fuel system faults please press 3, for all other in-fl
Please reinstall (Score:2)
Please format the drive and re-install Windows.
Re:This is Boeing Tech Support (Score:4, Funny)
Re: (Score:2)
How about CTRL-ALT-DEL?
Yup. Reboot the plane every time it's at the gate.
Very unlikely to be triggered in the field (Score:3, Informative)
Re:Very unlikely to be triggered in the field (Score:5, Insightful)
A commercial plane will most probably undergo several maintenance events and checks during that sort of time frame, in which cycling the power is part of the procedure.
It's very reassuring to know that it probably won't happen.
Re:Very unlikely to be triggered in the field (Score:5, Interesting)
Re: (Score:3)
If it ever happened on a plane, then it means that the maintenance was intentionally skipped.
And that would of course never happen.
Re: (Score:2)
The entire world isn't the US/Japan/EU. While most airlines outside that region that operate 787s run tight operations (Ethiopian, for example, is often mentioned as very well run with a strong safety culture), there are a few that do not.
That said, in the few instances where less organized airlines have managed to acquire 787s, the planes are probably being shut down 2-3 times a week, far more often than once every 9 months.
sPh
Re: (Score:2)
Hey, it could be possible on planes flying the Tripoli - Mogadishu - Kabul route!
Re: (Score:3)
If it ever happened on a plane, then it means that the maintenance was intentionally skipped. If they reach 248 days of continuous operation, then a number of significant maintenance cycles have been skipped (some 23-25 inspection / maintenance cycles that generally require shutting down the electrical system). The generators in question are attached to the engines. The engines have an overhaul schedule that is shorter than 248 days of continuous operation. If they managed to reach this point, then the major maintenance cycles have been skipped and the engines are long overdue for a teardown inspection and overhaul. Any plane that could reach this point, 248 days of continuous operation with all of the required maintenance missed, is not a plane (or an airline, for that matter) that anyone should be flying on.
You would think that if this situation were unlikely ever to happen in practice, the FAA wouldn't have deemed it necessary to issue an AD requiring that the GCUs be power-cycled at intervals of no more than 120 days. You'd think they'd already be aware of any required maintenance intervals that involve power-cycling the GCUs, yet they waived the usual comment period before issuing the AD because of the perceived imminent danger.
Re: (Score:2)
Waiting 248 days on the tarmac before flight... Improbable. I hope.
Re: (Score:2, Funny)
You must not fly United.
Re: (Score:3)
You will probably not be struck by lightning, but I can't guarantee that it won't happen.
Actually, when talking about airliners, getting struck by lightning is a fairly common occurrence. A typical airliner experiences a lightning strike about once a year.
Re: (Score:2)
Is that a cold boot or a warm boot?
Re: (Score:2)
That's what I was thinking. I didn't look it up, but I'd be pretty sure that the maintenance interval is shorter than 5,952 hours.
Re: Very unlikely to be triggered in the field (Score:2)
Yes, but if your desktop fails, it doesn't fall out of the sky... most of the time.
Re: (Score:2)
In an alternate timeline, Keith Moon found the 21st century to be full of challenges.
Lesson Here (Score:2)
Re: (Score:2)
Re: (Score:2)
If you did the math, you don't need excess space. If you need excess space, you're just shifting the day of failure into the future. Yes, perhaps far enough, but still.
What math would you do to determine exactly how high a counter should count?
Would using a 64-bit long on a millisecond counter be lazy programming?
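One way to do that math: decide the longest runtime the counter must survive, multiply by the tick rate and a safety factor, and take the bit length of the result. A sketch in Python; the one-year requirement and the 10x factor are made-up inputs for illustration, not anything from the 787 requirements.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def bits_needed(max_runtime_s: float, hz: float, safety_factor: float = 1.0,
                signed: bool = False) -> int:
    """Smallest counter width that can count max_runtime_s * hz * safety_factor ticks."""
    ticks = math.ceil(max_runtime_s * hz * safety_factor)
    # Values 0..ticks need ticks.bit_length() bits; a signed type needs one more for the sign.
    return ticks.bit_length() + (1 if signed else 0)

# Hypothetical requirement: one year of uptime at 100 Hz with a 10x margin.
print(bits_needed(SECONDS_PER_YEAR, 100, 10))               # 35
print(bits_needed(SECONDS_PER_YEAR, 100, 10, signed=True))  # 36

# How long a signed 64-bit millisecond counter lasts before overflowing.
years_64bit_ms = 2**63 / 1000 / SECONDS_PER_YEAR
print(f"~{years_64bit_ms / 1e6:.0f} million years")          # ~292 million years
```

By this arithmetic a 32-bit counter cannot meet the hypothetical requirement, while a signed 64-bit millisecond counter would last roughly 292 million years.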
Re: (Score:2)
* "imminent risk of overflowing" probably means less than four routine maintenance intervals remaining, but consult the requirements document for more detail.
This is aerospace, not gaming.
Re: (Score:3)
Re: (Score:2)
Good idea in principle, not always helpful in practice. I diagnosed an application failure on a UNIX system some years back that resulted from using the system's "time since last boot" function as a real-time clock with greater than one-second precision. We discovered that in order to prevent the terrible things that would happen if the 32-bit signed counter of 0.01-second intervals ever overflowed, the UNIX vendor had programmed the reported time to stop changing when it reached 2^31-1. Since the system pr
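The vendor's fix in that anecdote is one of three common overflow policies: wrap, saturate, or refuse. A sketch emulating 32-bit signed arithmetic in Python shows why a saturating counter makes a poor clock: once it clamps, every interval measured with it reads zero. The function names mirror Rust's `wrapping_add`, `saturating_add` and `checked_add` integer methods, though Rust's `checked_add` returns `None` rather than raising; the 0.01-second tick follows the anecdote, and the rest is illustrative.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def wrapping_add_i32(x: int, n: int) -> int:
    """Two's-complement wraparound, what a 32-bit signed register does on most hardware."""
    return ((x + n - INT32_MIN) % 2**32) + INT32_MIN

def saturating_add_i32(x: int, n: int) -> int:
    """Clamp to the representable range instead of wrapping."""
    return max(INT32_MIN, min(x + n, INT32_MAX))

def checked_add_i32(x: int, n: int) -> int:
    """Refuse to overflow: raise so the caller has to decide what to do."""
    result = x + n
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError("32-bit counter overflow")
    return result

# Uptime in 0.01 s ticks, one tick below the maximum value.
uptime = INT32_MAX - 1
print(wrapping_add_i32(uptime, 5))   # -2147483645: wrapped to a large negative value
t0 = saturating_add_i32(uptime, 5)
t1 = saturating_add_i32(t0, 100)     # one more second passes...
print(t1 - t0)                       # ...but the measured interval is 0
```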
Re: (Score:2)
Always, always, always do the math on counters and give yourself orders of magnitude of space. Figured this out the hard way once (fortunately not in a situation where safety was a concern).
As far as I am concerned, there are three valid quantities in programming. Zero, one and unlimited.
Re: (Score:2)
Good luck using a float as a counter. It won't overflow, but will eventually stop counting.
The trick is knowing what you are doing. Which means erasing that 'three valid quantities' thinking.
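The float point is easy to demonstrate. IEEE 754 binary32 has a 24-bit significand, so above 2^24 consecutive integers are no longer representable and adding 1.0 rounds straight back to the same value. Python's standard library has no 32-bit float type, so this sketch emulates one by round-tripping each result through `struct`.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

counter = f32(2.0**24 - 2)          # 16,777,214, exactly representable
for _ in range(5):
    counter = f32(counter + 1.0)    # one float32 increment per loop
print(int(counter))                 # 16777216: stuck at 2**24

# Python's own float (binary64) stalls the same way, just later, at 2**53.
print(2.0**53 + 1.0 == 2.0**53)     # True
```

The counter never overflows or goes negative; it simply stops advancing, which is arguably harder to notice than a wrap.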
Control unit runs at 100 Hz? (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2, Funny)
I call BS. No Windows 98 machine could possibly stay up for 7 weeks, so this was a non-issue.
Re: (Score:3)
That makes a lot of sense. A lot of aviation power systems run with 400 Hz AC current (the higher frequency lets them use smaller transformers). They could be dividing down the power signal to 100 Hz, and using that to increment a counter.
The other option is that many operating systems use 10 ms = 100 Hz for their internal interrupt timers. So it could just be a counter that is being incremented every interrupt cycle, and doesn't care what frequency of electricity is being used.
(cf. the jiffy http://en.w [wikipedia.org]
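Both explanations end at the same 100 Hz tick, and either way a signed 32-bit count of those ticks runs out after 2^31 / 100 seconds, about 248.55 days. Below is a hypothetical sketch of the first guess: a divide-by-4 prescaler on a 400 Hz signal feeding a "jiffies" counter, with 32-bit signed wraparound emulated. Nothing published about the 787 GCUs says they work this way; the class and method names are made up.

```python
class JiffyCounter:
    """Hypothetical 100 Hz tick counter derived from a 400 Hz source by dividing by 4."""

    DIVIDER = 4  # 400 Hz / 4 = 100 Hz

    def __init__(self, jiffies: int = 0) -> None:
        self.prescale = 0        # source cycles seen since the last tick
        self.jiffies = jiffies   # emulated 32-bit signed tick count

    def on_source_cycle(self) -> None:
        """Call once per 400 Hz cycle, e.g. from a zero-crossing interrupt."""
        self.prescale += 1
        if self.prescale == self.DIVIDER:
            self.prescale = 0
            # Emulate 32-bit two's-complement wraparound of the tick count.
            self.jiffies = ((self.jiffies + 1 + 2**31) % 2**32) - 2**31

# Start two ticks short of wrapping and feed 8 source cycles (= 2 ticks).
c = JiffyCounter(jiffies=2**31 - 2)
for _ in range(8):
    c.on_source_cycle()
print(c.jiffies)  # -2147483648: wrapped to the most negative value
```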
Re:Control unit runs at 100 Hz? (Score:5, Funny)
I guess this might be due to a 32-bit signed integer being incremented at 100 Hz: 2^31 / 24 / 3600 / 100 = 248.5 days.
Yes, the moment the big bird would shut down was correctly prognosticated by the Connecticut Yankee in King Arthur's Court [gutenberg.org]. While testing a crowbar circuit [wikipedia.org] he ran out of time and came to while munching on phattened feasant at Medieval Times, in a daze of King Arthur. He noticed an unused carrion bit, and realized that birds of prayer who managed the King's affairs were hard-sinewed to pluck quills for signing and always discarded the carrion bit. He caught the underflow was heralded by the people and befriended by the King, who set him to work hacking the Code of Chivalry and cracking the Y1K problem. In that time there were only punch cards and knights on horseback only had a resolution of 1 bit, so tournaments were long the fields were full of snakes, to avoid spooking the horses the knights would dismount and cleave them with sword, leaving half-adders strewn about. It was Pendragon who had built the famous Round Table with 12 seats, two complete I Chings, where Arthur and the knights would drop in and punch out binary sums in a rudimentary form of patty-cake, which inspired the mechanical circular adder [google.com] of later years. The Yankee's refinement was a 13th chair left unoccupied to mark the betrayal of Judas, and also to serve as a carrion bit.
There is a great deal more about gum-powder and 99 cent gamut of Steampunk-driven micro commerce, a Debian release called 'Guinevere' and a whole lotta Lancelot, but time is fun when you're having flies.
Failsafe (Score:2)
Re: (Score:2)
Re: (Score:3)
... not when they would all have nearly the exact same runtime - they would all hit the failsafe at around the same time.
Not that this should ever happen in the air - as others have said, if the thing manages to run for this long, someone hasn't been doing maintenance.
If Boeing believed in software QA.... (Score:3)
I know of two other software "bugs"
Disclosure: Yes, I'm a software QA / Test professional.
Re: (Score:2)
Because these people are normally called TEST PILOTS. ;)
Re: (Score:2, Informative)
The Primary Flight Computer software for the 777 was written in England by GEC. Indeed the hardware for the PFC was designed and built by GEC.
I was on the software QA team for the PFC code. There were tens of us working three shifts 24 hours per day devising tests of the PFC against its requirement spec. There were even more doing unit tests on all the Ada code.
That is perhaps why you don't see Boeing advertising for QA engineers. They outsource the hardware and software.
Re: (Score:3, Informative)
The reason for the three shifts was that we were using actual PFC computers connected to hardware that could simulate all the inputs and read all the outputs.
That hardware was a big complicated rack of electronics and there were maybe 8 or 10 such units in a lab.
As such, to optimize use of the facilities it was necessary to have three shifts 24 hours per day. This went on for a year or more.
Very good planning in fact.
Now I could tell you stories of the real corners cut to meet the schedule. But that's a com
Re: (Score:2)
I have seen it happen all too often: an unrealistic development schedule is made to get the contract, and as shit rolls down the schedule, QA takes the brunt of any deadline problems. It's one thing if you're developing software for insurance companies; it's another when it's aerospace.
Until recently I was the software QA Director for a software company. I completely agree with your assessment. My company used to pad in about a week for QA. I kept telling them that it might take us a week to QA, but if we found any issues, the code would have to go back to development, and I couldn't speak for how long that would take. They really didn't like that I couldn't give them a solid date, but how could I speak for how long it would take another department to fix something?
At any rate, all of that wa
Re: (Score:2, Insightful)
Actually I took my work there testing the 777 software very seriously.
On at least two occasions I escalated what I thought was a problem in the specification all the way back to Boeing. One of them turned out to be a "real-world" issue in the spec.
I believe the rest of the team took the same attitude. We used to talk about that a lot.
At the end of the day what you are asking for is impossible. The spec we worked to was a stack of paper 2 yards high when printed out. How many QA engineers know enough about
Re: (Score:2)
Testing is about checking compliance with a spec. QA is a *much* broader topic, e.g. reviewing the spec to ensure it is well written and has no gaps. Unfortunately many people do not understand the difference.
Re: (Score:2)
Re: (Score:2)
For all of the QA at Boeing, they don't believe in software QA. Take a look at their job openings some time: in years of searching, I've seen only one software QA position, and it wasn't dealing with aircraft. Such searches return only developer positions, where the developers are expected to write their own tests against the spec. Developers are not testers... and I'll ask: How many more such bugs are out there? I know of two other software "bugs" ... that can be attributed to a lack of QA. How many people will die due to a bad management decision on the part of Boeing?
Disclosure: Yes, I'm a software QA / Test professional.
The worst part is that when software bugs are finally discovered, they are not fixed because it takes too much time and is too expensive to do (even though the physical update process is essentially no different from re-flashing/updating the firmware/software in a consumer-grade digital device). I'd argue that you could cut the red security tape, reduce costs and install updates quicker if you massively increased the software QA work being done. Apparently Boeing disagrees, I dunno about Airbus, they might
Re: (Score:2)
the maps and other info gets updated quite a bit.
Re: (Score:2)
Remember that time back in the 90s when a Marine Corps plane on maneuvers knocked down a cable car in the Italian Alps, killing 20 people? That was partly because when they planned the maneuver they used charts that were 6 months old, and the cable car line was less than 6 months old. If they'd had iPads, or any other electronic chart equipment that automatically updated itself, it wouldn't have happened.
Civil aviation changes just as frequently. Which approach each airport wants you to use (to avoid collid
Re:If Boeing believed in software QA.... (Score:5, Informative)
QC isn't just a department or a step in the release process, it is built into the full life cycle of the software. Safety is the goal, and the requirement for good practice starts at the beginning of the process, with the requirement documents.
For example, there are five levels of error severity defined from A to E. E has no impact on safety and A is catastrophic, where a crash could occur. The level of software test and validation depends on the severity level.
Your inability to find a "QC" position is because you don't know the structure of aerospace software development and have no idea of the job titles or terminology used to describe the standards used. You are projecting your lack of knowledge into an inconceivable lapse of competence on the part of Boeing and the FAA. In what universe would there be no software safety requirements for the civilian aircraft industry? All you have shown is that you are ignorant and have a basic lack of common sense.
Re: (Score:2)
Wasn't it Boeing QA that discovered this flaw in the first place?
After the code was already released, and is already being used in the field. As opposed to before release, as part of a responsible code review.
Graceful degradation (Score:3)
Re: (Score:2)
Re: (Score:2)
It is designed such that this would never be an issue. Why? Because you have to skip several critical maintenance periods to hit it. Imagine if you, somehow, kept your car engine running for two years. Ignoring the logistics of this, doing so means you cannot have changed your oil etc.
Now, if it was on the order of 11 hours, that would be more of a concern.
Give a whole new meaning to ... (Score:2)
... "failsafe".
It is probably a non-issue. (Score:5, Funny)
The company is said to have found the problem during laboratory testing of the plane, and thankfully there are no reports of it being triggered in the field.
The spokesman continued, "The battery would have caught fire long before that integer overflow."
What idiot doesn't know what "failsafe"means? (Score:2)
So first they say " left turned on for 248 days, it will enter a failsafe mode" then they say "all the Generator Control Units will shut off, leaving the plane without power, and the control of the plane will be lost."
That is NOT fail "SAFE". That is fail "EVERYBODY DEAD".
Re: (Score:2, Insightful)
If you actually read the AD it will say "We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane."
COULD lose control, not WILL. The 787 has at least 3 additional backup systems against this sort of failure: the APU, DC battery backup, and the Ram Air Turbine.
Re: (Score:2)
Pilots lose control of planes all the time, just as drivers do. The question is a) how long are you out of control and how hard is it to get control back, and b) how likely is the scenario in the first place?
In this case the answer seems to be a) not very long and not very hard as there are backup generators allowing you to reset the computers, and b) not bloody likely because you have to skip quite a few (as in dozens) routine maintenance cycles on these generators to get to 248 days without a restart.
Ther
Halting Problem (Score:2)
What a profound demonstration of the Halting Problem [wikipedia.org].
Enough of this (Score:5, Informative)
This story is being way overblown. Yes, it's a bug. Yes, it should be fixed. However...
248 days of continuous operation is well past the scheduled major maintenance for the aircraft. By this point, a 787 would have had to go through many minor maintenance cycles that would have required shutting down the electrical system. In addition, loss of all 4 generators would not result in a loss of the vehicle, because there are batteries, an APU (a backup generator) and Ram Air Turbines (RATs), generators that deploy from the wing if the APU won't start. Having to rely on any of these would not make for a good day for the pilots, but they would certainly provide the necessary power to safely land the aircraft at the nearest airport. They might even be able to continue on and finish their flight if they successfully reset the generators.
This is not the OMG Planes Are Going to Fall From The Sky! event the media is making it out to be.
Re: (Score:3)
This is not the OMG Planes Are Going to Fall From The Sky!
No. This is a "What the f* were you goofballs thinking when you wrote this code? And if this is all the better you can do, what other gotchas are hiding in there?"
Re: (Score:3)
Dude, this is a for-profit company, not a research university. It's not written by people whose entire job is to prove to the world they write the most robustest code ever designed with zero bugs. If it doesn't kill people or delay flights it doesn't cost them money and nobody, except computer geeks, gives a shit.
In this case the Dreamliner's designed to have all the relevant systems turned off for routine maintenance once every two weeks, which means if they go more than 248 days without being restarted th
Re: (Score:2)
Re: (Score:3)
Even though this bug isn't a direct threat, it could interact with other future software changes. If it is a counter overflow there is a risk that the counter would run at a higher rate in some future version where more functionality is needed. If 248 days went to 2.48 days, it might not be caught in testing, but could (rarely) happen in real life.
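That risk is what an automated headroom check guards against: keep the assumed tick rate and the required uptime in one place and fail the build when they stop fitting. A sketch in Python; it uses the AD's 120-day power-cycle interval mentioned earlier in the thread as the required uptime, while the 2x margin and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def overflow_days(hz: float, bits: int = 32, signed: bool = True) -> float:
    """Days until a counter starting at zero and ticking at hz runs out of range."""
    max_ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return max_ticks / hz / SECONDS_PER_DAY

def check_headroom(hz: float, required_uptime_days: float, margin: float) -> None:
    """Raise if the counter would overflow before margin * required_uptime_days."""
    available, needed = overflow_days(hz), required_uptime_days * margin
    # An explicit raise, not `assert`, so the check survives `python -O`.
    if available < needed:
        raise AssertionError(
            f"{hz} Hz counter overflows after {available:.2f} days; need {needed:.0f}"
        )

check_headroom(100, required_uptime_days=120, margin=2)  # 248.55 >= 240: passes
try:
    check_headroom(10_000, required_uptime_days=120, margin=2)  # 100x faster tick
except AssertionError as err:
    print(err)  # 10000 Hz counter overflows after 2.49 days; need 240
```

A change that quietly raised the tick rate, as the comment above imagines, would then fail in CI instead of in service.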
Re: (Score:3)
Bingo.
If this was only spotted recently in "lab testing" (and why was it being tested now, and not before flight... what prompted the testing...) then it was either not known or not documented that overflow of this counter would cause a shutdown. A future revision could easily increase the precision at the expense of range, or persist the counter across reboots, and that might not be considered a problem, because the system was assumed to handle the counter overflowing, since no one had documented that it didn't.
T
I've been reading The Strain Trilogy (Score:2)
I kind of did a double take when I saw the title. The book/series starts out with a brand new Boeing 777 losing power on the runway. :)
Failsafe mode? (Score:2)
Re: (Score:2)
Don't worry. The 787 can always fail over to battery power ...... Umm, oh, oh.
and when it boots, (Score:2)
Damn kids! (Score:2)
How many times do I have to tell you to shut the plane the f off before you go to bed? What do you think, I'm made of money?!?
failsafe (Score:2)
Long uptime... (Score:2)
...some witty remark about airplanes and downtime.
Real situation? (Score:2)
Would this ever happen in normal operation? I would think that every few hundred hours of flight time the plane would be pulled out of service for maintenance where everything would be shut down for a couple of days.
Re: (Score:2)
I agree here - the maintenance is probably interrupting the uptime of the system. Any airline whose aircraft reach 248 days of uptime is likely to have other problems with them as well: not only software glitches but also general wear issues.
Re: Maybe they should have used Rust. (Score:2)
Re: (Score:2)
What mechanism does Rust use to prevent 32-bit counter overflows?
Re: (Score:2)
Maybe they should have used Rust.
They can't use rust, because they build with a minimum of Ferrous materials. They have to wait for the fork, AlOx.
Re: (Score:3)
This is a prime example of why we need to use the Rust programming language ... blazingly ... eliminates data races ... guaranteed memory ... threads ... greatest minds ... the great ... the superb ... the glorious ... the mightiest ... Git ... Hub ... ... properly ... where it's at ... what we need ... It's what [the world] need[s] now.
Oh yeah? Sheeeit.
Pump it up! (endorsed by M.I.A.) [youtube.com].
Ericsson Calling!
Speak the Erlang [wikipedia.org] now (Seattle boys say Wha? Penguin Girls say Wha-What [x2]
Use Erlang Erlang Erlang, Ga la ga la ga la Land ga Lang ga Lang
Con-currency get you down?
Stack em flat, get down get down
Too late you down D-down D-down D-down
Ta na ta na ta na Ta na ta na ta
Bench mark a-blaze Erlang a lang a lang lang
Eager evaluation Erlang a lang a lang lang
Single assignment Erlang a lang a lang lang
Dynamic typing Erlang a lang a lang lang
Who t
Re: (Score:2)
Re:queue the.. (Score:5, Informative)
Re: (Score:2)
Yeah, I don't have my 'back of the envelope' calculations in front of me, but I think I worked that out to be a 'timeGetTime' rollover bug. I wonder if the same thing is happening in this case, i.e. a timer rollover bug.
Re: (Score:2)
Re: (Score:3)
"(psshsquawk)This is the Captain speaking, we are cruising at 30,000 feet, have a bit of a tail wind and will be in San Francisco a little ahead of schedule. ...Ummm... Ah.... I'm putting the seatbelt sign on now. Please return to your seats as we reboot the airplane.(pssshsquawk)"
Re:queue the.. (Score:5, Informative)
Only theoretical, though. Windows 9x would crash long before reaching this uptime.
Re: (Score:2)
Those jokes have mostly been unfair since the NT-derived era, but in the spirit of the joke, there was a bug in Win95 and 98 [cnet.com] that would cause the system to crash after 49.7 days of uptime. It remained undiscovered for years.
Re: (Score:2)
That's why I always use unsigned integers like a boss.
Re: (Score:2)
The Boeing Screamliner -- the proud product of innovative Project Management in a Globalized Economy
I thought the name was the Dreamliner? Yup. It is, according to the Boeing website. No mention of it being called the Screamliner. But you are right about it living up to its name. No accidents or injuries have occurred on a 787 in all of its years of service.
Re: (Score:3)
All three of them.
Hey, 248 days is five dog-years.