Software Bug Caused Qantas Airbus A330 To Nose-Dive 603
pdcull writes "According to Stuff.co.nz, the Australian Transport Safety Bureau found that a software bug was responsible for a Qantas Airbus A330 nose-diving twice while at cruising altitude, seriously injuring 12 people and sending 39 to the hospital. The event, which happened three years ago, was traced to an airspeed sensor malfunction, linked to a bug in an algorithm which 'translated the sensors' data into actions, where the flight control computer could put the plane into a nosedive using bad data from just one sensor.' A software update was installed in November 2009, and the ATSB concluded that 'as a result of this redesign, passengers, crew and operators can be confident that the same type of accident will not reoccur.' I can't help wondering just how a piece of code, which presumably didn't test its input data for validity before acting on it, could become part of a modern jet's onboard software suite?"
Re:What about Google driverless car? (Score:5, Interesting)
There were people against airbags, too, because they killed some people who otherwise wouldn't have died. You work on fixing those things. But whether the system as a whole is worthwhile is judged on whether it saves more than it kills.
don't just wonder, learn (Score:5, Interesting)
"I can't help wondering just how could a piece of code, which presumable didn't test its' input data for validity before acting on it, become part of a modern jet's onboard software suit?""
How about reading the darned final report, conveniently linked in your own blurb? There was lots of validity checking. In fact, some of it was relatively recently changed, and that accidentally introduced this failure mode (the 1.2-second data spike holdover). (Also, how about someone spell-checking submissions?)
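The interaction the report describes can be sketched in miniature. This is a hypothetical Python sketch, not the actual FCPC algorithm: the class name, units, and 5-degree threshold are invented, and only the 1.2-second holdover figure comes from the report. The point is that spike protection itself carries state, and that state interacts with the timing of later faults.

```python
class SpikeFilter:
    """Hold the last good value when a sample looks like a transient spike.

    Hypothetical sketch loosely inspired by the ATSB report's description
    of the 1.2-second memorization; thresholds and names are invented.
    """

    def __init__(self, threshold=5.0, holdover=1.2):
        self.threshold = threshold   # jump (deg) treated as a spike
        self.holdover = holdover     # seconds to keep the last good value
        self.good = None
        self.spike_at = None

    def update(self, t, raw):
        if self.good is None or abs(raw - self.good) <= self.threshold:
            self.good = raw          # plausible sample: accept it
            self.spike_at = None
            return raw
        if self.spike_at is None:
            self.spike_at = t        # first spike: start the holdover
        if t - self.spike_at < self.holdover:
            return self.good         # within the window: suppress the spike
        # Holdover expired while the input is still bad: the spike value is
        # now accepted as valid -- the sort of corner the redesign closed.
        self.good = raw
        self.spike_at = None
        return raw


f = SpikeFilter()
f.update(0.0, 2.0)          # normal angle-of-attack sample
print(f.update(0.1, 50.0))  # spike suppressed: prints 2.0
print(f.update(1.4, 50.0))  # still spiking past 1.2 s: prints 50.0
```

Every path through `update` is "validity checking" of a sort; the failure mode lives in the seams between the checks, not in their absence.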
it's more complicated than that (Score:3, Interesting)
we're going to see a huge change in programming methods coming pretty soon. Today, A.I. is still math and computer based. The problem is that data, input, and all of the algorithms you're going to write can result in a plane nose-diving -- even though no human being has ever chosen to nose-dive under any scenario in a commercial flight.
Why was an algorithm written that could do something that no one has ever wanted to do?
The shift is going to be when psychology takes over A.I. from the math geeks. It'll be the first time that math becomes entirely useless because the scenarios will be 90% exceptions. It'll also be the first time that psychology becomes truly beneficial -- and it'll be the direct result of centuries of black-box science.
That's when the programming changes to "should we take a nose-dive? has anyone ever solved anything with a nose-dive? are we a fighter jet in a dog fight like they were?" Instead of what is it now: "what are the odds that we should be in a nose-dive? well, nothing else seems better."
Re:What about Google driverless car? (Score:5, Interesting)
It's so interesting to see people's reaction to the whole driver-less car thing. It's incredible to see the kind of ethical thought-experiment that must necessarily go through everyone's mind when they come to this conclusion: How many lives must be saved before I will tolerate someone being brutally slain by a malfunctioning computer?
Every day, children are run down by drivers who are not paying attention, tired, drunk, or just plain don't have time to react. Since a driver-less car is incapable of being drunk, tired, or distracted, then it's a safe bet that they'll be much better at avoiding those accidents that can be avoided. But the reality is that the latter scenario (no time to react) would still lead to the deaths of many children (and others!).
At what point does it become "worth it"? When the driver-less car causes 1/10th as many fatalities? 1/100th? 1/1,000th? How many human deaths must be prevented by letting computers drive cars before we're willing to accept 1 single death by those same computers?
It's a real-life example of the "Trolley Problem"
http://en.wikipedia.org/wiki/Trolley_problem [wikipedia.org]
Re:What about Google driverless car? (Score:4, Interesting)
Re:don't just wonder, learn (Score:5, Interesting)
Mod parent up. Anyhow, information from a sensor may be valid but inaccurate. I deal with these types of systems regularly (not in aircraft, but control systems in general), and it is sometimes impossible to tell without extra sensors. It's one thing to detect a "broken wire" fault, and a completely different thing to detect a 20% calibration fault, for example, so validity checking can only take you so far. It's actually impressive that the failure mode in this case caused so little damage.
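The broken-wire versus calibration-fault distinction is easy to illustrate. A minimal sketch, with an invented plausibility band; real systems compare dissimilar sensors against each other rather than relying on range checks alone:

```python
# Invented plausibility band for an airspeed channel, in knots.
VALID_RANGE = (0.0, 350.0)


def broken_wire(reading):
    """A failed or open channel typically reads a rail value far out of range."""
    lo, hi = VALID_RANGE
    return not (lo <= reading <= hi)


true_speed = 250.0
drifted = true_speed * 0.8   # 20% calibration fault: reads 200.0 kt

print(broken_wire(-9999.0))  # hard fault, easy to catch: prints True
print(broken_wire(drifted))  # calibrated-looking lie passes: prints False
```

The drifted value is perfectly "valid" to any single-channel check; only redundancy can expose it.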
Re:it's more complicated than that (Score:5, Interesting)
yup. all the while forgetting that while the altimeter shows altitude, it rarely actually measures distance to the ground; it measures air pressure, and then assumes an awful lot.
Re:What about Google driverless car? (Score:5, Interesting)
Re:don't just wonder, learn (Score:5, Interesting)
Agreed, valid but inaccurate.
Though such an airliner will have more than one airspeed sensor, no? Relying on just one sensor for such a vital piece of information would be crazy, which makes it all the more surprising to me that a single airspeed sensor malfunction can cause such a disaster. But then it's the same kind of issue that was blamed for an Air France jet crashing into the ocean: malfunctioning sensors -- in that case from ice buildup or so, iirc -- and since all the sensors were of the same design, this caused all of them to fail.
Another thing: I remember that when Airbus introduced their fly-by-wire aircraft, they stressed that one of the safety features to prevent problems caused by computer software/hardware bugs was to have five different flight computer systems, built and designed independently by five different companies using different hardware. If one computer malfunctioned, the other four would be able to override it, and a majority of the computers had to agree with one another before an airplane control action would be undertaken.
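Cross-channel voting of the kind described can be sketched minimally. The median-select below is illustrative only; real flight systems use dissimilar hardware and far more elaborate monitoring than a single tolerance check:

```python
def vote(readings, tolerance):
    """Return the median reading if a majority agree within tolerance,
    else None to signal that no trustworthy consensus exists."""
    s = sorted(readings)
    mid = s[len(s) // 2]
    agreeing = [r for r in s if abs(r - mid) <= tolerance]
    if len(agreeing) > len(readings) // 2:
        return mid
    return None


# One wild channel is outvoted by the other two:
print(vote([251.0, 250.0, 40.0], tolerance=5.0))  # prints 250.0

# No majority agreement: refuse to act on the data.
print(vote([40.0, 250.0, 480.0], tolerance=5.0))  # prints None
```

The interesting design decision is the `None` branch: a voter that must always produce a number has no way to say "I don't trust any of this."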
Re:What about Google driverless car? (Score:4, Interesting)
A million people die annually because of human drivers. A driverless car killing half that many would still be an improvement.
When a human driver kills another human being, the courts can punish that person and allow for the victim's family to claim compensation.
When a driverless car kills a human being... ?
Maybe we could copy the system we have for vaccines [wikipedia.org]
Re:it's more complicated than that (Score:2, Interesting)
there is absolutely no way that your brain does calculus in order to walk around an obstacle. yet that's exactly what's taught in today's AI class.
it's not about probability. you don't grasp a glass by determining how much pressure you can apply to it based on its chemical structure. you add more pressure until it stops slipping through your hand.
you trust nothing, and you draw conclusions only through on-the-fly experimentation.
your computer pilot was not supposed to use the sensor for anything but convenience. the moment it says something unexpected, the computer was supposed to determine the altitude in a proper way -- which takes longer. and don't tell me that it had no redundancy. it could have done EXACTLY what the pilot would have done.
ask somebody.
not assume it's alone in the world.
hey pilot, hey tower, where am I?
Re:What about Google driverless car? (Score:4, Interesting)
You hold the people who made the car responsible. They'd better analyze the hell out of every single tiny problem that crops up and make the details and fixes public. This is why all this driverless software must be open source. Any 'benefits' of making it proprietary would come at the cost of everyone's safety.
And besides, it doesn't really matter how someone is punished for wrongdoing. You judge whether it's an improvement or not; you don't judge on how best to get retribution. Otherwise, you could hypothetically end up choosing a system that causes a lot of problems as long as it's easy to blame someone for causing them.
Re:What about Google driverless car? (Score:2, Interesting)
One of the AF447 pilots killed the others, and the passengers.
Also, I am shocked that the Airbus sidesticks don't move in unison. If the left seat stick had gone all the way back when the right seat pilot pulled on it, then the left seat pilot would have immediately understood and dealt with the situation. Every other airplane works like this - why don't Airbuses?
Comment removed (Score:5, Interesting)
I had the problem once (Score:5, Interesting)
Posting anon because I moderated.
I had a very similar problem once with firmware on a TI DSP. The symptom was that a Peltier element for controlling laser temperature would sometimes freak out and start burning so hot that the solder melted. After some debugging, it turned out that somewhere between the EEPROM holding the setpoint and the A/D converter, the setpoint value got corrupted.
The cause turned out to be a 32-bit variable that was uninitialized, but was normally set to 0 by the stack initialization code.
Only the first 16 bits were filled in, because that was the width of the value stored in the EEPROM. The programming bug was that the other 16 bits were left as-is. More than 99% of the time this was not a problem, but if a specific interrupt happened at exactly the wrong moment during initialization of the stack variable, that variable was filled with garbage from an interrupt register value. Since the calculations for the setpoint used the entire 32 bits (it was integer math), it came out with a ridiculously high setpoint.
Having had to debug that, I know how hard it can be if your bug depends on what is going on inside the CPU or related to interrupts.
There may only be a window of less than a microsecond for this bug to happen, so reproduction could be nigh on impossible.
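The shape of that bug can be simulated with plain integer masking. Everything here (values, names) is invented; it only reproduces the mechanism: a 16-bit value written over the low half of a 32-bit slot whose high half usually, but not always, happens to be zero.

```python
def load_setpoint(eeprom_value16, stack_word32):
    # Buggy: only the low 16 bits are written from EEPROM; the high
    # 16 bits keep whatever was already in the (uninitialized) slot.
    return (stack_word32 & 0xFFFF0000) | (eeprom_value16 & 0xFFFF)


SETPOINT = 1200  # intended temperature setpoint, fits in 16 bits

# Most of the time the stack slot happens to be zeroed, so the bug hides:
print(load_setpoint(SETPOINT, 0x00000000))       # prints 1200

# If an interrupt leaves garbage there at exactly the wrong moment, the
# downstream 32-bit integer math sees an absurdly large setpoint:
print(hex(load_setpoint(SETPOINT, 0x7F3A0000)))  # prints 0x7f3a04b0
```

Masking in Python stands in for what the DSP did implicitly with fixed-width registers; the lesson is the same -- "works in testing" and "initialized" are not the same property.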
Re:What about Google driverless car? (Score:5, Interesting)
It's not about choosing one or the other, but hybrid systems operating at the same time.
If you are going to compare quality, the human will win every time. We can give anecdotal evidence about how bad drivers are, but statistics show that driving is not so dangerous that we need to consider stopping it altogether. Really think about it for a second. During your average day, how many really bad drivers did you personally interact with who created a dangerous situation resulting in an accident? Pretty few, huh? I would expect so; otherwise insurance would cost thousands and thousands per month, instead of per year.
Humans are not the inferior solution overall right now. Not by far.
It is also not because Google's system isn't good enough. Specifically, it is because of the time required, and the complexity, of shifting control from Google to the driver. Once such a system becomes normal to a driver, their attention is not going to be on the road, but on their interaction with other devices. You cannot reasonably expect a person to be in complete awareness, hands at 10 and 2, ready to take control in a split second. You would get too bored without immediate feedback; your mind would drift. This would be completely normal, too.
This is not to say that the system itself might not be useful, but it would have to be under very controlled conditions excluding human drivers altogether. It could work, provided the shifting of control was at a controlled rate in relatively controlled conditions. Give the human being time to adapt and obtain situational awareness.
As cool as this sounds, it is just not ready to fully replace a human, unless it could perform at a human level or better. The dream of a car that can drive itself completely under all conditions is still some ways away.
The idea of changing carpool lanes over to high efficiency lanes where human control is not allowed seems like a more pragmatic approach that decreases the complexity and uncertainty that the Google system has to deal with. It has very high value as well since it can optimize traffic patterns far better than a human simply because it can cooperate with a much larger number of cars over greater distances. A human could never hope to do that with our inherent limitations.
That system could realize some serious fuel savings and increase productivity by essentially mimicking an airplane in auto pilot mode. The human is really just there to get the system to the point where it can safely transition in and out of a computer controlled lane. That will be extremely advantageous to overall traffic.
Re:What about Google driverless car? (Score:4, Interesting)
The Airbus will also change the throttle to the engines without moving the throttle levers, whereas on the Boeing the levers move to where the computer set the throttle. When the autopilot takes a crap and you put your hands on the throttle, you must remember that the controls are lying to you and act accordingly.
Re:What about Google driverless car? (Score:5, Interesting)
Back in my Finnish Air Force days I talked to a captain who had flown the F-18C for his last three years of active flying. He said that when you're straight and level in the Hornet and peek over your shoulder, you'll probably see the ailerons swaying back and forth as the computer tries to keep the plane stable.
Re:What about Google driverless car? (Score:4, Interesting)
I'd even take it further: I'd hand over my driving to an automatic car in a second if it meant all the other morons would have to do the same.
For those addicted to driving: yes, I'd love to force you to take your driving to the circuit, where it belongs (once driverless cars have proven to have less than half the accident rate of humans).
Re:What about Google driverless car? (Score:4, Interesting)
Well, as you don't know anything at all about flying, let alone commercial pilots, let me inform you.
None of what you dream up is happening up there, except among the bad pilots. I know several commercial pilots; they are busy up there with checklists and comms, and do not have time to chat up the stewardesses. There was a little thing that happened in September of 2001; they keep the door locked for the whole flight.
But you go ahead with your fantasy; it's just like all the fantasy reported on Fox News.
Re:This is why I like fuzzing (Score:5, Interesting)
True ... but you may not ever have enough time to hit all the corner cases.
If it's a single 32-bit word that can cause the issue, then yes, you can go through every single permutation fairly quickly. There are only 4,294,967,296 of them -- nothing that a computer can't handle.
Suppose for a moment that the issue is caused not by one single faulty piece of data, but by two right after each other. Essentially, a 64-bit word causes the issue. Now we're looking at 18,446,744,073,709,551,616 permutations. Quite a bit more, but not impossible to test.
Now suppose that the first 64-bit word doesn't cause the fault on its own, but "simply" causes an instability in the software. That instability will be triggered by another specific 64-bit word. Now we're looking at 3.40282367 x 10^38 permutations.
Now, keep in mind that at this point, we're really looking at a fairly simple error triggered by two pieces of data. One sets it up, the other causes the fault.
Now let's make it slightly more complex.
The actual issue is caused by two different error conditions happening at once. If they are similar to the above, we're now looking at, essentially, a 256-bit word. That's 1.15792089 x 10^77 permutations.
In comparison, the world's fastest super computer can do 10.51 petaflops, which is 10.51 x 10^15, and it would take that computer 0.409 microseconds to go through all permutations in a 32 bit word. About 30 minutes for a 64 bit word. 10^15 years for a 128 bit word and 10^53 years for a 256 bit word.
Yes, you can test every single permutation, if the problem is small enough. But the problem with most software is that it really isn't small.
Even if we are only talking 32 bit words causing the issue, will it happen every time that single word is issued, or do you need specific conditions? How is that condition created? As soon as the issue becomes even slightly complex, it becomes essentially impossible to test for.
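The arithmetic above checks out; here is a quick back-of-envelope script, assuming (very optimistically) one candidate input tested per floating-point operation on that 10.51-petaflop machine:

```python
OPS_PER_SEC = 10.51e15            # 10.51 petaflops
YEAR_S = 365.25 * 24 * 3600       # seconds per year


def exhaust_seconds(bits):
    """Time to enumerate every value of a `bits`-wide input space."""
    return (2 ** bits) / OPS_PER_SEC


print(exhaust_seconds(32))             # ~4.09e-7 s, i.e. 0.409 microseconds
print(exhaust_seconds(64) / 60)        # ~29 minutes
print(exhaust_seconds(128) / YEAR_S)   # ~1e15 years
print(exhaust_seconds(256) / YEAR_S)   # ~3e53 years
```

So exhaustive enumeration dies somewhere between "two consecutive words" and "two interacting error conditions" -- which is exactly why fuzzing is sampling, not proof.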
Re:What about Google driverless car? (Score:4, Interesting)
*blinks* You're not well versed in the effect of turbulence on localized airspeed or altitude, are you? The sensors can report airspeeds that are only possible in a dive; combined with a loss of altitude, even while the angle of attack is level or steady, that could easily cause software to attempt to pull out of the "dive". That assumes the plane is allowed to override human input, which is a seriously fucking asinine design if true.
Re:What about Google driverless car? (Score:4, Interesting)
So how will you reduce the risk of someone next to you suddenly deciding to switch lanes without checking that you're there? How do you reduce the risk of someone deciding he just has to pass the car in front of him even when there's oncoming traffic?
Um...by not riding beside somebody, especially in their blind spot?
I mean, is this a serious question? Have you never learned defensive driving?
Avionics certification (Score:4, Interesting)
If you saw the procedures required to get airworthiness certification from the FAA for a critical piece of software, you would shake your head in disbelief. It is almost all about ensuring that every line of code is traceable to (and tested against) a formal requirement somewhere. In spite of greatly increasing software development costs (due to the additional documentation and audit trails required), the procedures do amazingly little to ensure that the requirements or code are actually any good, or that sound software engineering principles are employed. It does not surprise me that GIGO situations occasionally arise -- it is perfectly plausible that a system could meet the certification criteria but shit's still busted because the formal requirements didn't completely capture what needed to happen.
The cost of compliance can also warp the process. A co-worker once told me a story about an incident that happened years ago at a former employer of his. A software system with several significant bugs was allowed to continue flying because the broken version had already received its FAA airworthiness certification. A new version which corrected the bugs had been developed, but getting the new version through the airworthiness certification process again would've been too costly; so the broken version was allowed to continue flying.
Look up "DO-178B" sometime if you're curious...