Bug Networking Transportation

Dublin Air Traffic Control Brought Down By Faulty NIC

Not so very long ago after passengers were left hanging by a similar glitch at LAX, Gilby4mPuck writes with another story of NIC failure leading to a disruption of air traffic, this time in Ireland, excerpting: "Data showing the location, height and speed of approaching planes disappeared from screens for 10 minutes each time. ... Thales ATM stated that in 10 similar air traffic control Centres worldwide with over 500,000 flight hours (50 years), this is the first time an incident of this type has been reported. ... '[They] confirmed the root cause of the hardware system malfunction as an intermittent malfunctioning network card which consequently overcame the built-in system redundancy,' said an IAA spokeswoman."
This discussion has been archived. No new comments can be posted.

  • Re:testing and QA (Score:4, Informative)

    by MortenLJ ( 686173 ) on Friday July 18, 2008 @03:58AM (#24238959)
    The possibility of failure can be reduced, but never completely removed. It's a simple matter of probabilities. E.g. if a certain component fails on any day with probability p and we add n redundant fail-overs, then (assuming the failures are independent) the total system will fail only when all n+1 components fail, with probability p^(n+1). The system therefore stays up with probability 1-p^(n+1), a value which will never be one, but it can get close.
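
To put numbers on the redundancy argument above, here is a minimal sketch. It assumes failures are independent, so a system with n fail-overs is down only when the primary and every fail-over fail together, with probability p^(n+1). The function name and the example value p = 0.01 are illustrative assumptions, not figures from the article.

```python
def system_failure_probability(p: float, n_failovers: int) -> float:
    """Probability that the primary and all fail-overs fail on the same day.

    Assumes every component fails independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n_failovers < 0:
        raise ValueError("n_failovers must be non-negative")
    # n fail-overs plus the primary gives n + 1 components that must all fail.
    return p ** (n_failovers + 1)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-day failure probability of one component
    for n in range(4):
        fail = system_failure_probability(p, n)
        print(f"{n} fail-overs: P(system down) = {fail:.2e}, P(system up) = {1 - fail:.10f}")
```

The independence assumption is the weak point: a half-working NIC that disturbs the network shared by the backups is a correlated failure, which this formula does not cover.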
  • Re:ten minutes (Score:5, Informative)

    by wintermute000 ( 928348 ) <{ua.moc.sserpxetenalp} {ta} {redneb}> on Friday July 18, 2008 @04:49AM (#24239249)

    There are plenty of examples of 10-minute failover.

    Older Cisco ATAs take 10 minutes to swing onto SRST if keepalives to the CallManager cluster are lost.

    A complex routing protocol refresh (big BGP networks) can take many minutes.

    A faulty NIC can easily bring down a LAN segment, with or without redundant switching paths, and it makes it look like a router failure as the router overloads trying to deal with the broadcast storm.

  • Re:testing and QA (Score:3, Informative)

    by diskis ( 221264 ) on Friday July 18, 2008 @05:17AM (#24239419)

    Air traffic towers generally are not noisy or dusty. And in any case, disregarding the ports, the NIC itself is practically eternal, compared to the rest of the system and the lifetime of the system, that is.

    Two lessons learned from years of technical support: the NIC isn't broken unless the computer has been dragged by the network cable, and the CPU isn't broken as long as the system has not been overclocked and the heatsink is still in place.

  • Re:testing and QA (Score:3, Informative)

    by Phroggy ( 441 ) <slashdot3@@@phroggy...com> on Friday July 18, 2008 @05:21AM (#24239441) Homepage

    The problem is, NICs can fail in all kinds of ways that yanking cables won't simulate. In this case it sounds like if they had yanked the cable, the backup system would have come online exactly like it was supposed to, but because the faulty NIC was kinda-sorta-almost-but-not-really working, it didn't. That's a difficult thing to test in the lab.
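
The difference between a yanked cable and a kinda-sorta-working NIC can be sketched as a fail-over rule that looks at end-to-end delivery as well as link state. This is only an illustration under assumed names and thresholds (`should_fail_over`, `min_ack_ratio`); nothing here describes how the Thales system actually decides.

```python
def should_fail_over(link_up: bool, sent: int, acked: int,
                     min_ack_ratio: float = 0.95) -> bool:
    """Decide whether to switch to the backup path.

    link_up: what the NIC reports about its own link (the "yanked cable" test).
    sent / acked: heartbeats sent and acknowledged in the last window.
    min_ack_ratio: hypothetical threshold below which the path counts as bad.
    """
    if not link_up:
        return True  # the easy case: the cable is out, the link is down
    if sent == 0:
        return False  # nothing measured yet, so no evidence of trouble
    # The hard case: the link claims to be up but traffic is being lost.
    return acked / sent < min_ack_ratio


if __name__ == "__main__":
    print(should_fail_over(link_up=False, sent=0, acked=0))
    print(should_fail_over(link_up=True, sent=100, acked=60))
    print(should_fail_over(link_up=True, sent=100, acked=99))
```

A check based on link state alone would pass the second case, which is the intermittent fault described above.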

  • by ddrichardson ( 869910 ) on Friday July 18, 2008 @05:43AM (#24239565)

    I work in aviation and wonder if the terminology being used by the newspaper articles is correct.

    It appears to be talking about mode S IFF (Identification Friend or Foe) or SIFF radar systems, which identify aircraft and append height data. The speed is the only thing that needs calculating, as it isn't encoded in the pulse train.

    This is weird because bus technologies much older than current network technology, such as MIL-STD-1553 [wikipedia.org], are normally used to transfer this data.

    This makes me wonder if it was one of two things: a system inputting to an Ethernet PC system that calculates and displays the information, or, more likely, they are talking about a DLTU type stub connector (or remote terminal) used in such typical buses. This is unlikely because, on the bus systems they are employed on, the bus controller would have picked up on the failure during continuous built-in test and pulled in an alternative.

    If it's the former then someone needs shooting. ATC is a realtime application and the overhead involved here would be unacceptable. I'm not even sure of the benefit of a network; multiple self-contained individual terminals would be safer.

  • Re:testing and QA (Score:3, Informative)

    by putaro ( 235078 ) on Friday July 18, 2008 @06:10AM (#24239677) Journal

    That was in the movie. Read the book, it's much better.

  • Re:testing and QA (Score:4, Informative)

    by david.given ( 6740 ) <dg@cowlark.com> on Friday July 18, 2008 @06:27AM (#24239729) Homepage Journal

    Actually, it confers "the ability to fold space. That is, travel to any part of universe without moving."

    Actually actually, the space folding is done using the Holtzman drive, which is a perfectly ordinary machine. The Navigator merely navigates, plotting a safe path through the non-space/time foldspace. The spice grants the Navigator the limited prescience required to do this.

    Eventually the Navigators become obsolete, replaced by Ixian semisentient machines known as Compilers that perform the same task without needing melange. A good thing too, because by that point Arrakis is rubble and sandworms are pretty much extinct.

    Details courtesy of Wikipedia (and my lack of a social life).

  • How we do it (Score:1, Informative)

    by Anonymous Coward on Friday July 18, 2008 @10:58AM (#24242523)

    I work in a company that makes stuff for ATC. Our systems have 2 networks (2 NICs in each box, 2 sets of cables, switches etc)

    Every packet is sent on both networks and alarms are set off if a packet turns up somewhere without being received on the other network.

    Iffy connectors etc. are discovered pretty quickly, but it is still possible for the system to fail if, for example, a switch on the 'red' network failed and a box had a dodgy NIC on the 'green' network -- this kind of thing happens when somebody chooses to ignore an alarm because "it's working fine, we'll get round to replacing that card next week".
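
The dual-network check described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual system: the class name, the packet IDs, the 0.5 second timeout and the 'red'/'green' labels as string keys are all assumptions made for the example.

```python
from dataclasses import dataclass, field

NETWORKS = ("red", "green")


@dataclass
class DualNetworkMonitor:
    """Raise an alarm when a packet arrives on one network but not the other."""
    timeout: float = 0.5  # assumed grace period (seconds) for the twin copy
    _pending: dict = field(default_factory=dict)  # packet_id -> (network, arrival time)

    def receive(self, packet_id: int, network: str, now: float) -> None:
        if network not in NETWORKS:
            raise ValueError(f"unknown network: {network}")
        seen = self._pending.get(packet_id)
        if seen is None:
            self._pending[packet_id] = (network, now)  # first copy, wait for twin
        elif seen[0] != network:
            del self._pending[packet_id]  # both copies arrived, all good
        # A repeat on the same network is ignored; the twin is still awaited.

    def check(self, now: float) -> list[str]:
        """Return alarms for packets whose twin copy never turned up in time."""
        alarms = []
        for packet_id, (network, arrived) in list(self._pending.items()):
            if now - arrived > self.timeout:
                missing = "green" if network == "red" else "red"
                alarms.append(f"packet {packet_id} missing on {missing} network")
                del self._pending[packet_id]
        return alarms


if __name__ == "__main__":
    monitor = DualNetworkMonitor(timeout=0.5)
    monitor.receive(1, "red", now=0.0)
    monitor.receive(1, "green", now=0.1)  # twin arrives, no alarm
    monitor.receive(2, "red", now=0.2)    # twin never arrives
    print(monitor.check(now=1.0))
```

The alarm only helps if someone acts on it; the monitor itself keeps delivering data from the healthy network, which is exactly why an ignored alarm leaves the system one fault away from failure.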
