Bug Networking Transportation

Dublin Air Traffic Control Brought Down By Faulty NIC

Posted by timothy
from the can-go-wrong-can-go-wrong-nothing-can dept.
Not long after passengers were left hanging by a similar glitch at LAX, Gilby4mPuck writes with another story of NIC failure leading to a disruption of air traffic, this time in Ireland, excerpting: "Data showing the location, height and speed of approaching planes disappeared from screens for 10 minutes each time. ... Thales ATM stated that in 10 similar air traffic control Centres worldwide with over 500,000 flight hours (50 years), this is the first time an incident of this type has been reported. ... '[They] confirmed the root cause of the hardware system malfunction as an intermittent malfunctioning network card which consequently overcame the built-in system redundancy,' said an IAA spokeswoman."

  • by anomnomnomymous (1321267) on Friday July 18, 2008 @03:58AM (#24238955)
    Put all those NIC's on the terror watchlist!
    • by eclectro (227083)

      Put all those NIC's on the terror watchlist!

      Why would anyone listen to you? Somebody who was just put on the terror watchlist by a bad NIC.

  • More scary stories. (Score:2, Interesting)

    by rixster_uk (1216414)
    People - I am trying to collect airport related scary stories. I haven't got many yet but if you have some then please let me know - you can email me at admin@scareports.com or just visit the site (blatant pimping) here [scareports.com] .
  • I'd have to have some sympathy, given that it was an intermittent problem. Those can really confuse automated systems that are designed to cope with hard failures. I've had many occasions in my latter career in Service Delivery and support where it's taken human conviction to sort out issues caused by the cluster software trying to cope with intermittent connections.
    • Ten minutes at a time? That doesn't sound like a "mostly broken" problem to me, that sounds like a 10 minute fail-over time. Shit happens, but if it takes you 10 minutes for your stuff to automatically start working again you're doing it wrong, especially since it's all in one data center. And whatever happened to redundant off-site systems? New law: As a conversation progresses, the chance of someone saying "terrorist" approaches 100%
      • Re:ten minutes (Score:5, Informative)

        by wintermute000 (928348) <bender.planetexpress@com@au> on Friday July 18, 2008 @04:49AM (#24239249)

        there are plenty of examples of 10 minute failover

        Older Cisco ATAs take 10 minutes to swing onto SRST if keepalives are lost to the CallManager cluster.

        a complex routing protocol refresh (big BGP networks) can take many minutes

        a faulty NIC can easily bring down a LAN segment, with or without redundant switching paths - and it makes it look like a router failure as the router overloads trying to deal with the broadcast storm
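The point made upthread, that an intermittent fault can confuse failover logic built for hard failures, can be illustrated with a rough sketch. Everything here is hypothetical: the function names, the thresholds, and the heartbeat model are assumptions for illustration, not a description of the Dublin system or of any vendor's cluster software. A detector that waits for N consecutive missed heartbeats trips on a dead link but never on a link that drops two packets out of every three.

```python
def consecutive_miss_failover(heartbeats, miss_threshold=3):
    """Hypothetical detector: fail over after `miss_threshold` misses in a row."""
    consecutive = 0
    for ok in heartbeats:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= miss_threshold:
            return True
    return False


def windowed_failover(heartbeats, window=6, max_loss=0.5):
    """Hypothetical alternative: fail over when loss in a sliding window is too high."""
    for end in range(window, len(heartbeats) + 1):
        recent = heartbeats[end - window:end]
        if recent.count(False) / window > max_loss:
            return True
    return False


# True = heartbeat received, False = heartbeat lost.
hard_failure = [True] * 5 + [False] * 5      # link dies and stays dead
intermittent = [True, False, False] * 10     # loses 2 of every 3, never 3 in a row

print(consecutive_miss_failover(hard_failure))   # hard failure is detected
print(consecutive_miss_failover(intermittent))   # flapping link slips through
print(windowed_failover(intermittent))           # loss-rate check catches it
```

The windowed check is only one possible mitigation, and it trades faster detection of flapping links for a higher risk of false alarms on a briefly congested network.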

  • NICtzche (Score:4, Funny)

    by cornjchob (514035) <thisiswherejunkgoes@gmail.com> on Friday July 18, 2008 @04:27AM (#24239125)

    if this piece of hardware was capable of "overc[oming] the built-in system redundancy", perhaps its ilk ought to be patrolling the transistorized wunderplatz of interconnected morsels governing our most hubristic means of transportation? I, for one, would certainly feel safer.

  • by LM741N (258038) on Friday July 18, 2008 @04:35AM (#24239155)

    When I was administering a small network in Marin, every time we had a small earthquake, all of the AppleTalk connectors would come loose. Took hours to find the faults and push them together. I guess we should have used duct tape.

    I suppose at an airport as each jet came in creating vibrations, those same connectors would have dislodged.

  • by Farmer Tim (530755) <roundfile.mindless@com> on Friday July 18, 2008 @04:54AM (#24239281) Journal

    "...an intermittent malfunctioning network card which consequently overcame the built-in system redundancy"

    But it's one of the lucky ones.

    Every year, thousands of NICs fall victim to built-in system redundancy; if you know a card whose activity indicators are darkened and lifeless, it may have a redundancy problem. With your support and donations, we at Ethernetics Anonymous can help more network cards beat the scourge of built-in system redundancy, and make them feel like a useful part of society again.

  • in this case would be the ability to run air traffic control without all those fancy computrons, should the need arise.
    • Re: (Score:2, Insightful)

      Unfortunately, this NIC's fault showed up as the radar not working. What were they supposed to fail-over to? Binoculars?

      • Re: (Score:3, Interesting)

        by Bromskloss (750445)

        Unfortunately, this NIC's fault showed up as the radar not working. What were they supposed to fail-over to? Binoculars?

        I suppose so, if it's possible to do it that way. Also, have the planes do the old-fashioned "circle the airport and keep an eye out for other traffic" if that works with big, heavy planes. It sure gives you (the pilot) a nice sense of being a free and sovereign person anyway, like on small airfields. :-)

        • (This is coming from a future air traffic controller)

          You're forgetting a few things.

          1. Not all air traffic control is done from airport towers. There are also TRACONs and ARTCCs, which is the type of facility you see in the movie Pushing Tin. Basically big dark buildings filled with radar screens and strung out people completely messed up on caffeine.

          2. "circle the airport and watch for traffic" doesn't work for airplanes at FL350 doing 500+ knots. Usually that's IFR traffic, so the planes would have no cha

          • Quick correction on an error. I said that anything over FL350 is 'usually' IFR traffic. What I meant to say was IFR conditions. Everything over 18,000ft is always IFR traffic.
      • by clickety6 (141178)

        What were they supposed to fail-over to? Binoculars?

        And a giant relief model of the airport with young ladies pushing around little model aircraft with billiard cues. And a big glass panel with people marking up aircraft positions with wax crayons.

        • by zmollusc (763634)

          That is the stupidest plan ever, it is snooker cue _rests_ with which the ladies push the little model aircraft around.

  • by davew (820) on Friday July 18, 2008 @04:59AM (#24239301) Homepage Journal

    I was due to fly the evening it all went wrong. Here's a lesson: if you're standing in a three-hour queue for the Ryanair desk, and they tell people to rebook on the web, and you take out a laptop and 3G modem, be prepared for a stampede.

    • by caluml (551744)
      You could make a pretty penny with that. £5 a shot, or whatever. Plus your keystroke logger would have tonnes of valid credit card information. :)
  • by gweihir (88907) on Friday July 18, 2008 @05:01AM (#24239319)

    If they have good redundancy, they have two separate networks and two independent, preferably different, network cards in all systems. Then they would do fail-over. Seems to me that if one card can bring this down, then the people that designed the redundancy screwed up badly.

    • by jacquesm (154384)

      second that... sorry, I missed your post before I wrote mine. Whoever built the system goofed, and to screw up with flight control systems at this level should be grounds for termination and never ever to get work in mission critical systems again. There really is no room for error in systems like this.

      I've worked a bit in the aerospace industry, specifically on software that would estimate the amount of fuel required for a flight taking into account alternative landing areas, winds and so on.

      The amount of

      • by gweihir (88907)

        Indeed. The open source angle is also critical for fast and conclusive accident investigation.

        Come to think of it, I have never worked on a really critical system, but I am in IT security, which shares the thinking about ways to break a system. One difference is that our "malfunctions" are intelligent and malicious. On the other hand, they typically cannot kill large numbers of people. I think I prefer that. Having software out there that can kill, would probably give me bad dreams....

    • ...it is not so black and white.

      I administer a network like that. Pharmaceutical plant to be precise.
      All machines on the production network have 2 independent PCI NICs, connecting to 2 identical but separate networks, using separate routers and switches. The critical servers are Stratus high availability servers which have dual redundant everything, driving all components in lock-step and correcting errors on the fly.

      If something happens to cause a network switch over, there is a bulk of network traffic to

      • by gweihir (88907)

        If something happens to cause a network switch over, there is a bulk of network traffic to deal with it, because sockets have to be opened and closed, state has to be transferred, system control message flow has to be restarted so that all controllers go back to the normal state, ... And at application level, everything is RPC and DCOM based, so this will cause a significant disruption for the running services, since COM objects and RPC marshalling have to be destroyed and recreated, reinitialized, ...

        That

  • Why!? (Score:5, Funny)

    by damburger (981828) on Friday July 18, 2008 @05:21AM (#24239443)

    I am flying to Florida tomorrow, it will only be my fifth plane flight in total and my first transatlantic flight. Despite being a rational scientist, who knows how safe it is statistically, I am having trouble suppressing my anxiety.

    And at this point, fate sees fit to bombard me with horror stories about flying. This news about air traffic control comes on the heels of a headline I just saw on the front page of the Independent about pilots not reporting faults on aircraft and thus unsafe ones still flying about. I can't remember the exact wording because my brain parsed it as "TOMORROW YOU WILL DIE IN FLAMES"

    • Re:Why!? (Score:4, Funny)

      by FrostedWheat (172733) on Friday July 18, 2008 @05:51AM (#24239593)
      A long time ago I went on a school trip to London, and it was the first time I had ever been on a plane so I was a bit nervous. In the airport shop there was a magazine (can't remember which now) with a plane in flames on the front cover, with the large headline "Why Planes Crash". Whoever put them out must have had an evil streak too, they had spread them out to fill the entire top shelf.
      • by damburger (981828)

        Damn, that's cold

        Your signature, however, gives me something else to focus on. Fucking software patents! Idiotic corporate pandering EU! Grrrrrr! I'm not afraid of flying, I'm angry about IP abuse!

    • I can't remember the exact wording because my brain parsed it as "TOMORROW YOU WILL DIE IN FLAMES"

      Just to reassure you, you are far more likely to die from the impact, or failing that smoke inhalation, or failing that drowning. The passenger compartment is rather well insulated from flame.

    • Re: (Score:3, Funny)

      --EXT: PLANE FLYING OVERHEAD
      --INT: PLANE COCKPIT
      PILOT #1: Oh wow, I really hope we don't have a crash.
      PILOT #2: Me too.
      PILOT #1: But they say it's safer than crossing the road!
      PILOT #2: Yes, but we have to do that too.
      PILOT #1: Best not to think about it.
    • "TOMORROW YOU WILL DIE IN FLAMES"

      it's not the flames that kill you, it's the long long LONG fall.

      but don't think of it as an end; think of it as a really effective way to cut down on your living expenses.

  • by Fri13 (963421)

    "[They] confirmed the root cause of the hardware system malfunction as an intermittent malfunctioning network card which consequently overcame the built-in system redundancy,' said an IAA spokeswoman."

    And when we edit a little bit, can we have the truth?:

    They confirmed the root caused the hardware system malfunction using an intermittent malfunctioning network card which consequently overcame the built-in system redundancy.

  • by ddrichardson (869910) on Friday July 18, 2008 @05:43AM (#24239565) Homepage

    I work in aviation and wonder if the terminology being used by the newspaper articles is correct.

    It appears to be talking about mode S IFF (Identification Friend or Foe) or SIFF radar systems, which identify aircraft and append height data. The speed is the only thing that needs calculating, as it isn't encoded in the pulse train.

    This is weird because bus technologies much older than current network technology, such as MIL-STD-1553 [wikipedia.org], are normally used to transfer this data.

    This makes me wonder if it was one of two things: a system inputting to an Ethernet PC system that calculates and displays the information or, more likely, they are talking about a DLTU type stub connector (or remote terminal) used in such typical buses. This is unlikely because, on the bus systems they are employed on, the bus controller would have picked up on the failure during continuous built-in test and pulled in an alternative.

    If it's the former then someone needs shooting. ATC is a realtime application and the overhead involved here would be unacceptable. I'm not even sure of the benefit of a network; multiple self-contained individual terminals would be safer.
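As a rough illustration of the remark above that speed has to be calculated rather than read off the reply: a minimal sketch, assuming each radar return has already been reduced to a (range in nautical miles, bearing in degrees) fix relative to the radar head. The function names and the input format are hypothetical, and the sketch ignores slant range, altitude, and measurement noise; it is not a description of how any real ATC display system does it.

```python
import math


def polar_to_xy(range_nm, bearing_deg):
    """Convert a (range, bearing) fix to east/north offsets in nautical miles."""
    theta = math.radians(bearing_deg)
    return range_nm * math.sin(theta), range_nm * math.cos(theta)


def ground_speed_knots(fix_a, fix_b, seconds_between):
    """Estimate ground speed from two successive fixes of the same aircraft."""
    ax, ay = polar_to_xy(*fix_a)
    bx, by = polar_to_xy(*fix_b)
    distance_nm = math.hypot(bx - ax, by - ay)
    return distance_nm / (seconds_between / 3600.0)


# An aircraft due east of the radar closes from 40 nm to 39 nm in 12 seconds:
# 1 nm in 12 s works out to 300 knots.
speed = ground_speed_knots((40.0, 90.0), (39.0, 90.0), 12.0)
print(round(speed))
```

A real tracker would smooth over many returns rather than difference two, since a single noisy fix would otherwise swing the estimate wildly.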

    • by ledow (319597)

      A quick google turns up:

      http://en.wikipedia.org/wiki/Avionics_Full-Duplex_Switched_Ethernet [wikipedia.org]

      Which suggests that Ethernet-derived products are, indeed, used in critical systems (although this seems to be on-aircraft rather than in ATC). It (apparently) has seen wide deployment on common "famous" aircraft.

      And the UK has been "upgrading" its air traffic control for years and years - so much so that they now appear to be nothing more than an office with some multi-head display if the footage shown on news-repor

      • Re: (Score:3, Interesting)

        by ddrichardson (869910)

        While you're right, the key phrase from the article you give is:

        ARINC 664 Specification which defines how Commercial Off-the-Shelf networking components will be used for future generation Aircraft Data Networks (ADN).

        Specifically, this standard is aimed at use on aircraft not in ATC, in fact because of the weight reduction it offers.

        Also, not to split hairs, but Dublin is not in the UK; this seems trite but is valid as there are different agencies involved. Moreover, the appropriation of new technologies is

  • by PinkyDead (862370) on Friday July 18, 2008 @06:34AM (#24239759) Journal

    Everyone in Ireland knows that the Irish Examiner used to be the Cork examiner - and they never miss an opportunity to point out how Dublin is doing a bad job.

    This is because Cork thinks that it's the centre of the friggin' universe. The 'Real Capital', my arse! Just a bunch of thunderin' ejits, living in their little Blarney fantasy land. Sure they can't even talk right. What the hell is a 'langer', anyway. They wouldn't even know how to spell NIC.

    The fact that they are right is quite beside the point.

    (For a North American cultural equivalent, please see http://en.wikipedia.org/wiki/South_Park:_Bigger%2C_Longer_%26_Uncut [wikipedia.org])

    Anyone who mods me down is from Cork - believe it!

  • It's a cover-up for Zing Zang Zoom [technet.com] rolling out a rootkit protection

  • by rbanffy (584143) on Friday July 18, 2008 @08:39AM (#24240521) Homepage Journal

    What is a "contol" and why is this so important?

  • Sounds like they spilled Guinness on the servers again! Either that, or it's those damned pesky Leprechauns! :-)
