
One Failed NIC Strands 20,000 At LAX

The card in question experienced a partial failure that started about 12:50 p.m. Saturday, said Jennifer Connors, a chief in the office of field operations for the Customs and Border Protection agency. As data overloaded the system, a domino effect occurred with other computer network cards, eventually causing a total system failure. A spokeswoman for the airports agency said airport and customs officials are discussing how to handle a similar incident should it occur in the future.
  • Though I heard it was a switch. Same idea, though: all it takes is one malfunctioning card flooding the LAN with bad packets to bring it all down.
    • Re: (Score:3, Interesting)

      Then that would lead me to think "hub", not switch. Or just a really shitty switch...
      • Re: (Score:3, Insightful)

        by COMON$ ( 806135 ) *
        Apparently you are not familiar with what a bad NIC does to even the best of switches.
      • by KillerCow ( 213458 ) on Wednesday August 15, 2007 @04:15PM (#20240937)
        I am not a networks guy... but it's my understanding that a switch acts like a hub when it sees a destination (TO:) MAC address that it doesn't know the port for. Switches learn the structure of a network by watching the FROM fields on the frames. When the switch powers up, it behaves exactly like a hub and just watches/learns which MAC addresses are on which ports and builds a switching table. If it starts getting garbage packets, it will look at the TO field and say "I don't know what port this should go out on, so I have to send it out on all of them." So garbage packets would overwhelm a network even if it was switched.

        It would take a router to stop this from happening. I don't think that there are many networks that use routers for internal partitioning. Even then, that entire network behind that router would be flooded.
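
        Roughly, that learn-and-flood behavior can be sketched in a few lines of Python (a toy model only, with made-up port numbers and MAC addresses, not how real switch silicon works):

          # Toy learning switch: learn source MACs, flood frames for unknown destinations.
          class ToySwitch:
              def __init__(self, ports):
                  self.ports = set(ports)
                  self.mac_table = {}                    # MAC -> port it was last seen on

              def handle_frame(self, in_port, src_mac, dst_mac):
                  self.mac_table[src_mac] = in_port      # learn from the FROM field
                  if dst_mac in self.mac_table:          # known destination: one port
                      return {self.mac_table[dst_mac]}
                  return self.ports - {in_port}          # unknown/garbage destination: flood

          sw = ToySwitch(ports=[1, 2, 3, 4])
          print(sw.handle_frame(1, "aa:aa", "bb:bb"))    # bb:bb unknown -> floods to {2, 3, 4}
          print(sw.handle_frame(2, "bb:bb", "aa:aa"))    # aa:aa was learned -> {1} only
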
        • by camperdave ( 969942 ) on Wednesday August 15, 2007 @04:57PM (#20241375) Journal
          You're right to a point. An Ethernet frame, along with the source and destination addresses, has a checksum. A switch that is using a store-and-forward procedure is supposed to drop the frame if the checksum is invalid. If the NIC was throwing garbled frames onto the network, they would have to be garbled in such a way as to still have a valid checksum (assuming they are using store-and-forward switches in the first place).
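
          For what it's worth, that check is just a CRC-32 over the frame, so the store-and-forward decision amounts to something like the following (a minimal Python sketch; real switches do this in hardware, and the frame layout here is grossly simplified):

            import zlib

            def fcs(frame_bytes):
                # Ethernet's FCS is a CRC-32 computed over the frame contents (simplified).
                return zlib.crc32(frame_bytes) & 0xFFFFFFFF

            def forward_ok(frame_bytes, claimed_fcs):
                # A store-and-forward switch recomputes the CRC and drops the frame on mismatch.
                return fcs(frame_bytes) == claimed_fcs

            good = b"example frame contents"
            print(forward_ok(good, fcs(good)))                  # True: forwarded
            print(forward_ok(good + b"\x00garble", fcs(good)))  # False: dropped
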
        • by Vengance Daemon ( 946173 ) on Wednesday August 15, 2007 @05:41PM (#20241887)
          Why are you assuming that this is an Ethernet network? As old as the equipment they are using is, it may be a Token Ring network - the symptoms that were described sound just like a "beaconing" token ring network.
          • That is the first thing I thought, "I bet they are still using Token Ring." Man, when a Token Ring card went bad, it was hell on the network, nothing worked because the token would not get passed properly.
            • Man, when a Token Ring card went bad, it was hell on the network, nothing worked because the token would not get passed properly.

              The worst thing is when a user decides to unplug the cable to move something or whatever. Then the token can fall out and you have to spend hours on your hands and knees with a magnifying glass trying to find the damn thing!

              It's true! I saw it in a Dilbert cartoon!
        • I don't think that there are many networks that use routers for internal partitioning.

          We have *tons* of routers that separate the various subnets (which map 1-to-1 to VLANs) on our internal network. How do you get from one "broadcast domain" to another? Via the default gateway on your own subnet. That is passing through a router. (It may not be a physical box called a 'router' but the packets are still being routed).

    • Sure, if you're buying consumer grade switching hardware, and you have only one subnet, or all your subnets are weirdly bridged or whatever.

      For my money, this should never have happened from a problem with one machine. That's wholly unacceptable. My home network is robust enough to handle one bad machine without going down completely...Hell, I could lose a whole subnet and no one on the other subnet would notice a thing.

      If this system or switch or whatever is critical, there should have been a fail over. Th
  • by SatanicPuppy ( 611928 ) * <Satanicpuppy@g[ ]l.com ['mai' in gap]> on Wednesday August 15, 2007 @03:58PM (#20240711) Journal
    According to the effing article, it wasn't even a server, but a goddamn desktop. How in the holy hell does a desktop take down the whole system? I can't even conceive of a situation where that could be the case on anything other than a network designed by chimps, especially through a hardware failure...A compromised system might be able to do it, but a system just going dark?

    For that to have had any effect at all, that system must have been the lynchpin for a critical piece of the network...probably some Homeland Security abortion tacked onto the network, or some such crap...This is like the time I traced a network meltdown to a 4-port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000.

    This is like that. Single point of failure in the worst possible way. Gross incompetence, shortsightedness, and general disregard for things like "uptime"; pretty much what we've come to expect from the airline industry these days. If I'm not flying myself, I'm going to be driving, sailing, or riding a goddamn bicycle before I fly commercial.
    • by Jeremiah Cornelius ( 137 ) * on Wednesday August 15, 2007 @04:01PM (#20240755) Homepage Journal
      Well.

      Token ring sure used to fail like this! 1 bad station sending 10,000 ring-purge messages a second? Still, it was a truck. Files under 1Mb could be transferred, and this was TR/4, not 16!
      • Re: (Score:2, Informative)

        by sigipickl ( 595932 )
        This totally sounds like a token ring problem.... Either network flooding or dropped packets (tokens). These issues used to be a bear to track down- going from machine to machine in serial from the MAU...

        Ethernet and switching has made me fat- I never have to leave my desk to troubleshoot.
        • by hedley ( 8715 )
          Back in the late 80's we had a network of Apollo DN300's. Days of lost time when the ring went down. Once an entire day was lost: IT scrambling about in the ceiling tiles trying to TDR the cable until, finally, they found a NIC in a workstation in a closed office; the employee was on vacation. Whilst he was on vacation that NIC, like a bad xmas tree bulb, took down the whole LAN because it wouldn't play nice and pass on the token. Total junk and a total waste of time. I used to dream of taking that Apollo from CA to Chemlsford mas
        • Re: (Score:3, Funny)

          But Token Rings are, like, obsolete and stuff, surely there wouldn't be something that obsolete in a place like an airport, right?

          Right?

          [crickets chirping]

          Right?
      • by Jaxoreth ( 208176 ) on Wednesday August 15, 2007 @05:53PM (#20242009)

        Still, it was a truck.
        Which explains why it's not used in the Internet.
    • Re: (Score:3, Interesting)

      by mhall119 ( 1035984 )

      A compromised system might be able to do it, but a system just going dark?
      The article says it was a partial failure, so I'm guessing the NIC didn't "go dark", instead it started flooding the network with bad packets.
    • by MightyMartian ( 840721 ) on Wednesday August 15, 2007 @04:07PM (#20240821) Journal
      If the NIC starts broadcasting like nuts, it will overwhelm everything on the segment. If you have a flat network topology, then kla-boom, everything goes down the shits. A semi-decent switch ought to deal with a broadcast storm. The best way to deal with it is to split your network up, thus rendering the scope of such an incident significantly smaller.
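
      (If you wanted to actually watch for that kind of storm from a Linux box on the segment, even a crude sniffer counting broadcast frames per second will show it. A rough sketch: needs root, uses a Linux-only AF_PACKET socket, and the interface name and threshold are made up.)

        import socket, time

        ETH_P_ALL = 0x0003
        BROADCAST = b"\xff" * 6          # destination MAC ff:ff:ff:ff:ff:ff

        s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
        s.bind(("eth0", 0))              # interface name is a placeholder

        window_start, count = time.time(), 0
        while True:
            frame = s.recv(65535)
            if frame[:6] == BROADCAST:
                count += 1
            if time.time() - window_start >= 1.0:
                if count > 5000:         # arbitrary threshold
                    print("possible broadcast storm:", count, "broadcasts/sec")
                window_start, count = time.time(), 0
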
      • Yup. I've never really seen a situation where you'd have more than a dozen or so computers on a crappy layer 1 switch. Higher quality hardware would throttle this stuff down to the very most local layer, unless you're specifically multicasting across the whole network, which is a security horror story.
      • And even a marginally competent network administrator ought to be able to recognize that they face a packet storm and isolate the problem in about 30 minutes through the simple expedient of, "Unplug this switch. Did the problem stop? No. Plug it back in and unplug the next switch."

        I'll bet the sucker not only keeps his job but gets a commendation for finding the problem.
      • Re: (Score:3, Funny)

        by Solandri ( 704621 )
        Yeah, I had that happen at a small business I consulted for. Their flat LAN died. I eventually tracked the problem down to a cheap unmanaged switch which had a network cable plugged into it for people to plug their laptops into. Whoever used it last thought leaving the unplugged cable laying on the desk looked untidy, so they "helpfully" plugged it into an empty socket on the same switch.
    • Re: (Score:3, Interesting)

      by Billosaur ( 927319 ) *

      And beyond that... how come there is no redundancy? After 9/11, every IT organization on the planet began making sure there was some form of fail-over to a backup system or disaster recovery site to ensure that critical systems could not go down as the result of something similar or some other large-scale disaster. Not only was this system apparently cobbled together, there was no regard for the possibility of it failing for any reason.

      • Re: (Score:3, Insightful)

        by dave562 ( 969951 )
        They concentrated all of the redundancy dollars into layer B of the OSI model... the bureaucracy. There wasn't anything left for the lower layers.
      • Redundancy only helps when you have a system that stops working, not one that malfunctions:

        For instance, imagine a RAID 1 in which the data is becoming corrupted. Having redundancy doesn't help: you just have two copies of a corrupted file.

        In this instance, a network card started spewing out crap. Because it could fill its pipe, and most of the packets were rebroadcast down most of the other cables, they filled those cables too.
    • I told the boss we should get a proper network connection. But noooooo, he insisted that getting a consumer-level DSL connection and using Windows Internet Connection Sharing was the way to go...
    • by jafiwam ( 310805 )

      According to the effing article, it wasn't even a server, but a goddamn desktop. How in the holy hell does a desktop take down the whole system? I can't even conceive of a situation where that could be the case on anything other than a network designed by chimps, especially through a hardware failure...A compromised system might be able to do it, but a system just going dark?

      Los Angeles World Airports is a unique system of four airports owned and operated by the City of Los Angeles.

      Any further questions?

      Probably the lowest bidder union labor designing and setting it up. Shoulda called IBM.

    • How in the holy hell does a desktop take down the whole system? I can't even conceive of a situation where that could be the case on anything other than a network designed by chimps

      Heh, I think you're starting to get the sensation I had when one tiny error in GRUB locked me out of my computer entirely, to the point where even having the Ubuntu Install CD couldn't gain me any access to any OS whatsoever.

      Geez, what kind of chimp would allow such a damaging failure to occur along such a vital path, right?
      • No, not really. You've got to expect to lose machines; failure happens. Could have been a motherboard or a power supply. I'd still expect you to be able to boot from CD though. You try knoppix? You should be able to boot to knoppix, then mount the /boot partition and have your way with grub.

        The thing is, a network topology is wildly different from a computer. It should be designed for parts of it to drop off, and parts to go berserk...These things happen all the time. It should be designed with a minimum of
        • No, not really. You've got to expect to lose machines; failure happens. Could have been a motherboard or a power supply.

          Yeah, then I could go to the local store, buy a part, replace it, and move on with my life, like I've done before (a power supply did in fact go out on me before).

          I'd still expect you to be able to boot from CD though. You try knoppix? You should be able to boot to knoppix, then mount the /boot partition and have your way with grub.

          Hey! Capital idea! I'd download it on that computer I c
    • This is like the time I traced a network meltdown to a 4-port hub (not a switch, an unmanaged hub) that was plugged into (not a joke) a T-3 concentrator on one port, and three subnets of around 200 computers each on the other 3 ports. Every single one of the outbound cables from the $15.00 hub terminated in a piece of networking infrastructure costing not less than $10,000.

      LOL. Perhaps they ran out of funding after buying all of the rest of the hardware? :)
  • In other news... (Score:3, Insightful)

    by djupedal ( 584558 ) on Wednesday August 15, 2007 @04:01PM (#20240757)
    "...said airport and customs officials are discussing how to handle a similar incident should it occur in the future."

    What makes them think they'll get another shot? Rank and file voters are ready with their own plan...should a 'similar incident' by the same fools happen again.
    • by Animats ( 122034 ) on Wednesday August 15, 2007 @04:08PM (#20240831) Homepage

      DHS's idea of a "backup plan" will probably be to build a huge fenced area into which to dump arriving passengers when their systems are down.

      • :)

        I hear FEMA has several new/used camp trailers I'm sure DHS could avail themselves of.
      • Arrest all NIC designers, engineers, network stack developers, IT managers,... on suspicion of conspiring to cause the problem.

        Change to Wifi because that can't have NIC faults.

        C'mon folk... help me out here!

        • Arrest all NIC designers, engineers, network stack developers, IT managers,... on suspicion of conspiring to cause the problem. Change to Wifi because that can't have NIC faults. C'mon folk... help me out here!

          Print each package to be sent over the network, use the USPS first class mail to send it to the right destination on time, and hire a bunch of undocumented immigrants to enter the data again.

          I'm sure they already have a nice database to use to find prospects that could do the data entry!

    • by fm6 ( 162816 )

      What makes them think they'll get another shot?
      You mean, besides the fact that DHS still has the same inept upper management they had during Katrina? And the fact that voters won't have any say in the matter until November 2008?
  • You figure it out (Score:4, Interesting)

    by COMON$ ( 806135 ) * on Wednesday August 15, 2007 @04:01PM (#20240761) Journal
    Let me know; knowing how to prevent failure due to a flaky NIC on a network is a very large issue.

    First you see latency on a network, then you fire up a sniffer and hope to god you can get enough packets to deduce which is the flaky card without shutting down every NIC on your network.

    Of course, I did write a paper on this behavior years ago in my CS networking class: take a Snort box and a series of custom scripts to notify admins of spikes on the network outside the normal operating range for that device's history. However, implementing this successfully in an elegant fashion has been beyond me, and I just rely on Nagios to do a lot of my bidding.
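
    The gist of the idea, very roughly, in Python (the per-device history and thresholds here are invented for illustration; in practice the samples would come from Snort or SNMP counters, and the alert would go to something like Nagios rather than a print):

      from statistics import mean, stdev

      # Per-device history of packets/sec samples (placeholder data).
      history = {
          "desk-042": [120, 135, 110, 128, 140],
          "desk-117": [90, 95, 100, 88, 93],
      }

      def is_anomalous(device, current_pps, sigmas=4.0):
          samples = history[device]
          mu, sd = mean(samples), stdev(samples)
          # Flag a device that is far outside its own normal operating range.
          return current_pps > mu + sigmas * max(sd, 1.0)

      print(is_anomalous("desk-042", 150))     # within its usual range -> False
      print(is_anomalous("desk-042", 90000))   # jabbering NIC -> True, page the admins
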

    • Re:You figure it out (Score:5, Informative)

      by GreggBz ( 777373 ) on Wednesday August 15, 2007 @04:21PM (#20241023) Homepage
      One not-too-unreasonable strategy is to set up SNMP traps on all your NICs. This is not unlike the cable modem watching software at most cable ISPs.

      At first, I can envision it being a PITA if you have a variety of NIC hardware, especially finding all those MIBs. But they are all pretty standard these days, and your polling interval could be fairly long, like every 2 minutes. You could script the results, sorting all the naughties and periodic non-responders to the top of the list. That would narrow things down a heck of a lot in a circumstance like this.

      No alarms, but at least a quick heartbeat of your (conceivably very large) network. A similar system can be used to watch 30,000+ cable modems without too much load on the SNMP trap server.
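
      A periodic sweep along those lines could look something like this (a sketch that shells out to net-snmp's snmpget rather than speaking SNMP natively; the community string, host list, and interface index are placeholders):

        import subprocess

        HOSTS = ["10.0.0.2", "10.0.0.3"]     # devices to poll (placeholders)
        IF_INDEX = 1                         # interface index to check (placeholder)

        def poll_in_errors(host, if_index):
            # Returns the ifInErrors counter, or None for a non-responder.
            try:
                out = subprocess.run(
                    ["snmpget", "-v2c", "-c", "public", "-Oqv", host,
                     "IF-MIB::ifInErrors.%d" % if_index],
                    capture_output=True, text=True, timeout=5, check=True)
                return int(out.stdout.strip())
            except (subprocess.SubprocessError, ValueError):
                return None

        results = {h: poll_in_errors(h, IF_INDEX) for h in HOSTS}
        # Sort the naughties and the non-responders to the top of the list.
        for host, errs in sorted(results.items(),
                                 key=lambda kv: float("inf") if kv[1] is None else kv[1],
                                 reverse=True):
            print(host, "no response" if errs is None else "%d input errors" % errs)
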
      • Re:You figure it out (Score:5, Informative)

        by ctr2sprt ( 574731 ) on Wednesday August 15, 2007 @06:32PM (#20242413)

        One not-too-unreasonable strategy is to set up SNMP traps on all your NICs.

        That doesn't make much sense. If the NIC goes down or starts misbehaving, the chances of your NIC's SNMP traps arriving at their destination are effectively zero. You probably mean setting up traps on your switches, with threshold traps on all the interfaces, the switch's CPU, CAM table size, etc., which would be more useful. You could also use a syslog server, which is going to be considerably easier if you don't have a dedicated monitoring solution.

        But they are all pretty standard these days, and your polling interval could be fairly long, like every 2 minutes.

        You're not thinking of traps if you're talking about polling. Traps are initiated by the switch (or other device) and sent to your log monster. You can use SNMP polling of the sort that e.g. MRTG and OpenNMS do which, with appropriate thresholds, can get you most of the same benefits. But don't use it on Cisco hardware, not if you want your network to function, anyway. Their CPUs can't handle SNMP polling, not at the level you're talking about.

        No alarms, but at least a quick heartbeat of your (conceivably very large) network. A similar system can be used to watch 30,000+ cable modems without too much load on the SNMP trap server.

        I think you are underestimating exactly how much SNMP trap spam network devices send. You'll get a trap for the ambient temperature being too high. You'll get a trap if you send more than X frames per second ("threshold fired"), and another trap two seconds later when it drops below Y fps ("threshold rearmed"). You'll get at least four link traps whenever a box reboots (down for the reboot, up/down during POST, up when the OS boots; probably another up/down as the OS negotiates link speed and duplex), plus an STP-related trap for each link state change ("port 2/21 is FORWARDING"). You'll get traps when CDP randomly finds, or loses, some device somewhere on the network. You'll get an army of traps whenever you create, delete, or change a vlan. If you've got a layer 7 switch that does health checks, you'll get about ten traps every time one of your HA webservers takes more than 100ms to serve its test page, which happens about once per server per minute even when nothing is wrong.

        And the best part is that because SNMP traps are UDP, they are the first thing to get thrown away when the shit hits the fan. So when a failing NIC starts jabbering and the poor switch's CPU goes to 100%, you'll never see a trap. All you'll see are a bunch of boxes on the same vlan going up and down for no apparent reason. You might get a fps threshold trap from some gear on your distribution or core layers, assuming it's sufficiently beefy to handle a panicked switch screaming ARPs at a gig a second and have some brains left over, but that's about it. More likely you won't have a clue that anything is wrong until the switch kicks and 40 boxes go down for five minutes.

        Monitoring a network with tens of thousands of switch ports sucks hardcore, there's no way around it.
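
        To get a feel for the trap spam, even a bare UDP listener counting datagrams per sender is instructive. A throwaway sketch (it does not decode the SNMP PDUs at all, just tallies who is shouting, and it needs root to bind the standard trap port):

          import socket
          from collections import Counter

          sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
          sock.bind(("0.0.0.0", 162))       # standard SNMP trap port

          senders = Counter()
          try:
              while True:
                  _data, (addr, _port) = sock.recvfrom(65535)
                  senders[addr] += 1        # count raw trap datagrams per source
          except KeyboardInterrupt:
              for addr, n in senders.most_common(10):
                  print(addr, n, "traps")
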

    • It's called teaming on Windows and we use it. In fact, we had a flaky NIC just the other day. I'm not sure how many cards/vendors support teaming outside of HPaq.

      On Linux, it's called bonding. This is a killer feature.

      I had some very limited professional experience with LAWA in the last couple of years. (LAWA runs LAX.) I have no doubt there is quite a bit of the usual consultant chicanery going on, whereby they don't actually hire qualified IT people, just people an elected official or two or three may k
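
      On the Linux side, noticing that a bond has lost a slave is as simple as reading /proc. A rough sketch, assuming the bond is named bond0 and the kernel bonding driver's usual /proc/net/bonding format:

        # Report any bonding slaves whose MII link status is not "up".
        def failed_slaves(bond="bond0"):
            failed, current = [], None
            with open("/proc/net/bonding/%s" % bond) as f:
                for line in f:
                    if line.startswith("Slave Interface:"):
                        current = line.split(":", 1)[1].strip()
                    elif line.startswith("MII Status:") and current:
                        if line.split(":", 1)[1].strip() != "up":
                            failed.append(current)
                        current = None
            return failed

        print(failed_slaves() or "all slaves up")
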
    • How about doing regular police work instead of pre crime, so that passengers don't have to stand around while your network flakes out?
    • The AC is right. Your network topology should be spread out over a number of subnets, and they should only talk to each other where it's critical. The subnets should be separated by expensive managed switches, or by custom hardware configured to monitor packet traffic and isolate problems. Critical systems should be largely inaccessible to the vast majority of the network, and where they are accessible the access is monitored and throttled. If one machine takes too much traffic, you need a second machine se
      • by COMON$ ( 806135 ) *
        You are correct, and you speak far truer than the AC. Basic VLANs should be a part of any admin's topology. I worked on a network once where all 100 servers were connected to the same VLAN as an additional 120 PCs; the subnet was just about full, and they were asking for the problem above. I agree with you wholeheartedly, but my question is more related to local subnets, 100 PCs here and 100 PCs there, and I would like to detect a faulty NIC before it disturbs the rest of the PCs.
    • by t0rkm3 ( 666910 )
      You can see a similar behavior from Cisco IPS if you enable and tune the anomaly detection engine. This in turn feeds MARS... which is groovy, except the alerting stinks within MARS. So you have to beat up Cisco and they'll hash out an XSLT that will prettify the XML garbage into a nice little HTML email for the desktop support guys to chase down the offender. Couple that with some Perl to grab the fields and shove them in a DB for easy reference...

      It works, and it works a lot more easily than anything else th
  • by Glasswire ( 302197 ) on Wednesday August 15, 2007 @04:05PM (#20240797) Homepage
    ...for not firing the networking manager. The fact that they were NOT terrified that this news would get out and were too stupid to cover it up indicates he/she and their subordinates SIMPLY DON'T KNOW THEY DID ANYTHING WRONG by not putting in a sufficiently monitored switch architecture which would rapidly alert IT staff and lock out the offending node.
    Simply amazing. Will someone in the press publish the names of these losers so they can be blacklisted?
    • by Rob T Firefly ( 844560 ) on Wednesday August 15, 2007 @04:17PM (#20240959) Homepage Journal
      They have to find someone who can not only design a vital high-traffic network and maintain it... but who didn't have fish for dinner.
    • Re: (Score:3, Informative)

      by kschendel ( 644489 )
      RTFA. This was a *Customs* system. Not LAX, not airlines. The only blame that the airlines can (and should) get for this is not shining the big light on Customs and Border Protection from the very start. I think it's time that the airlines started putting public and private pressure on CBP and TSA to get the hell out of the way. It's not as if they are actually securing anything.

      CBP deserves a punch in the nose for not having a proper network design with redundancy; and another punch in the nose for not h
    • That's the easy way out but probably not the best one. Often, as is the case in government, the "officials" think they're always right on areas outside their expertise. The admin probably knows what he's doing but is having to convince people who don't know anything about anything to change their policies so that he can make the network work correctly. They don't see anything wrong with the way it's set up - it was working well *before* the failure, right? So, no need to change!
    • Yes, let's fire people because they don't cover up their mistakes. Then we'll have no one working in any position of importance but liars. Brilliant!

      IF (and that's a big if) we accept your logic that the lack of a cover-up means the IT head/network admin (your post isn't terribly clear on this point) didn't realize there was anything wrong with the way things were being done, then yes, I suppose that person should be fired. However, I call that logic "bullshit". Maybe I'm too damn optimistic, but I'd pref

  • by dy2t ( 549292 )
    Also known as IEEE 802.3ad, it supports aggregating NICs to both improve overall bandwidth and gracefully deal with failed links.
    More info at http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol [wikipedia.org]

    Systems seem to be more commonly shipping with multiple NICs (esp. servers) so maybe this will be used more and more. It is important to note that the network switch/router needs to be able to support LACP (dumb/cheap switches do not while expensive/managed ones do) so that might be a barrier. Cisco s
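
    For the curious, on a reasonably recent Linux box the 802.3ad side can be set up with iproute2 along these lines (a hedged sketch: interface names are placeholders, it must run as root, and the switch ports still have to be put into a matching LACP channel group):

      import subprocess

      # Rough sketch: build an 802.3ad (LACP) bond from two NICs using iproute2.
      cmds = [
          "ip link add bond0 type bond mode 802.3ad",
          "ip link set eth0 down", "ip link set eth1 down",   # slaves must be down to enslave
          "ip link set eth0 master bond0", "ip link set eth1 master bond0",
          "ip link set bond0 up", "ip link set eth0 up", "ip link set eth1 up",
      ]
      for cmd in cmds:
          subprocess.run(cmd.split(), check=True)
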
  • by urlgrey ( 798089 ) * on Wednesday August 15, 2007 @04:22PM (#20241043) Homepage
    To all you novice net admins out there: network cards do *not* like chunky peanut butter! Smooth/creamy only, please.

    Now you see what happens when some joker thinks [s]he can get away with using chunky for something as critical as proper care and feeding of network cards. Pfft.

    Bah! Kids these days... I tell ya. Probably the same folks that think the interwebnet is the same as the World Wide Web.

    Great Scott! What's next?!

  • by Potent ( 47920 ) on Wednesday August 15, 2007 @04:23PM (#20241061) Homepage
    When the U.S. Government is letting millions of illegal aliens cross over from Mexico and live here with impunity, then what the fuck is the point with stopping a few thousand document carrying people getting off of planes from entering the country?

    I guess the system exists to give the appearance that the feds actually give a shit.

    And then the Pres and Congress wonder why their approval ratings are as small as their shoe sizes...
  • by KDN ( 3283 ) on Wednesday August 15, 2007 @04:24PM (#20241071)
    Years ago we had a 10BT NIC go defective in such a way that whenever it was plugged into the switch it would obliterate traffic on that segment. The fun part: EVEN IF THE NIC WAS NOT PLUGGED INTO THE PC. Luckily that happened in one of the few areas that had switches at the time; everything else was one huge flat LAN.
  • by The One and Only ( 691315 ) * <[ten.hclewlihp] [ta] [lihp]> on Wednesday August 15, 2007 @04:30PM (#20241135) Homepage

    A spokeswoman for the airports agency said airport and customs officials are discussing how to handle a similar incident should it occur in the future.

    Except in the future, the incident isn't going to be similar, aside from being similarly boneheaded. This attitude of "only defend yourself from things that have already happened to you before" is just plain dumb. Obviously their system was set up and administered by a boneheaded organization to begin with, and now that same boneheaded organization is rushing to convene a committee to discuss a committee to discuss how to prevent something that already happened from happening again. The root flaw is still in the organization.

    • Isn't that basically what this whole thing is about, anyway? We sniff your shoes because we had a shoe-bomber. We check liquids because we heard someone might try that. We put cops on planes and tighten security on the cockpit doors because someone took advantage of that hole. We check a very few cargo containers coming into ports because we heard that was an idea thrown around. We suddenly learn to block physical access to the parking lots of FBI buildings because oddly enough, someone made use of that gap
    • by geekoid ( 135745 )
      So you are saying that a NIC will never go down again in the LAX system? That's quite a bold statement.
  • The NIC that failed isn't the part that's at fault. NICs fail, and can be counted on to do so inevitably, if relatively unpredictably (MTBF is statistical).

    The real problem NIC is the one that wasn't there as backup. Either a redundant one already online, or a hotswap one for a brief downtime, or just a spare that could be replaced after a quick diagnostic according to the system's exception handling runbook of emergency procedures.

    Of course, we can't blame a NIC that doesn't exist, even if we're blaming it
  • Where I work, if there's a packet storm someplace (server is getting attacked, server is the attacker, or someone just has a really phat pipe on the other end and is moving a ton of data) we get an SNMP trap for the packet threshold on the offending port. BAM! You know where the problem is, and since we have managed switches you just shut off the port if you can't resolve the problem.

    Having said that, since the managed switches are gigE uplinked and each port is only 10/100, I don't think we've ever had a problem wh
  • I used to work for a very large travel agency and have seen queues of travel reservations get pretty backed up and cause problems before, although on a smaller scale.

    Most reservations are checked for problems automatically but pushed through by a person and moved from one queue to another. If the program that checks them crashes, it can back things up.

    I remember a program crashing and a queue getting 2000+ reservations in it before someone figured out what was going on and it had things screwed up for abo
  • This brings out an obvious point: despite the advances we have made in computing and IT, it is still relatively young and not that robust.

    This is the equivalent of your car breaking down and the 'check engine' light not even coming on. At least now some of the technology for cars is getting to the point that it will find the problem for you. The same still cannot be said for large computer networks.

    When people stop treating computers as flawless wonder machines, then we shall see some real progress

    • Ethernet is hardly a new technology. Anybody with an ounce of knowledge should know about broadcast storms, and should know the standard techniques for isolating them (rebooting router/switch/hub is an awfully good start, and then from there taking individual NICs physically off the segment). There are tools that can help, but even if you're stuck with antiquated hubs and the like, there are still ways of dealing with this.

      Where I come from is over a decade of hard-won experience dealing with network issu
  • by bwy ( 726112 ) on Wednesday August 15, 2007 @05:17PM (#20241621)
    Sadly, many real-world systems are nothing like what people might envision them as. We all sit back in our chairs reading slashdot and thinking everything is masterfully architected, fully HA, redundant, etc.

    Then as you work more places you start seeing that this is pretty far from the actual truth. Many "production" systems are held together by rubber bands, and duct tape if you're lucky (but not even the good kind). In my experience it can be a combination of poor funding, poor priorities, technical management that doesn't understand technology, or just a lack of experience or skills among the workers.

    Not every place is a Google or Yahoo!, which I imagine look and smell like technology wherever you go on their fancy campuses. Most organizations are businesses first and tech shops last. If software and hardware appear to "work", it is hard to convince anybody in a typical business that anything should change - even if what is "working" is a one-off prototype running on desktop hardware. It often requires strong technical management and a good CIO/CTO to make sure that things happen like they should.

    I suspect that a lot of the things we consider "critical" in our society are a hell of a lot less robust under the hood than anything Google is running.
  • Are they saying that one bad card destroyed other cards? That seems a bit unusual.
  • Like most of the TSA and DHS, poor design abetted by high levels of secrecy and politicization results in a fragile system. A large number of single points of failure and a lack of fall-backs and fail-safes means that one problem hoses the whole system. One wonders what happens when things really go pear-shaped in an unanticipated fashion.
  • by LeRandy ( 937290 ) on Wednesday August 15, 2007 @06:53PM (#20242677)
    Am I the only one laughing that back in old, antiquated Europe, our passport control has the ability to read the documents with their own eyes? Oh, I forgot: how are you supposed to treat your visitors like criminals if you can't take their photograph, fingerprints, and 30-odd other bits of personal data to make sure we aren't terrier-ists (fans of small dogs)? It doesn't help prevent terrorist attacks, but it does give you a nice big data mine (and how are you supposed to undermine people's rights effectively if you don't know everything about them).

    It is laughable that there is no non-computerised backup for the system. (How about filling out the forms and scanning them in later?)

  • by ScaredOfTheMan ( 1063788 ) on Wednesday August 15, 2007 @08:37PM (#20243575)
    Yes, NICs can go crazy and start blasting broadcasts or unicasts over your network. If you have a Cisco switch (or any other that supports storm-control-like features) you may want to enable it; it costs you nothing but the time it takes to update the config. On the access switch (the one connected to your PCs), get into config mode and type this on every interface that connects directly to a PC (use the interface range command to speed things up if you want):

      Switch(config-if)# storm-control unicast level X

    where X is the percentage of total interface bandwidth you specify as the threshold for cutting access to that port. It's measured every second, so if you have a 100 meg port and you set it to 30, the switch kills the port when the PC pushes more than 30 meg a second in unicasts, until the PC calms down; if it's a 10 meg port, 30 equals 3 meg, and so on. You can also add a second line to control broadcasts by changing the word unicast to broadcast. If they had had this in place, when the NIC went nuts the switch would have killed the port, and there would have been no outage (I assume a lot here, but you get the point).
