Blackout Cause: Buggy Code

blanca writes "The big northeast blackout from last summer was caused in part by a software bug in an energy managment system sold by General Electic, according to a story on SecurityFocus. The bug meant that a computerized alarm that should have been triggered never went off, hindering FirstEnergy's response to the train of events that led to the cascading blackout. Investigators found the bug in an intensive code audit following the outage, and a patch is now available."
  • fp? (Score:4, Funny)

    by CptChipJew ( 301983 ) <michaelmillerNO@SPAMgmail.com> on Thursday February 12, 2004 @08:32AM (#8257173) Journal
    The first thing I saw at that site, "Reliable, Field-Proven & Adaptable". Funny.

    Well, that statement is only half false; its reliability has been field-proven.
    • by occamboy ( 583175 ) on Thursday February 12, 2004 @09:38AM (#8257680)
      In all fairness...

      The Mars Rover's software crashed in just a few days.

      Virtually all software should be designed and tested better than it is.

      However, I'm perplexed at why the Mars Rover failure and resurrection is considered a miracle of human ingenuity, rather than an indictment of crummy testing.

      I'll not excuse the power grid software either; but it seems to work more reliably than the software on the Rover.
      • It is not considered a miracle, but it is considered amazing. It is hard enough to debug things sitting on your desk, harder to debug someone else's problem over the phone, and worse from orbit; but imagine debugging a problem with 10 minutes of light delay! And there is only one computer on that rover, so they were using the buggy computer to recover; not an easy task. In the end it turned out to be flawed file management code in the flash memory; the daily TODO list was kept in flash and it couldn't find it.

      • by Citizen of Earth ( 569446 ) on Thursday February 12, 2004 @10:43AM (#8258324)
        Virtually all software should be designed and tested better than it is.

        "Software sucks because users demand it to."

        Unless every single software company does this, the ones that don't will own the market by virtue of supplying software that "mostly works" two years ahead of the others that supply software that is "perfect, minus epsilon". Then, all of the perfectionados go out of business, and the market returns to its present state. Things are the way they are because that's how various market pressures make them.
        • by Mr. Piddle ( 567882 ) on Thursday February 12, 2004 @12:11PM (#8259290)
          Things are the way they are because that's how various market pressures make them.

          The market is slowly changing, thankfully. A good example of a maturing market would be our good old friend: home electrical wiring. How long did it take before every new home was wired pretty much identically? Probably until the early 1980s. They went through several different types of wire and insulation, grounded and ungrounded outlets, fuses and circuit breakers, etc. In a lot of ways, the software world is no different, and I'd say we're at the aluminum wire stage, with the various incarnations of systems we have and their accompanying reliability and security problems.
      • by Ken D ( 100098 ) on Thursday February 12, 2004 @10:57AM (#8258451)
        The Rover did not crash in "just a few days". The Rover crashed after the number of files in its flash filesystem accumulated to the point where the file table couldn't fit in the available memory anymore. This took 6 months of file accumulation to occur.
      • Your opinion comes from a "glass half full/half empty" perspective, which you can't really address.

        What you should be asking is why it is so difficult to write bug-free code. The obvious answer is that developing and testing code is harder than you realize. A single if statement inside a loop of 10 iterations already has over 1,000 (2^10 = 1,024) distinct code paths that you would need to exercise to be thorough (see the sketch below). For a large software project, that kind of exhaustive testing is impossible.

        What people try to do instead is use Pared
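        A minimal Python sketch of the path count mentioned above, assuming one independent branch per loop iteration (the names here are illustrative, not from any real test suite):

            # One if statement inside a loop of 10 iterations yields 2**10 = 1024
            # distinct paths, because each iteration independently takes the
            # "then" or "else" branch.
            from itertools import product

            ITERATIONS = 10

            def run(branch_outcomes):
                """Walk one path: branch_outcomes[i] decides the if on iteration i."""
                trace = []
                for i in range(ITERATIONS):
                    if branch_outcomes[i]:
                        trace.append(f"iter {i}: then")
                    else:
                        trace.append(f"iter {i}: else")
                return trace

            all_paths = list(product([True, False], repeat=ITERATIONS))
            print(len(all_paths))        # 1024 -- each would need its own test for full path coverage
            print(run(all_paths[0])[0])  # "iter 0: then" -- the first step of one such path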
  • Uh... (Score:5, Interesting)

    by Short Circuit ( 52384 ) <mikemol@gmail.com> on Thursday February 12, 2004 @08:32AM (#8257179) Homepage Journal
    Didn't the story use to be that after a tech serviced the machine, he forgot to re-enable an alarm?
    • Re:Uh... (Score:5, Informative)

      by UnknowingFool ( 672806 ) on Thursday February 12, 2004 @09:36AM (#8257661)
      The initial account was always that the alarm did not sound when the problem occurred; however, FirstEnergy was also blamed because, even without the alarm, the operators should have seen the problem, since the instrumentation display indicated that there was a dangerous surge.
      • Re:Uh... (Score:5, Insightful)

        by TimTheFoolMan ( 656432 ) on Thursday February 12, 2004 @09:50AM (#8257799) Homepage Journal
        According to the SecurityFocus article, the operators had no way of knowing, because the data wasn't "live." This is a common problem with SCADA systems--the systems will display the "last known-good value" if something goes offline. However, the system should also visibly identify the data as "out of service" or "offline," and this didn't seem to happen. That could be an issue at the server, or it could be something blamed on the people commissioning the XA/21 system (assuming the display is configurable enough to allow you to program it at this level).

        Even so, there should have been sufficient watchdog messages between the client, the server, and the field hardware for the XA/21 to broadcast a general alarm along the lines of "I can't talk to the stinking field, so we're all flying blind here, you morons!" This is exactly the same as software in my industry (HVAC fire/security systems for large buildings), where if you lose communication to a subsystem or the field, you have to raise alarms all over the place.

        The real question is how such communication could be lost without the operators getting any visible indication that they were relying on old data (a minimal illustration of that kind of staleness flagging follows below). This sounds like a missed requirement, if not insufficient testing.

        Tim
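        A minimal sketch of the stale-data flagging described above; the per-point timestamp, the 10-second timeout, and the point name are illustrative assumptions, not anything from the XA/21:

            # Show the last known value, but mark it OFFLINE instead of presenting
            # it as live once updates stop arriving.
            import time

            STALE_AFTER_S = 10.0  # illustrative timeout

            class Point:
                def __init__(self, name):
                    self.name = name
                    self.value = None
                    self.updated_at = None

                def update(self, value):
                    self.value = value
                    self.updated_at = time.monotonic()

                def render(self):
                    if self.updated_at is None:
                        return f"{self.name}: NO DATA"
                    age = time.monotonic() - self.updated_at
                    if age > STALE_AFTER_S:
                        # Last known-good value is still shown, but flagged, so the
                        # operator can see they are looking at old data.
                        return f"{self.name}: {self.value} [OFFLINE, {age:.0f}s old]"
                    return f"{self.name}: {self.value}"

            line_mw = Point("345kV line flow (MW)")
            line_mw.update(812.4)
            print(line_mw.render())  # fresh: "345kV line flow (MW): 812.4"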
        • by AB3A ( 192265 ) on Thursday February 12, 2004 @11:50AM (#8259043) Homepage Journal
          I always treat watchdog software with just a bit of skepticism. The problem, as pointed out by NERC, was that a process in the system was somehow present, but not communicating well.

          The alarm subsystem is often a separate process. It doesn't talk to the field; that's the job of other elements of the SCADA system. It is supposed to watch for semaphores or messages, or read shared memory somewhere. How do you watchdog something like that if it gets the message, but doesn't do what it's supposed to do?

          In a SCADA system near and dear to my career, we set alarm thresholds so low that the operators expect a certain amount of alarm traffic even for routine events. This helps to discover any misbehavior in the alarm system (a toy version of that check is sketched below).

          There is such a thing as a control center which is TOO quiet.
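          A minimal sketch of that "too quiet" check: routine, low-threshold alarms double as a heartbeat, so an unusually long silence is itself treated as a fault in the alarm subsystem. The class name and the 15-minute window are illustrative assumptions:

              import time

              MAX_QUIET_S = 15 * 60  # routine traffic normally arrives well within this

              class AlarmTrafficWatchdog:
                  def __init__(self):
                      self.last_alarm_at = time.monotonic()

                  def on_alarm(self, text):
                      self.last_alarm_at = time.monotonic()
                      print(f"ALARM: {text}")

                  def check(self):
                      quiet_for = time.monotonic() - self.last_alarm_at
                      if quiet_for > MAX_QUIET_S:
                          # Not a field event: a suspicion that the alarm path itself died.
                          print(f"META-ALARM: no alarms for {quiet_for:.0f}s; "
                                "verify the alarm subsystem is alive")

              wd = AlarmTrafficWatchdog()
              wd.on_alarm("Feeder 12 voltage low (routine)")
              wd.check()  # stays silent while routine traffic is recent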
  • by Anonymous Coward
    It's dark here, what about a bug?
  • by LegionX ( 691099 ) on Thursday February 12, 2004 @08:33AM (#8257187) Homepage
    "Patch available"

    Phew! Then at least I can patch my own power craft before anything happens!
  • by dbIII ( 701233 ) on Thursday February 12, 2004 @08:34AM (#8257193)
    software bug in an energy managment system sold by General
    Electic,
    Amazing what a difference a spelling mistake can make - especially in code.
  • by ThePretender ( 180143 ) on Thursday February 12, 2004 @08:35AM (#8257199) Homepage
    Oh this bug took six months to find and now a patch is available. I thought someone said the bug was found six months ago and now the patch was available. My bad, nobody would ever do that :-)
  • With all the brainpower on Slashdot, I'm sure we can find a way!
  • by vargul ( 689529 ) on Thursday February 12, 2004 @08:35AM (#8257203) Journal
    I have been dreaming of writing such a bug myself. Quite an achievement to black out a quarter of a continent with some crappy code...
  • a patch is now available

    Where's the URL, dude? I want to apply it to my local copy.

  • See what happens? (Score:4, Insightful)

    by poofmeisterp ( 650750 ) on Thursday February 12, 2004 @08:36AM (#8257213) Journal
    ... when you outsource to the lowest bidder?

    I've said enough.
    • Does anyone know if the code-writing was outsourced abroad?

      With all the lip service about "homeland security," one ought to be concerned about anything affecting national infrastructure being sent abroad, where you really don't know who is doing the coding, whether the coding projects are being further outsourced to, say, alQaidaSoft, etc.

  • by kraker ( 687285 ) on Thursday February 12, 2004 @08:39AM (#8257234) Homepage
    Bruce Schneier had a very interesting theory in his December issue of Crypto-Gram. The Blaster virus could be one of the reasons for the power outage:

    http://www.schneier.com/crypto-gram-0312.html#1 [schneier.com]

    A snippet of the article:
    Let's be fair. I don't know that Blaster caused the blackout. The report doesn't say that Blaster caused the blackout. Conventional wisdom is that Blaster did not cause the blackout. But it seems more and more likely that Blaster was one of the many causes of the blackout. Regardless of the answer, there's a very important moral here. As networked computers infiltrate more and more of our critical infrastructure, that infrastructure is vulnerable not only to attacks but also to sloppy software and sloppy operations. And these vulnerabilities are invariably not the obvious ones. The computers that directly control the power grid are well-protected. It's the peripheral systems that are less protected and more likely to be vulnerable. And a direct attack is unlikely to cause our infrastructure to fail, because the connections are too complex and too obscure. It's only by accident--Blaster affecting systems at just the wrong time, allowing a minor failure to become a major one--that these massive failures occur.
    • "Things are so compliated, we don't know that a small event, or series of small events won't bring down the whole system"

      Yeah, well I don't know that I won't be fired tomorrow for reading Slashdot at work, but that doesn't mean that I will.
    • How about the energy companies?

      Certainly, the energy corporations must be somewhat culpable for not rigorously testing the software in the first place? It is not in the interest of a for-profit company to see to it that such systems are functioning correctly, as that cost will detract from the bottom line profit. Only when disaster strikes can they be goaded into looking into problems.
    • by YU Nicks NE Way ( 129084 ) on Thursday February 12, 2004 @08:51AM (#8257323)
      Did you read the SecurityFocus article? It explicitly stated both that Blaster was not related to the blackout and that SF had been one of the first publications to advance the hypothesis that they had been related.

      In short, the Microsoft bashers were wrong -- and at least Security Focus had the guts to acknowledge it.
  • by bmongar ( 230600 ) on Thursday February 12, 2004 @08:40AM (#8257244)
    The term 'Software Engineering' is bandied about in the software industry. I think little of what happens could actually be called engineering. Software is developed. It doesn't meet the strict standards of testing and reliability of physical products.
    I am a software developer, not an engineer, as are most people in the field. Software won't become an engineering science until companies are willing to pay for that process. Given the current trend towards cost cutting, I don't see that happening anytime soon.
    • by Jeff DeMaagd ( 2015 ) on Thursday February 12, 2004 @09:06AM (#8257414) Homepage Journal
      I'd sort of tend to agree, although under your standards the stuff I do as an EE really would fall under development; we don't have the budget to send out for external certification and external testing. No biggie, I guess I can live with being a hardware developer.

      Is it true that some states have prohibited Microsoft from issuing MCSEs? I heard this somewhere but I can't remember. Something about Microsoft not having the authority to certify engineers.
      • by kinnell ( 607819 ) on Thursday February 12, 2004 @09:16AM (#8257481)
        Is it true that some states have prohibited Microsoft from issuing MCSEs? I heard this somewhere but I can't remember. Something about Microsoft not having the authority to certify engineers

        But couldn't the "Microsoft Certified" part be interpreted as a disclaimer? Something along the lines of "Burger King Certified Brain Surgeon".

        • by Kombat ( 93720 ) <kevin@swanweddingphotography.com> on Thursday February 12, 2004 @09:38AM (#8257691)
          It was in Canada. In Canada, "Engineer" is a protected term, like "Doctor." I can't take a 6-month IT course and call myself a "Network Doctor," and put the title "Dr. Kevin" on my business cards. It's the same thing with "Engineer" in Canada (and "Architect", too, interestingly enough).

          There is only one university in Canada that is actually allowed to graduate "Software Engineers," and it's in Newfoundland (MUN). Other universities are not allowed to call their grads "Engineers" unless they follow the strict curriculum requirements of the main engineering authority in Canada, whose name escapes me at the moment.

          This is all second-hand info, spoken as a guy who's married to a genuine, certified Engineer (Industrial). :)
          • by Anonymous Coward on Thursday February 12, 2004 @10:01AM (#8257885)
            In Canada, "Engineer" is a protected term, like "Doctor."

            Doctor is not a protected term. Perhaps you mean "Medical Doctor"? There are lots of non-medical doctors.

            I was arguing once with an MD friend of mine who thought that PhDs (like myself) don't have the right to call themselves Doctor. I explained that while medicine has been around for a very long time, the degree of MD has not. PhD degrees have a much longer history than MD degrees.

            It gets very funny when another friend of mine (who has a PhD in nursing) is called "Dr" in her hospital.
          • by superflex ( 318432 ) on Thursday February 12, 2004 @10:17AM (#8258075) Homepage
            Universities in Canada must have their curriculum certified by the Canadian Engineering Accreditation Board, the national body for regulating engineering education.
            Furthermore, each province has a regulatory body which manages licensing of Professional Engineers (P.Eng.'s) which is a regulated designation. In Ontario this body is the PEO [peo.on.ca]. They have a webpage here [peo.on.ca] on the whole "software engineering" issue.
    • by Black Parrot ( 19622 ) on Thursday February 12, 2004 @09:19AM (#8257503)


      > I am a software developer not an engineer, as are most people in the field. Software won't become an engineering science until companies are willing to pay for that process. Given the current trend towards cost cutting I don't see that happening anytime soon.

      It will be interesting to follow the lawsuit news on this one. If someone gets squeezed hard enough, we might see a movement toward good engineering praxis as a result.

      More likely the politicians will step in and bail them out, but ISTM that as society continues to rely more and more on software, at some point we're going to decide that we can't afford not to set and follow good engineering standards.

    • Agreed. I'm both a Mechanical Engineer and a Software Engineer, and I work as a consultant in embedded software development. The embedded sector is WAY ahead of "desktop programming" when it comes to strict requirements and processes, and yet not even that is close to being a true engineering discipline.

      I've actually concluded myself that software development _can never_ become an engineering discipline; it's too creative a process for that. A software developer is more an artist than an engineer.

      Really.
    • by Tassach ( 137772 ) on Thursday February 12, 2004 @11:39AM (#8258880)
      I like to think I'm an engineer, not a developer. The problem is not that I don't know how to do good SW engineering; it's that I'm usually not allowed to do good SW engineering. Good engineering is expensive in terms of time and money. The people who sign the checks usually aren't willing to pay for it and aren't willing to wait. The sad part is that they're often right: if you can't afford to wait, and you can't afford to pay the price, you have to settle for what you can get and hope that it's good enough to keep you moving forward.

      You have 4 main variables in the software development equation: Time, Quality, Functionality, and Efficiency. Notice that we only measure time, not man-hours or monetary cost. As we know from reading The Mythical Man-Month [amazon.com], we cannot reduce time by adding more people or by spending more money. While we list efficiency as a variable, we really have to treat it as a constant within the scope of a single release cycle. Improvements in efficiency are generally very gradual and incremental, and for the most part cannot be effectively implemented in the middle of a release cycle.

      I postulate that Time is directly proportional to the product of Quality, Functionality, and Efficiency [T = EQF]. Since E is constant within the scope of a single release, we can't use process improvements or similar techniques to improve quality in the short term. Assuming our goal is to improve quality, we either have to decrease functionality or increase time (a toy calculation follows below). Since monetary cost is directly proportional to time (time is money!), managers are very reluctant to give you more time. Furthermore, we are frequently under hard time constraints due to contractual obligations or market pressure. If we can't change time, we have to sacrifice either quality or functionality. Missing functionality is very obvious, whereas low quality isn't necessarily noticeable in the short term, so it should be no surprise that quality almost always takes a back seat to functionality.
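      The toy calculation: treating the postulate T = E*Q*F literally, with the ship date T fixed and E constant within the release, quality falls out as Q = T / (E*F), so each added feature comes straight out of quality. The numbers below are arbitrary illustrations, not data:

          def quality(time_weeks, efficiency, functionality):
              # Rearranged from the poster's postulate T = E * Q * F.
              return time_weeks / (efficiency * functionality)

          T = 12.0   # weeks until the contractual ship date
          E = 1.0    # process efficiency, treated as constant this release

          for F in (4, 6, 8):   # features demanded
              print(f"features={F}  relative quality={quality(T, E, F):.2f}")
          # features=4  relative quality=3.00
          # features=6  relative quality=2.00
          # features=8  relative quality=1.50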

    • The term 'Software Engineering' is bandied about in the software industry.

      When I was young and dumb, I thought it was neat to have "Software Engineer" on my business cards. After a few years of seeing just how inept/underfunded/constrained nearly all software developers are, I changed my job title. Calling a typical programmer a "Software Engineer" is sort of like calling a convict in prison a "Legal Countermeasures Engineer."

  • by bernywork ( 57298 ) * <bstapletonNO@SPAMgmail.com> on Thursday February 12, 2004 @08:41AM (#8257247) Journal
    Just a question for everyone here:

    Who thinks this could have been any better with Open Source and why?

    People make the comment of the many eyes, but who is really looking at the code?

    • It wouldn't have made much of a difference, as the system probably was so specialised anyway.

      Though maybe they could have used proven building blocks from open source for other parts and then focused on the parts they had to do themselves, though they might have done this anyway.

      What's stupid is that the whole blackout cascaded across such a large area. There shouldn't have been a possibility of that even if the software had been intentionally flawed...
    • by Anonymous Coward
      The initial bug would still have been produced with an open source model. There would still have been a huge blackout. The difference is that the bug might have been found and patched much quicker. If you had been without electricity for a week and you had the source to the application, you might have had some incentive to look into the source yourself to prevent it from happening to you again. The great thing is that it would also prevent the same thing from happening to everybody else at the same time.
    • by eraserewind ( 446891 ) on Thursday February 12, 2004 @08:56AM (#8257363)
      People make the comment of the many eyes, but who is really looking at the code?
      Probably nobody, especially if you are talking about something as dull as a utility management app. That's why companies pay people to look at these things.

      Open source almost certainly would not have prevented the bug. The bug might have been found faster after it happened, though, because curious (or under pressure from their boss) engineers in every facility affected would have spent at least some time trying to figure out what went wrong.

      Having the source is great, and you would be surprised at the number of companies who license the source for what they use. Risk management is important. Free isn't everything; you can get many of the same things by paying :-)
    • by Detritus ( 11846 ) on Thursday February 12, 2004 @08:59AM (#8257380) Homepage
      I don't care whether it is open source or closed source or divine inspiration, software reliability requires testing. Depending on the reliability requirements, proper testing can be very expensive. That's assuming anyone has even bothered to state reliability requirements.

      There are also system reliability requirements to be considered. Hardware fails. Software fails. Is the system designed to detect and cope with component failures?

      GE's software may suck. I don't know. I've never seen it. I am suspicious of people who attempt to hide their own negligence by blaming a third party.

  • by olderchurch ( 242469 ) on Thursday February 12, 2004 @08:41AM (#8257248) Homepage Journal
    I thought the Canadians did it?
  • Bad bugs (Score:5, Informative)

    by Rico_za ( 702279 ) on Thursday February 12, 2004 @08:42AM (#8257257)
    Chalk up another one for the most disastrous software bugs in history [tu-muenchen.de]. This one should give the Ariane 5 explosion [umn.edu] a run for the No. 1 spot.

    • > Chalk up another one for the most disastrous software bugs in history. This one should give the Ariane 5 explosion a run for the No. 1 spot.

      The A5 failure wasn't caused by a bug, at least not in the sense we usually use the term. It was caused by a decision to re-use a part from the A4 and its embedded software without bothering to review its specifications.

      It's certainly a problem that good "engineering" should have caught, but most of us wouldn't call it a bug.

  • by weave ( 48069 ) on Thursday February 12, 2004 @08:43AM (#8257259) Journal
    a patch is now available

    I'm waiting for the next big power failure, then the excuses about why the patch was never applied. :)

  • Hmmm... (Score:4, Funny)

    by supersam ( 466783 ) on Thursday February 12, 2004 @08:45AM (#8257279) Homepage
    One code to light it all,
    One coder to code it,
    One debugger to miss the bug
    and into the darkness lead them. ...
  • But is anyone else thinking of Medal of Honor?

    Sound zee alarm!!
  • by fygment ( 444210 ) on Thursday February 12, 2004 @08:51AM (#8257332)
    Now if in fact this was buggy code, and if Software Engineers are in fact part of the engineering profession, then a professional body should be taking the engineer(s) to task. This would be the same thing that would take place in the event that a civil engineer signed off on faulty building plans. But smart money says no software "engineer" will get nailed.

    A look at the software industry will show this to be the norm. And that is why there is such a problem with having people claiming the title of "software engineer". "Engineer" doesn't just mean having the technical savvy, it also means having a responsibility to the public for the use of that knowledge and being beholden to a professional body charged with ensuring you are held accountable.

    • by Detritus ( 11846 ) on Thursday February 12, 2004 @09:07AM (#8257425) Homepage
      You can't have responsibility without authority. The building never gets built without the signature of the civil engineer on the plans. Few software engineers have that control.
    • by Anonymous Coward on Thursday February 12, 2004 @09:18AM (#8257499)
      That's why you'll never see a proper software ENGINEER... when engineers undertake a project they know the materials, the requirements, the environment, etc. As soon as a piece of software goes out the door all bets are off.

      How long do you think engineering (as it stands today) would last if that bridge meant to stand on bedrock spanning no more than 1000' and carry a load of no more than 1500 tons at any given time were suddenly put on a sandy bed, stretched to cover 1100' and carry 1600 tons... oh yes, and the user didn't like that third support so they removed it.

      Software and engineering are VASTLY different disciplines. If software is ever judged like engineering then it would kill the market because the EULAs would have to say that you use THIS motherboard with X amount of RAM and Y amount of hard-drive space. The agreement would only be in effect as long as you used OS "ABC" and no other processes besides those required by the OS and the programme in question were running. It would make the cost of running a business prohibitively expensive.

      When you consider that most large-scale software development projects are equivalent in complexity to building structures like the Golden Gate Bridge or the Empire State Building (I didn't want to mention any buildings outside the US since I realise the audience on here is largely American and probably wouldn't know what I was talking about) consider the cost of actually treating software development the same way... I'm sure companies everywhere will be lining up to pay $300M for that content management system.
      • by YU Nicks NE Way ( 129084 ) on Thursday February 12, 2004 @11:13AM (#8258606)
        Engineering is all about tolerances and modes of failure. If I design my car to be able to take a fifteen mph front end collision, and you drive into a wall at thirty, I'm not responsible, and my E&O won't wind up paying out.

        Currently, software is built in a craft/guild model: senior developers (masters) teach junior developers (journeymen) who've reached a certain level of expertise. Interns (apprentices) are drafted into the profession and groomed into junior devs. There is a widely held notion of subjective quality, and we can recognize a masterwork, but we can't quantify what it takes to generate one.

        Software engineering will become a true engineering discipline only when there is an objective measure of defect level and an objective notion of what constitutes an adequately circumscribed operating environment. Once we have adequate definitions of those things, though, software production will become industrialized almost immediately.
    • by CharlieG ( 34950 ) on Thursday February 12, 2004 @09:21AM (#8257529) Homepage
      You're right - MOST software "engineers" aren't. Guess what? If they were, you would NOT see death-march projects, software would cost a LOT more, and when the chief "eng" on the software project (or for that matter any Engineer on the project) said "This can NOT ship, it's not ready", the company would have to suck it up and NOT ship.

      Software Engineers would have to carry E&O insurance (think of it as malpractice insurance, like an MD's). It MIGHT be supplied by their boss, but...

      And in exchange for taking on this risk, what would a software Engineer EARN? You'd better believe it would be a LOT more than it is now.

      You would still have "coders" - in fact, MOST "software engineers" would go back to their pre-title-inflation title: "Programmer". The SE on the job would be responsible for all the code that the programmers wrote.

      Just like MOST jobs don't have to be signed off by a PE, most software would NOT have to be signed off by an SE - but if you used software that wasn't signed off by an SE and caused 50b in losses, you would lose YOUR shirt.

      At this point in time, it seems that the people of the US just have NOT found the need to come up with the idea of a licensed SE. I predict it will happen, and within the next 25-30 years. There have been movements within the programming trade to do this. It's coming - but when?

      Right now, software development is very much like the "guilds" of the Middle Ages. You didn't have PEs back then - you had folks who learned from other folks, and you had projects that failed massively. Eventually, things became codified, and a lot of the failures stopped - at least for day-to-day stuff. But guess what? Buildings still fall down, even in construction (read the book "Why Buildings Fall Down"). It's just that for "common" designs, it doesn't happen.
      • by zeus_tfc ( 222250 ) on Thursday February 12, 2004 @11:05AM (#8258523) Homepage Journal
        Just a nitpick,

        Creating a true software engineer is different from making them PEs. Right now, most of the engineers that design things in industry don't have PEs, and if they do, they don't make it known publicly for the very reasons you mentioned.

        The rest of us without PEs don't need the insurance, as that is supplied by the company.

        Also, keep in mind that just because an engineer worked on something doesn't mean that it will be expensive. Most of what I engineer costs less than a dollar.

        If you haven't guessed, IAAE (I am an Engineer)
    • by GoofyBoy ( 44399 ) on Thursday February 12, 2004 @09:54AM (#8257834) Journal
      The software handled one part of the electrical system involved.

      What about a good Electrical/Mechanical/Civil Engineering solution that would have prevented it from cascading through different systems / electrical companies / countries?

      One piece of software failing to raise an alarm is shocking. The fact that it cascaded over such a wide area is simply mind-blowing.

      Before we talk about "software engineers" how about talking about "traditional engineers" and their role in this massive failure?
  • Typo... (Score:3, Funny)

    by MarsCtrl ( 255543 ) on Thursday February 12, 2004 @08:52AM (#8257335) Homepage Journal
    The big northeast blackout from last summer was caused in part by a software bug in an energy managment system sold by
    General Electic, according to a story on SecurityFocus.

    This is Slashdot! Isn't that supposed to say Microsoft? It's always Microsoft.
  • by prgrmr ( 568806 ) on Thursday February 12, 2004 @08:53AM (#8257343) Journal
    From the article:

    When a backup server kicked-in, it also failed, unable to handle the accumulation of unprocessed events that had queued up since the main system's failure. Because the system failed silently, FirstEnergy's operators were unaware for over an hour that they were looking at outdated information on the status of their portion of the power grid, according to the November report.

    How in the world did they manage to build a system nearly completely dependent upon computers, and yet not know when they lost not just one, but two of the computers that monitored the system? (One way to avoid the silent part of that failure is sketched below.)

    Homer: Don't turn off the computer! Don't turn off the computer! Don't turn off the computer!

    "Click"
  • TIBCO middleware (Score:3, Insightful)

    by Anonymous Coward on Thursday February 12, 2004 @08:56AM (#8257360)
    Never have I worked with a vendor so arrogant and yet so totally clueless. Their UDP-based reliability protocol is total crap, regardless of their boasts that it is equivalent to TCP.


    And yep, it runs on major critical systems, including energy systems and satellites.


    Lean on it in the slightest and it will crash and burn with little chance of recovery. Tibco even says they don't test their own software (lack of docs lowers their liability). Press them for test results and they will offer to let you pay them to test it for you.


    When a backup server kicked-in, it also failed, unable to handle the accumulation of unprocessed events that had queued up since the main system's failure.

    Sounds like classic Tibco.

  • Let me guess, they blacked out the Northeast in retaliation for blowing up Siberia with our trojan-horse pump and valve control system.

  • Metroid (Score:5, Insightful)

    by Graymalkin ( 13732 ) * on Thursday February 12, 2004 @09:02AM (#8257398)
    Blaming the blackout on a software bug is a damn cop-out. The cause of the blackout was a horribly managed electrical grid that can barely keep up with the current demand. Any major failure in the system can cause a cascading failure of the entire section of the grid. That is a horrible design. A software bug may have been the trigger, but it is by no means the true cause.

    The grid in the Northeast US is supplied by horribly inefficient and antiquated power lines that were struggling to keep up thirty years ago. That they are still in use today is an outright crime. There's also the issue of the operators of the lines and generators trying to save a few bucks by cutting maintenance on equipment and facilities and cutting supervising staffs down to skeleton crews. It is much easier to fit "software bug" into a sound bite, so the news media will stick with that. Unfortunately, the real cause of the blackout is never going to be patched, and another blackout is as inevitable as this last one was. I hope next time a few more people will have invested in backup generators or some alternate form of power to keep from losing their business during a blackout.
    • Re:Metroid (Score:3, Insightful)

      by Milalwi ( 134223 )


      The cause of the blackout was a horribly managed electrical grid that can barely keep up with the current demand.

      Wow. Quite an accusation. Any facts to back it up?


      Any major failure in the system can cause a cascading failure of the entire section of the grid. That is a horrible design.

      Really? There are major circuit outages on the Eastern Interconnected Network every day. The system is designed to have the local area go black instead of blacking out a widespread area. That was the lesson of

  • by starseeker ( 141897 ) on Thursday February 12, 2004 @09:05AM (#8257408) Homepage
    If this isn't a call to take a closer look at the possibility of more widely using tools like Z and B to develop important software, I don't know what is.

    Yes, they're difficult. Yes, they aren't likely to eliminate all bugs. BUT. They provide a much better chance (as I understand it - I'm not an expert) that what is designed is what actually gets implemented. That shifts the burden onto the design, but that's OK - that burden was always there. It just means that the design gets properly implemented, which is all that can reasonably be asked of the coding process. (A toy illustration of stating required properties up front is sketched below.)

    Currently, again as I understand it, the life of a software program in development is a constant struggle by the developers to cope with the ever-changing demands of customers. I think if people want matters to improve, the customers are going to have to come to grips with reality, take the time to sit down and think things through, and make all critical design decisions BEFORE the development process begins. More expensive up front? You bet.

    That's why I think companies should look at cooperative effort for this type of thing. Distribute the cost of developing one really good program across an industry. A lot of the same core functionality can likely be shared between businesses - if they all pay for one proper design and implementation of an open program up front, and they all get copies of the logic and proof code with rights to extend as they see fit, they all benefit. They can also open up the more general parts of the package to the world at large under the GPL, and anyone who can generate valid B and Z designs/proofs could contribute. Sort of an "academic" open source code development forum - peer review and all. The companies get the benefit of all new development - if they are using it internally they can extend the GPL code for themselves, so long as they don't distribute it. If they do distribute it, they can do so under the GPL for everyone to enhance. A plugin-based model can also allow them to develop components of the system they can sell as commercial software, if they wish.

    Whether this would work/appeal with corporate thinking I have no idea - many of those folks seem to view cooperation like the plague. But it might allow a higher grade of software to be developed and universally used, and I have a hard time imagining how that could be a bad thing for anyone.
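    Z and B are full formal specification methods with machine-checked proof obligations; the following is only a lightweight Python stand-in illustrating the underlying idea of stating, up front, properties the implementation must satisfy. The alarm-queue invariant and names are assumptions made for illustration:

        def enqueue_alarm(queue, alarm, capacity):
            # Precondition: the queue never exceeds its declared capacity.
            assert len(queue) <= capacity, "invariant violated before call"
            if len(queue) == capacity:
                raise OverflowError("alarm queue full -- escalate, do not stall silently")
            queue.append(alarm)
            # Postcondition: the alarm was recorded and the invariant still holds.
            assert queue[-1] == alarm and len(queue) <= capacity
            return queue

        q = []
        enqueue_alarm(q, "breaker 4 tripped", capacity=2)
        enqueue_alarm(q, "line 7 overload", capacity=2)
        print(q)  # ['breaker 4 tripped', 'line 7 overload']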
  • Not Surprised (Score:4, Insightful)

    by Anonymous Coward on Thursday February 12, 2004 @09:06AM (#8257416)
    Posting anonymously for obvious reasons to me :)

    Given my personal experience with this certain Fortune 5 company and software development as a whole, I am not surprised.

    The bottom line is that there is soooo much software developed here by non-computer programmers. There are many great Engineers (Mechanical, Aerospace, etc.) here, yet very few can write good code. Many of them are asked to write code nonetheless and thanks to the travesty that is Visual Basic and other Rapid Application Development tools the code that is produced is extremely un-maintainable.

    Then you have the matter of people moving jobs every 2 years and the poor bastard who has to maintain someone else's code gets lost inside of it.

    Consider me very frustrated at the whole process.
  • by Anonymous Coward
    SCADA systems are used to control and monitor all sorts of things; they are not just used in the power industry to manage high-tension wires, but also to control conveyor belts in manufacturing facilities, or even automatic doors on trains. All of that stuff has code around it, one way or another, and every so often bugs do appear.

    No-one writes flawless code, not Sun, not IBM, and not even Linus or Alan Cox or Larry Wall. Anything that is controlled by code is bound to break, but that

  • by sphealey ( 2855 ) on Thursday February 12, 2004 @09:24AM (#8257560)
    In the wake of the blackout there were a lot of calls to create a centralized, monolithic dispatching center that would manage all electric generation and transmission in North America.

    To me, this report gives a good example of why a monolithic (monocultural) dispatching system is not a good idea. If every transaction were controlled by a single central center, one software bug could shut down the entire North American grid.

    sPh

    • I've always thought that as technology advances, individual households will become more and more self-sufficient, and eventually centralized government services (or pseudo-government services) will be eliminated. This includes power, water, and sewer, as well as phone, cable, internet, or anything else that crops up in the future.

      This may seem impossible to people living in today's world, but it makes perfect sense in a world where technology is so efficient and perfected that every household can easily af
  • by TimTheFoolMan ( 656432 ) on Thursday February 12, 2004 @09:32AM (#8257627) Homepage Journal
    Based on the PDF for the XA/21 system, it sounds like this wasn't related to some of the DCOM/OPC issues many (myself included) were speculating about. Though it's a SCADA control system (where Windows is common, though not universal), it's running on AIX (IBM or Motorola) or Solaris.

    Interestingly enough, the sales literature describes it as having, "[an] established track record of field performance - over one million hours of online operation."

    I wonder if they'll revise the brochure now?

    Tim
  • by CokoBWare ( 584686 ) on Thursday February 12, 2004 @10:00AM (#8257883)
    We may slam Microsoft for all of its bugs, but it's really hard to top a software bug triggering an international blackout the size of the one last summer. I think I should sue GE for making me walk 3.5 hours home in the heat with no money in Toronto, uphill, because I couldn't take a subway home. I smell a lawsuit the size of the eastern seaboard.
  • by master_p ( 608214 ) on Thursday February 12, 2004 @10:17AM (#8258071)
    After many years as a developer, I realized that the engineering process that goes into other professions (for example, civil engineering) can't be applied to software. The reason is simple: software is many orders of magnitude more complex. Software has many interdependencies between components, has many states, and is subject to change every minute. It's very difficult to see ahead and provide APIs that fit all needs; that's why we go back and change the damn thing. What does a civil engineer have to do? He or she has to combine parts and test whether they hold together. There are a lot of parts, but the general principles are few and can be easily remembered... unlike software.

    Furthermore, the tools we have for the job are inadequate. The programming languages are primitive. The debugging tools are dumb. The machines are not clever and strong enough to prove the mathematical theorems behind their programs. We don't even learn these things in college... we learn how to use programming languages, but we don't learn how to program... but I seriously believe we will never learn how to program, because a program's complexity increases tenfold for each line of code written!!!

    • ...If programming is so complex, then why don't we try something new? You want a program without state? Try Haskell [haskell.org]. You want to be able to prove something about your program? Try ML [wikipedia.org]. But don't despair; I think the reason for crummy software is that it hasn't been around for that long. Civil engineers have had the hindsight of building roads, and aqueducts, and buildings for thousands of years. Software has been around for, what, two generations?
  • by dewdrops ( 79519 ) on Thursday February 12, 2004 @10:32AM (#8258223) Homepage

    So the software didn't raise alarms as it should've. That's bad. But it seems to me that the software is being made a scapegoat here. It's much easier to blame "that #$@&@$ computer" than "FirstEnergy's failure to trim back trees encroaching on high-voltage power lines" or the fact that the infrastructure for the power grid is old and poorly set up, such that one failure can bring down the whole system. There's no reason why a failure in Ohio should black out New York, and there's nothing software can do to fix that.

  • by Glasswire ( 302197 ) on Thursday February 12, 2004 @11:43AM (#8258945) Homepage
    Had this been a Windows-based system, there would be a huge torrent of comments about how fundamentally unreliable the OS and platform were.

    Funny - since this ships for "industrial strength" AIX / Solaris RISC systems (see specs on pg 8) [gepower.com], I don't see any cheap, reflexive comments about the platform.

    I guess the message here is that good or bad code can be written for any architecture.
  • by dtjohnson ( 102237 ) on Thursday February 12, 2004 @11:58AM (#8259147)
    After looking at the original report [nerc.com], it looks more like the GE XA21 SCADA network failure was not the primary cause of the cascading failure but more an effect of it. The key failure seems to be a software system called the "State Estimator" (SE) that is used by the Midwest System Operator (MISO), a NERC reliability coordinator, to develop optimal solutions for the planned operating levels of all of the power generation and transmission equipment in the MISO area, covering about 10 midwest states and 1 million square miles. It is not described in much detail, but the SE seems to be an optimization tool using a linear programming model that gathers availability data for all of the major system components and load demand every five minutes and then calculates the 'optimal' use of those system components to maintain system reliability at the required level. The 'solution' of the model is then used to plan the operation of the overall system by sending the target operating levels to each facility in the system.

    So why did it fail? Two reasons. First, the model depends on having accurate availability information from each major system component. Status information is sent to MISO in Indiana by the "ECAR" data network or by direct links. On the day of the failure, the direct link to a key transmission line was not working and the analyst had turned off the estimator to troubleshoot it. After fixing the problem, he went to lunch and forgot to put the system back in automatic mode, where it would develop updated solutions. This situation existed for 2 hours, from 12:15 to 14:40. When the estimator was switched back to automatic, it was unable to develop a solution because another key transmission line had overloaded and tripped and *its* new non-operational status was unknown to the model, apparently because the status of that line is assumed to be 'on' until told otherwise. This problem was not corrected until 16:04.

    The bottom line is that a critical major planning tool was unavailable for 4 hours for a regional generation and distribution system that absolutely required its use to be operated successfully when the system power supply was very close to the demand.

    The SCADA system itself did not fail, but its alarm function did, which provides alarms to control room operators about system operational problems. The problem with the alarm function seems to be a case of too many alarms for the system to handle as the problems multiplied. The software bug that they are now reporting was probably related to the unexpectedly large number of alarms that the system was experiencing. The new alarm inputs built up and then overflowed the process input buffers. The alarm system just stalled while processing an alarm event, and the alarm function stopped (a toy model of that failure shape follows at the end of this comment). Then, at 14:41 the primary server hosting the alarm processing application failed due to some combination of the stalling of the alarm application and the queueing to the remote terminals. The hapless backup server was then automatically activated and everything was transferred to it, even the functional non-alarm stuff. The backup server failed after 13 minutes. Basically, the SCADA alarm system seems to have been massively overloaded (which shouldn't ever happen, of course) beyond what the system design could cope with. The bug apparently prevented an indication that the alarm system was failing, but it looks like the cascading failure still would have occurred even if the software bug had not been present, because the system deterioration had progressed too far to recover by the time the bug manifested itself.

    The immediate cause of the failure seems to be the forgetfulness of the analyst who was operating the planning model. The significant underlying contributory cause seems to be a very poor regional operational design in which a critical centralized system planning tool was being used with insufficient backup and oversight. It looks as though both Unix and Windows escape blame. The SCADA system was probably doing far more than its designers intended and probably performed heroically until it died. 'Aye Captain... I canna do no more.'
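    A toy model of the failure shape described in the report summary above: if the alarm consumer stalls, a bounded input buffer silently fills, new events are lost, and nothing tells the operators. The buffer size, class, and event names are illustrative assumptions, not details of the real system:

        from collections import deque

        BUFFER_SIZE = 5

        class AlarmProcessor:
            def __init__(self):
                self.buffer = deque()
                self.stalled = False
                self.dropped = 0

            def offer(self, event):
                if len(self.buffer) >= BUFFER_SIZE:
                    self.dropped += 1          # overflow: the event is silently lost
                    return
                self.buffer.append(event)

            def process_one(self):
                if self.stalled or not self.buffer:
                    return None                # a stalled consumer never drains the buffer
                return self.buffer.popleft()

        ap = AlarmProcessor()
        ap.stalled = True                      # the stall the report describes
        for n in range(12):
            ap.offer(f"alarm {n}")
        print(ap.process_one())                # None -- nothing drains while stalled
        print(len(ap.buffer), ap.dropped)      # 5 7 -- a full buffer and silent drops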
  • by FreshFunk510 ( 526493 ) on Thursday February 12, 2004 @02:05PM (#8260412)
    Chalk one up for software again! First the Mars rover Spirit and now this! w007! 1337 programming!

    Software: 2
    Hardware: 0

"The great question... which I have not been able to answer... is, `What does woman want?'" -- Sigmund Freud

Working...