
Message Storm Knocks NYSE Offline

Posted by Zonk
from the fall-down-go-boom dept.
ninjee writes "The New York Stock Exchange is re-examining its network after it was forced to close four minutes early at 3:56pm on Wednesday (1 June) because of a communications glitch. Trading opened on time (09:30 EDT) the following morning, but the outage irked traders and raised questions about the reliability of a network described as 'ultra reliable' following improvements made in the wake of the September 11 terrorist attacks. The outage stemmed from a fault in a system designed to distribute market data and operate computer trading systems. NYSE Chief Executive John Thain said that both the main system and its backup were swamped with error messages, Reuters reports. He added that the exchange would carry out remedial work designed to prevent any repetition of the problem."

  • They will begin beating the squirrels at precisely 3:55 EST from now on.
  • by Anonymous Coward on Tuesday June 07, 2005 @06:50PM (#12752667)
    Immediately claimed the message storm to be the work of linux hackers
  • It is "ultra reliable," but you've got to remember the number of hits this site takes a day... it makes /. trolling look like a fairy godmother!
  • no final print (Score:1, Interesting)

    by gbasin (890301)
    I, as well as many others in my office, got royally screwed here, getting stuck with quite sizeable unhedged positions overnight. It's bad enough that order routing went down, but they also failed to reopen for a final print later in the afternoon, as originally proposed. Very bad.
  • To NYSE (Score:3, Informative)

    by mcguyver (589810) on Tuesday June 07, 2005 @06:59PM (#12752759) Homepage
    1. If anything can go wrong, it will. (see Murphy's law)
    2. Systems in general work poorly or not at all.
    3. Complicated systems seldom exceed five percent efficiency.
    4. In complex systems, malfunction and even total non-function may not be detectable for long periods (if ever).
    5. A system can fail in an infinite number of ways.
    6. Systems tend to grow, and as they grow, they encroach.
    7. As systems grow in complexity, they tend to oppose their stated function.
    8. As systems grow in size, they tend to lose basic functions.
    9. The larger the system, the less the variety in the product.
    10. The larger the system, the narrower and more specialized the interfaces between individual elements.
    11. Control of a system is exercised by the element with the greatest variety of behavioral responses.
    12. Loose systems last longer and work better.
    13. Complex systems exhibit complex and unexpected behaviors.
    14. Colossal systems foster colossal errors.
    -KISS
    • by Anonymous Coward on Tuesday June 07, 2005 @07:07PM (#12752849)
      ...of a Wikipedia text [wikipedia.org]. (You didn't follow the terms of the GNU Free Documentation License.)
      • by mcguyver (589810) on Tuesday June 07, 2005 @07:57PM (#12753236) Homepage
        I never thought the day would come when someone posts a joke [answers.com] and the response, on /. of all places, references copyright restrictions. How ironic, if not a sad sign of how times have changed.
        • Whether or not someone's copyright has been infringed, it is plagiarism.
        • The GNU Free Documentation License is not about restrictions, but about giving the viewers of your post the same rights that Wikipedia gave you, thereby allowing anyone to continue to spread your "joke."
        • by jesterzog (189797) on Wednesday June 08, 2005 @12:10AM (#12754802) Homepage Journal

          I never thought the day would come when someone posts a joke and the response, on /. of all places, references copyright restrictions. How ironic, if not a sad sign of how times have changed.

          I don't see why it's ironic. As uninformed as some slashdot posts are, there are also a lot of users who recognise that copyright makes a lot of sense, and is actually useful. It's the enforcement of copyright that allows the GPL and the GFDL to work. What many people here do complain about is the never-ending extensions of copyright, arguably against the general public interest, and allegedly because corporations have bought off politicians.

          This may be a joke, but it was copied verbatim without providing the copyright notice, which is required [wikipedia.org] by the GNU Free Documentation Licence. It's a copyright violation, and to ignore it as irrelevant would be hypocritical and ironic in itself. (Not to mention illegal.)

    • Designing a system to not fail means that it will fail in a way that you didn't plan for.

      Twice in the last year, for example, both my primary and backup systems have failed within a week of each other.
    • 15. Colossal lists make me lose interest in parent quickly, despite "Informative" mod.
    • In other words, "ultra-reliable" was just a marketing term to support whatever bonuses were paid out that year. Anything "ultra-reliable" is probably too simple to use in the case where people care about ultra-reliability. A spoon is ultra-reliable and I've never suffered a spoon failure ... but I can't fix a car transmission with a spoon. My spoons could become 100 times more unreliable, and I'd barely even notice. They're too simple to even care about.
  • Big deal... (Score:4, Funny)

    by Anonymous Coward on Tuesday June 07, 2005 @07:00PM (#12752768)
    It's called "leaving early from work"... everyone does it.
  • Details? (Score:3, Insightful)

    by CrackHappy (625183) on Tuesday June 07, 2005 @07:00PM (#12752772) Journal
    I haven't got the time or I would look myself: does anyone have a more informative source on the specific cause of the problem? And WTF is a "Message Storm"? God - another catchphrase - great!
    • Re:Details? (Score:5, Informative)

      by afidel (530433) on Tuesday June 07, 2005 @07:21PM (#12752958)
      A "message storm" is a flood of data that overwhelms a system, kind of like a DDOS, but with legitimate traffic. In this case it sounds like a large number of error messages overwhelmed the message queueing system (probably MQ from IBM), which likely set off an even larger storm of error messages when backed-up messages started to expire.
      • Did you notice the ticker going "PR0N... V1AGRA... DIPLOMAS... PR0N... V1AGRA..."?

      • I've never understood corporate programmers' obsession with purchasing message queue systems like MQSeries. You can design and code something yourself, in a week tops, that does the 90% of MQSeries that you actually need. MQSeries is far too universal and complex for what most people want out of it, which is just reliable transactional networked message queueing with the options of in-order delivery, multiple queuers and dequeuers, and disk persistence.
        • well of course ... when you put it that way.

          shoot, I'll just go and whip one up before bedtime.

      • something to keep in mind about systems like these is that they are basically handling intentional DDOS attacks all day. what i mean is, picture thousands of traders all clicking away at a computer screen at the exact same millisecond (kind of like a slashdot effect), frequently sending multiple orders each. this often happens because traders listen to the same news and trade off the same events. so the exchange's systems are designed to handle major DDOSing all day. and they are required to handle the requ
    • The term "Message Storm" has been around for a long time. I used to work on an event management system and we had to carefully manage our reporting rate to avoid this problem.

      In a lot of cases a "catastrophic" failure will trigger a whole series of alarms. If a power supply goes, you can get a heat warning, then a voltage warning, then a backup activation warning; the backup activation shuts down non-critical systems, which generate heartbeat failure messages, etc. So a single event generates multiple, (sometimes thousa
    • Re:Details? (Score:3, Insightful)

      by bigberk (547360)
      I figure we're not getting the entire story. Remember, the NASDAQ had a "mysterious glitch" within the past few weeks as well (quotes off by multiples). Two of the best-run, most important stock exchanges in the world suffering unusual and silly-sounding errors?
  • They had to hire outside *nix coders when the in-house MS crew couldn't integrate the existing WinLAN into the (unsupported, shortsighted) Linux rollout last month.
  • What is the System? (Score:3, Interesting)

    by putko (753330) on Tuesday June 07, 2005 @07:05PM (#12752827) Homepage Journal
    It sounds like a distributed systems failure, alright.

    Here [nysedata.com] is something about the system that might have broken. I'm wondering if the thing that failed really is the thing mentioned here -- the stuff Birman [simc-inc.org] did. His new book on distributed systems is out [amazon.com], by the way.

    Someone will get flying ninja-kicked in the nuts for this, you can be sure.
  • That doesn't mean it will never get dirty.
  • by Chairboy (88841) on Tuesday June 07, 2005 @07:25PM (#12752991) Homepage
    JoeTrader: dood, chk out MSFT, 12m volume
    XyxyZ: wtf i sold on margin
    -- NASDUCK has entered the channel
    JoeTrader: rofl!
    NASDUCK: whatsup?
    JoeTrader: sam sold msft on margin before the spike
    NASDUCK: HAHAHA!
    JoeTrader: werd
    XyxyZ: screw you guys
    JoeTrader: OMG roflrofldolololo!!!!!
    NASDUCK: you are such a tool, sam
    JoeTrader: brb, gotta tell the office
    -SYSTEM- JoeTrader has left the channel (sam in a tool)
    -SYSTEM-:NASDUCK has changed the subject to "XyxyZ sold MSFT before the spike today!!!:D:D:D"
    XyxyZ: fu duck. i hope my boss isn't online
    XyxyZ: ops
    XyxyZ: +ops
    -SYSTEM- Hot2Trade has joined the channel
    NASDUCK: nice try, only way to erase that is to crash the server
    Hot2Trade: Sam, I heard that you got the horns of the bull shoved up where the bear don't shine
    XyxyZ: dude this sux hard
    -SYSTEM- JOHN@MLYNCH has joined the channel
    NASDUCK: nice one Hot2Trade. asl?
    Hot2Trade: fu hippy, this is Jerry in at prudential
    NASDUCK: fuc sorry, didn't recognize you :O
    XyxyZ: So if I can down the server, I can erase the subject?
    Hot2Trade: no worries I just changed my nic
    NASDUCK: XyxyZ, you got pwned by the bull
    JOHN@MLYNCH: SAM! HAHAHA I TOLDYOU NOT TO SELL!
    JOHN@MLYNCH: YOU AER
    JOHN@MLYNCH: SUCH A SP
    XyxyZ: i got s cript
    JOHN@MLYNCH: AZZZ!!!!!!!!!!!!!!!!!!!
    XyxyZ: take this bitches
    XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
    XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
    XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
    XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
    XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
    - SYSTEM - NASDUCK (quit(connection reset by peer))
    - SYSTEM - JOHN@MLYNCH (quit(connection reset by peer))
    - SYSTEM - Hot2Trade (quite(connection reset by peer))
    - SYSTEM - error(91) - rebooting
  • pain in the ass (Score:5, Informative)

    by rcamera (517595) on Tuesday June 07, 2005 @07:30PM (#12753033) Homepage
    as a trading engine developer/support guy for a financial firm in ny, i can't stress enough what a pain in the ass this was. the day after the nyse crash, it took hours upon hours of verifying (by hand) trades that the nyse says we were filled on but that we never saw (because all nyse trading lines were down).

    this type of 'message flood' occurs from time to time, but it hasn't hit the nyse in a while. it's generally the ecns trading otc stocks that have rogue programs blast orders in an infinite loop. when this happens to an ecn, it slows down but generally doesn't lose the ability to trade. the nyse, which touts the importance of its rapists^H^H^H^H^H^H^Hspecialists because they add 'stability' to the system, was dead in the water. this crash goes to show how useless the specialists really are - without the technology working, they can do nothing. if this is the case, why not just replace them altogether with electronic trade matching?

    interestingly enough, the nyse announced mere months ago that it is 'merging' with archipelago - a large ecn. perhaps this merger will be the beginning of the end of the specialists.
  • Reuters? (Score:2, Interesting)

    by ajs (35943)
    "the main system and its backup were swamped with error messages, Reuters reports"

    Which is kinda funny, since it was *probably* a reuters feed [reuters.com] that was spewing the errors in the first place....
  • At least this didn't happen at the beginning of the trading day; the very last thing we need is strain on the economy...
    • What are you talking about?? I was watching Fox news and they said the economy is doing great! Oh wait, that was in early November.
  • by synx (29979)
    I wonder if NYSE uses TIBCO Rendezvous for their message transport "bus". My work uses this software, and our usage of it has stressed it to extremes; you can end up with message storm issues.

    FYI this system is a multicast-based publish-subscribe system. The multicast thing tends to be a wash IMHO, especially since many people use it for queues rather than true one-to-many messaging.
    • I worked at tibco, but not in the messaging side of engineering - it's only a smallish part of what they do. Here's what I remember, modulo time and lack of coffee:

      - NASDAQ uses tibrv (formerly Rendezvous). This featured a lot in tibco's pitch about reliability/scalability etc. NYSE, I *think*, uses MQ Series.

      - Rendezvous used to only run over IP broadcast, with software "router" daemons for crossing subnets. This has not been required for years now - remember, the history of Rendezvous stretches bac
      • by synx (29979)
        Ah yes, we've seen the multicast group table problem. Basically you run into it and your system instantly crashes and suffers huge, difficult-to-recover problems. I wasn't directly involved, but we have put in place migration paths away from tibco. Of course we were doing several things wrong:

        - Don't use HUGE subject names; I have been told they significantly reduce performance.
        - Don't use multicast for point-to-point RPC-like services.

        Of course it's intensely attractive since you can just run a program, not
    • Tibco's implementation definitely leaves a lot to be desired. Their success has always come from their ties with Reuters (who used to own a huge stake in the company) and thus their use in high-profile environments. It's never been because of their technology.
  • This was actually most likely the result of a multicast or "slow consumer" storm. In a multicast network environment, often desktops are overloaded by all of the filtering they must perform (multicast sends nearly everything to everybody). Sometimes some desktops will miss a packet and ask for a retransmission. Often, this involves retransmitting in multicast-form - that is to all of the consumers. If this happens too many times, you get a storm. No matter what the NYSE does (unless they buy our technology
  • by ManDude (231569)
    "The outage stemmed from a fault in a system designed to distribute market data and operate computer trading systems."

    I think they are using TIBCO for their data messaging bus.

    . . . the reliability of a network described as "ultra reliable" . . .

    The word "network" doesn't seem to fit. They won't be calling Cisco.

    The Dude

  • Revenge of the Dick! (Score:2, Interesting)

    by Joe Jarvis (712473)
    Somewhere, in a secret underground lair wallpapered with 100 dollar bills, Dick Grasso is laughing maniacally.
  • by Ececheira (86172) on Tuesday June 07, 2005 @08:05PM (#12753291)
    I work in Technology for a Wall Street firm (you've heard of them). Stuff like this happens all the time -- systems go down and are usually back up pretty quickly; some route to some exchange will bounce for a few minutes. This time it was worse because it hit the NYSE rather than one of the smaller exchanges, and it hit at the end of the trading day. If you look at any graph showing trading volumes, the last few minutes of trading are always the heaviest.

    99.9% of the time, things bounce back very quickly and with the exception of a few internal emails, nobody cares, things go on.
    • This is true... I (now) work at a company located in the CBOT, and there seems to be more of an "errors are okay" attitude. That's interesting when you consider that millions of dollars are at stake whenever an exchange connection (or even a connection for just one major contract) goes down.

      This contrasts with my previous experience at a struggling airline company, where such failures were tagged with a price tag and flagged as "never, ever do this again". :-) Even if they weren't struggling, though, t

    • Stuff like this happens all the time... 99.9% of the time, things bounce back very quickly

      Boy are you guys going to have a fun time when there is a real market "event", not just a random glitch but a systemic problem. Those "discontinuities" look scary on graphs.
  • NYSE Chief Executive John Thain said that both the main system and its backup were swamped with error messages, Reuters reports. He added that the exchange would carry out remedial work designed to prevent any repetition of the problem.

    No, the remedial work is designed to cull out less adaptive problems, thus preparing the digital ecosystem for the emergence of tougher problems.

    -kgj
  • Your NYSE/OS C: drive appears to be getting full. You have approximately 4% disk space free. NYSE/OS recommends having at least 15% free disk space to properly conduct Big Buck Stock Exchanges.

    NYSE/OS has searched your drive for under-used and overly-large files and generated the following report.

    Under-used
    ----------
    c:\tickerlogs\SUNW.log 942 bytes

    Over-sized
    ----------
    c:\logs\NYSEerrors.log 543 GB

  • I seem to remember the repetition of error messages as common in these high reliability systems.

    I think the same sort of thing happened with the US power grid shutdown?

    Time to put in something that caps the rate of error messages.
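afidel's description of a message storm (a backlog whose expiring messages generate yet more error messages) can be illustrated with a toy discrete-time model. This is a minimal sketch, not a model of the NYSE's actual system: the fixed per-tick delivery `capacity`, the `ttl` after which a waiting message expires, and the `errors_per_expiry` feedback factor are all hypothetical parameters chosen for illustration.

```python
from collections import deque


def simulate_storm(ticks, arrivals, capacity, ttl, errors_per_expiry):
    """Toy discrete-time model of a message storm (hypothetical parameters).

    Each tick: `arrivals(t)` new messages are enqueued, up to `capacity`
    messages are delivered oldest-first, and any message that has waited
    more than `ttl` ticks expires. Every expiry enqueues
    `errors_per_expiry` fresh error messages. Returns the queue depth
    at the end of each tick.
    """
    queue = deque()  # enqueue tick of each waiting message, oldest on the left
    depths = []
    for t in range(ticks):
        queue.extend([t] * arrivals(t))
        # Deliver up to `capacity` messages, oldest first.
        for _ in range(min(capacity, len(queue))):
            queue.popleft()
        # Expire stale messages from the head of the queue.
        expired = 0
        while queue and t - queue[0] > ttl:
            queue.popleft()
            expired += 1
        # Feedback: each expiry produces new error messages.
        queue.extend([t] * (expired * errors_per_expiry))
        depths.append(len(queue))
    return depths


if __name__ == "__main__":
    def burst(t):
        return 30 if t < 3 else 8  # short burst, then steady load below capacity

    print(simulate_storm(20, burst, capacity=10, ttl=2, errors_per_expiry=0))
    print(simulate_storm(20, burst, capacity=10, ttl=2, errors_per_expiry=2))
```

With the same short burst, a queue whose expired messages simply disappear drains back to empty, while one that emits two error messages per expiry keeps growing even though the steady arrival rate (8 per tick) is below capacity (10 per tick). That feedback loop is how a storm can outlive the fault that started it.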
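As a rough illustration of the in-order, multi-producer/multi-consumer, disk-persistent queueing described in the MQSeries comment, here is a single-process Python sketch. It is an assumption-laden toy: `PersistentQueue` and its JSON-lines journal format are hypothetical, it is not networked, and it gives at-least-once redelivery of unacknowledged messages rather than real transactions. Those missing pieces are a large part of what a product like MQSeries provides.

```python
import json
import os
import tempfile
import threading
from collections import deque


class PersistentQueue:
    """Toy single-process FIFO queue backed by an append-only JSON-lines journal.

    Safe for multiple producer and consumer threads. Each put() and ack() is
    written and fsync'd before it returns, so a restarted process can rebuild
    every message that was never acknowledged, in its original order.
    """

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)
        self._pending = deque()  # (msg_id, body) pairs waiting for get()
        self._in_flight = {}     # msg_id -> body, handed out but not yet acked
        self._next_id = 0
        self._replay()
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild un-acked messages from the journal, if one exists."""
        if not os.path.exists(self._path):
            return
        live = {}  # dicts preserve insertion order, i.e. journal order
        with open(self._path, encoding="utf-8") as journal:
            for line in journal:
                rec = json.loads(line)
                if rec["op"] == "put":
                    live[rec["id"]] = rec["body"]
                    self._next_id = max(self._next_id, rec["id"] + 1)
                else:  # "ack"
                    live.pop(rec["id"], None)
        # Messages handed out before the restart but never acked are redelivered.
        self._pending.extend(live.items())

    def _append(self, rec):
        self._log.write(json.dumps(rec) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())

    def put(self, body):
        """Durably enqueue a JSON-serializable body; returns its id."""
        with self._not_empty:
            msg_id = self._next_id
            self._next_id += 1
            self._append({"op": "put", "id": msg_id, "body": body})
            self._pending.append((msg_id, body))
            self._not_empty.notify()
            return msg_id

    def get(self, timeout=None):
        """Return the oldest (msg_id, body), or None if none arrives in time."""
        with self._not_empty:
            if not self._not_empty.wait_for(lambda: self._pending, timeout):
                return None
            msg_id, body = self._pending.popleft()
            self._in_flight[msg_id] = body
            return msg_id, body

    def ack(self, msg_id):
        """Mark a message as processed so it is not redelivered after a restart."""
        with self._lock:
            del self._in_flight[msg_id]
            self._append({"op": "ack", "id": msg_id})

    def close(self):
        self._log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    q = PersistentQueue(path)
    q.put("order 1")
    q.put("order 2")
    msg_id, body = q.get()
    q.ack(msg_id)            # "order 1" has been processed
    q.close()                # simulate a crash before "order 2" is consumed
    q = PersistentQueue(path)
    print(q.get(timeout=0))  # prints (1, 'order 2')
    q.close()
```

Messages are dispatched in FIFO order, but with several consumer threads they can still finish processing out of order; strict in-order processing would need a single consumer or per-key ordering.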
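The alarm cascade in the event-management comment (a power-supply failure followed by heat, voltage, backup-activation and heartbeat alarms) is commonly handled with root-cause suppression: if an upstream component is already alarming, alarms from components that depend on it are recorded but not forwarded. The sketch below assumes a hypothetical, hand-written `depends_on` map; real event managers usually discover dependencies and also hold alarms briefly, because in this version a symptom that arrives before its cause is still forwarded.

```python
class AlarmCorrelator:
    """Forward only alarms whose upstream dependencies are not already alarming.

    `depends_on` maps a component to the single upstream component it relies
    on (a hypothetical, hand-maintained map). Suppressed alarms are still
    recorded as active, so symptoms of symptoms are suppressed too.
    """

    def __init__(self, depends_on):
        self._depends_on = depends_on
        self._active = set()

    def _upstream_is_active(self, component):
        seen = set()
        parent = self._depends_on.get(component)
        while parent is not None and parent not in seen:  # `seen` guards against cycles
            if parent in self._active:
                return True
            seen.add(parent)
            parent = self._depends_on.get(parent)
        return False

    def raise_alarm(self, component):
        """Record an alarm; return True if operators should be paged for it."""
        forward = (component not in self._active
                   and not self._upstream_is_active(component))
        self._active.add(component)
        return forward

    def clear(self, component):
        self._active.discard(component)


if __name__ == "__main__":
    # The power-supply cascade from the comment, as a hypothetical dependency map.
    deps = {
        "temperature": "power_supply",
        "voltage": "power_supply",
        "backup_power": "power_supply",
        "noncritical_host": "backup_power",
        "heartbeat": "noncritical_host",
    }
    corr = AlarmCorrelator(deps)
    events = ["power_supply", "temperature", "voltage", "backup_power",
              "noncritical_host", "heartbeat", "heartbeat"]
    print([e for e in events if corr.raise_alarm(e)])  # ['power_supply']
```

Seven alarms arrive, but operators see only the root cause; the rest remain in the active set for diagnosis.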
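The multicast "slow consumer" storm described in the thread grows because each retransmission request can trigger a retransmission to every consumer. One mitigation, sketched here from the sender's side, is to coalesce requests (NAKs) for the same sequence number within a hold-off window, so a single re-multicast serves every receiver that missed the packet. `RetransmitCoalescer`, the hold-off value and the receiver counts are hypothetical; multicast protocols also suppress NAKs at the receivers (for example with random back-off), which this sketch does not model, and nothing here describes how TIBCO or the NYSE actually handle it.

```python
class RetransmitCoalescer:
    """Sender-side coalescing of retransmission requests (NAKs).

    A NAK for sequence number `seq` triggers a multicast retransmission only
    if that sequence number has not been re-multicast within the last
    `holdoff` seconds; later NAKs in the window are served by the same copy.
    """

    def __init__(self, holdoff):
        self._holdoff = holdoff
        self._last_sent = {}  # seq -> time of its most recent retransmission

    def on_nak(self, seq, now):
        """Return True if `seq` should be re-multicast at time `now`."""
        last = self._last_sent.get(seq)
        if last is not None and now - last < self._holdoff:
            return False
        self._last_sent[seq] = now
        return True


def packet_deliveries(naks, receivers, coalescer=None):
    """Count packets delivered when every retransmission goes to all receivers."""
    retransmissions = sum(
        1 for seq, t in naks if coalescer is None or coalescer.on_nak(seq, t)
    )
    return retransmissions * receivers


if __name__ == "__main__":
    # Hypothetical scenario: 50 of 200 receivers NAK packet 7 within 50 ms.
    naks = [(7, i * 0.001) for i in range(50)]
    print(packet_deliveries(naks, 200))                            # 10000
    print(packet_deliveries(naks, 200, RetransmitCoalescer(0.1)))  # 200
```

In this scenario the naive sender multicasts 50 copies (10,000 packet deliveries), while coalescing sends one (200 deliveries). The hold-off has to cover the typical spread of NAKs without delaying a genuinely repeated loss for too long.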
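The closing suggestion, capping the rate of error messages, is often implemented as a token bucket: each message spends a token, tokens refill at a fixed `rate` up to a `burst` ceiling, and anything over the limit is counted instead of sent so that a periodic summary can report how many were dropped. A minimal sketch; `TokenBucket`, `take_suppressed` and the numbers used are hypothetical.

```python
import time


class TokenBucket:
    """Token-bucket limiter for outgoing error messages.

    Tokens refill continuously at `rate` per second up to `burst`; each
    forwarded message spends one. Messages over the limit are counted so a
    periodic summary ("N messages suppressed") can be sent instead.
    """

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self._tokens = burst
        self._last = None
        self._suppressed = 0

    def allow(self, now=None):
        """Return True if a message may be sent at time `now` (seconds)."""
        now = time.monotonic() if now is None else now
        if self._last is not None:
            elapsed = now - self._last
            self._tokens = min(self.burst, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        self._suppressed += 1
        return False

    def take_suppressed(self):
        """Return how many messages were dropped since the last call, and reset."""
        count, self._suppressed = self._suppressed, 0
        return count


if __name__ == "__main__":
    limiter = TokenBucket(rate=10.0, burst=5)
    sent = sum(limiter.allow(now=0.0) for _ in range(1000))  # a 1,000-message flood
    print(sent, limiter.take_suppressed())  # 5 995
```

At 10 messages per second with a burst of 5, a flood of 1,000 errors at one instant forwards 5; the rest appear only in the suppressed count, which keeps the error path itself from becoming the storm.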
