Data Storage IT Hardware

World's Five Biggest SANs

Posted by CowboyNeal
from the still-need-more-drivespace dept.
An anonymous reader writes "ByteandSwitch is searching for the World's Biggest SANs, and has compiled a list of five candidates whose networks support 10+ Petabytes of active storage. Leading the list is JPMorgan Chase, which uses a mix of IBM and Sun equipment to deliver 14 Pbytes for 170k employees. Also on the list are the U.S. DoD, which uses 700 Fibre Channel switches, NASA, the San Diego Supercomputer Center (it's got 18 Pbytes of tape storage!), and Lawrence Livermore."


  • by Anonymous Coward on Friday September 21, 2007 @03:29AM (#20693431)
    What about Google, Amazon, Yahoo, Microsoft, etc.?
    • Re: (Score:1, Informative)

      by Anonymous Coward
      Google doesn't use conventional SAN architectures, so they probably wouldn't qualify for this list.
    • by Anonymous Coward on Friday September 21, 2007 @07:08AM (#20694343)
      And there wasn't even a single Japanese firm listed. You'd think they'd have the biggest SANs of all.
    • by Chapter80 (926879) on Friday September 21, 2007 @08:31AM (#20694817)
      I was thinking San Diego, San Francisco, San Antonio, San Jose, Santa Claus.
    • Re: (Score:3, Interesting)

      by GoodOmens (904827)
      I know it's not a PB, but here at the Census we have one 150TB array that's used for one project, not 170K employees.

      It is interesting when you get into storage of this size. I remember sitting in a meeting where we discussed a storage cabinet we were ordering. The RAW size of the cabinet was 150TB, but formatted it would be 100TB ... 50TB is a lot of storage to "throw away" for redundancy / formatting! Considering that at this price you're paying about $10k+ a TB (with staff and infrastructure costs factored in
      • Re: (Score:3, Informative)

        by WuphonsReach (684551)
        $10k per terabyte isn't all that bad; maybe about 30-50% higher. Server-level storage goes for $4-$8 per GB (so $4k to $8k per terabyte). It may also depend on when that SAN was put into use. Were they able to use less expensive SATA drives, or did they need the raw performance of SCSI, etc.? Plus, cost per gigabyte slowly decreases over time (not as fast as it used to, but it's still a gradual decline of maybe 25% per year).
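The numbers in this sub-thread can be checked with a few lines of arithmetic. This is a rough sketch that takes the quoted figures at face value and assumes decimal units (1 TB = 1000 GB); the constant and function names are made up for illustration, and the constant-rate decline is just the "maybe 25% per year" guess turned into a formula.

```python
# Figures quoted above (2007 prices); decimal units assumed.
RAW_TB = 150.0                    # cabinet size before redundancy/formatting
USABLE_TB = 100.0                 # size after formatting
ALL_IN_PRICE_PER_TB = 10_000.0    # "$10k+ a TB" with staff/infrastructure
SERVER_PRICE_PER_GB = (4.0, 8.0)  # "$4-$8 per GB"
ANNUAL_DECLINE = 0.25             # "maybe 25% per year"


def overhead_fraction(raw_tb: float, usable_tb: float) -> float:
    """Share of raw capacity lost to redundancy and formatting."""
    return (raw_tb - usable_tb) / raw_tb


def per_gb_to_per_tb(price_per_gb: float) -> float:
    """Convert $/GB to $/TB using decimal units (1 TB = 1000 GB)."""
    return price_per_gb * 1000.0


def projected_price(price: float, years: int, decline: float = ANNUAL_DECLINE) -> float:
    """Price after `years` of a constant fractional decline per year."""
    return price * (1.0 - decline) ** years


print(f"overhead: {overhead_fraction(RAW_TB, USABLE_TB):.1%}")  # 33.3%
print([per_gb_to_per_tb(p) for p in SERVER_PRICE_PER_GB])      # [4000.0, 8000.0]
print([round(projected_price(ALL_IN_PRICE_PER_TB, y)) for y in range(4)])
```

In other words, a third of the raw cabinet went to redundancy and formatting, and a steady 25% annual decline would cut a $10k/TB price by more than half in three years.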

  • Not so accurate (Score:4, Informative)

    by cymru_slam (881440) <welsh@sean.gmail@com> on Friday September 21, 2007 @03:29AM (#20693433)
    I work for one of the organisations listed and I have to say that what they described sounds NOTHING like our infrastructure :-)
    • Details? (Score:4, Funny)

      by clarkkent09 (1104833) on Friday September 21, 2007 @04:51AM (#20693815)
      Ah, go on tell us. We won't tell anybody
    • Not to imply that you don't know what you're talking about, but isn't part of the point of a SAN that, to an end user, it looks and behaves practically as if it were a local device? Of course, you might know the overall storage infrastructure, but then again... you might not.

      So to save your reputation you'll have to spill your beans, so to speak.
  • by Noryungi (70322) on Friday September 21, 2007 @03:34AM (#20693463) Homepage Journal
    Yes, I know, US web site and everything but, seriously, have you checked the data storage of CERN (birthplace of the web) lately?

    If I remember correctly, these guys will generate petabytes of data per day when that monster particle accelerator goes online in a few months...
    • by barry_the_bogan (976779) on Friday September 21, 2007 @03:51AM (#20693557)
      They're talking about the "world", as defined by the World Series Baseball people. Lame story.
    • by Joce640k (829181) on Friday September 21, 2007 @04:17AM (#20693691) Homepage
      Let's hope CERN's data can be zipped...if not, they'll be in trouble pretty quickly.

      Remember when you got your first copy of Napster and ADSL? That's how serious...!

    • Re: (Score:3, Informative)

      by palpatin (94174)
      Well, don't know about the storage capacity, but the LHC will produce around 15 petabytes per year [web.cern.ch], when they turn it on.
    • If I remember correctly, these guys will generate petabytes of data per day
      Rechecking your facts would be in order. Some 10 to 15 petabytes are expected to be saved per year according to sources I have seen, though only a small fraction of the raw sensor data will be permanently recorded.
      • Re: (Score:3, Interesting)

        by TapeCutter (624760)
        IIRC they need a massive cache where the "sampling algorithm" throws a heap of data away. A quick google gives the following precise measure - "The LHC will generate data at a rate equivalent to every person on Earth making twenty phone calls at the same time." - but as you say it only stores a fraction of that.

        Now assuming the phone calls are made over POTS, the bitrate from the sensors should be...20 * 6*10^9 * 1220bps...
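Finishing that multiplication is straightforward. The sketch below takes the comment's per-call figure of 1220 bps at face value (the thread doesn't explain it) and, for comparison, also uses 64 kbit/s, the rate of one uncompressed G.711 PCM voice channel in standard digital telephony. It only evaluates the phone-call analogy; it says nothing about what the detectors actually record. Names are illustrative.

```python
PEOPLE = 6 * 10**9        # "every person on Earth", as in the quote
CALLS_PER_PERSON = 20     # "twenty phone calls at the same time"


def aggregate_bps(bps_per_call: int) -> int:
    """Total bit rate if every person makes CALLS_PER_PERSON simultaneous calls."""
    return CALLS_PER_PERSON * PEOPLE * bps_per_call


as_stated = aggregate_bps(1220)   # per-call rate used in the comment
g711 = aggregate_bps(64_000)      # one uncompressed 64 kbit/s PCM channel

for label, bps in (("as stated", as_stated), ("64 kbit/s", g711)):
    # bits/s -> bytes/s -> terabytes/s (decimal)
    print(f"{label}: {bps:.2e} bit/s = {bps / 8 / 10**12:,.0f} TB/s")
```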
        • Re: (Score:2, Informative)

          by torako (532270)
          Most of the caching is done using custom hardware that lives right in the detectors, with latency on the order of µs. The data output for permanent storage is (for the ATLAS detector, that's the one I know some stuff about) 200 MBytes / s, which is not gigantic. There are some PDF slides of a seminar talk on the triggering mechanism for ATLAS on my homepage in the Seminar section (in English even).

          Still, the data acquisition and storage system is impressive. Most of the storage will be distributed over diffe

      • by perturbed1 (1086477) on Friday September 21, 2007 @08:44AM (#20694925)

        I'll talk about one of the experiments, ATLAS. Yes we "generate" petabytes of data per day. It's rather easy to calculate actually. One collision in the detector can be compressed down to about 2MB raw data-- after lots of zero-suppression and smart-storage of bits from a detector that has ~100 million channels worth of readout information.

        There are ~30 million collisions a second -- as the LHC machine runs at 40 MHz but has a "gap" in its beam structure.

        Multiplying: 2 * 10^6 * 30 * 10^6 = 6 * 10^13 bytes per second. So ATLAS "produces" 1 petabyte of information in about 17 seconds!! :)
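The same multiplication in code, using only the figures from the comment (about 2 MB per collision, about 30 million collisions per second) and assuming a decimal petabyte of 10^15 bytes. Variable names are illustrative.

```python
EVENT_BYTES = 2 * 10**6          # ~2 MB per collision after zero-suppression
COLLISIONS_PER_S = 30 * 10**6    # ~30 million collisions per second
PETABYTE = 10**15                # decimal petabyte (assumption)

raw_rate = EVENT_BYTES * COLLISIONS_PER_S   # bytes per second before the trigger
seconds_per_pb = PETABYTE / raw_rate

print(f"raw rate: {raw_rate:.0e} B/s")
print(f"1 PB every {seconds_per_pb:.1f} s")   # 16.7
```

With a binary petabyte (2^50 bytes) the figure stretches to just under 19 seconds; either way it is well under half a minute.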

        But ATLAS is limited to being able to store about 300 MB per second. This is the limit coming from how fast you can store things. Remember, there are 4 LHC experiments after all, and ATLAS gets its fair share of storage capability.

        Which means that out of 30 million collisions per second, ATLAS can only store 150 collisions per second.... which, it turns out, is just fine!! The *interesting* physics only happens **very** rarely -- due to the nature of *weak* interactions. At the LHC, we are no longer interested in the atom falling apart and spitting its guts out (quarks and gluons). We are interested in rare processes such as dark-matter candidates or the Higgs, or top-top production (which will dominate the 150Hz btw), and other interesting and rare things. In most of the 30 million collisions, the protons just spit their guts out; the *rare* things happen much, much less often. The job of the trigger in ATLAS (and in any other LHC experiment, for that matter) is to find those *interesting* 150 events out of 30 million every second -- and to do this in real time, without a glitch. ATLAS uses ~2000 computers to do this real-time data reduction and processing... CMS uses more, I believe.
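As a quick consistency check on the trigger numbers above: keeping 150 events per second out of roughly 30 million means rejecting all but one event in 200,000, and 150 stored events at about 2 MB each is exactly the ~300 MB/s storage budget. The sketch below only restates those figures; names are illustrative.

```python
COLLISIONS_PER_S = 30 * 10**6   # collisions delivered per second
STORED_PER_S = 150              # events the trigger keeps per second
EVENT_BYTES = 2 * 10**6         # ~2 MB per stored event

rejection_factor = COLLISIONS_PER_S // STORED_PER_S    # keep 1 event in this many
kept_fraction = STORED_PER_S / COLLISIONS_PER_S
stored_mb_per_s = STORED_PER_S * EVENT_BYTES / 10**6   # should match ~300 MB/s

print(rejection_factor, f"{kept_fraction:.0e}", stored_mb_per_s)
```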

        In the end, we get 300 MB/second worth of raw data, and that's stored on tape at Tier 0 at CERN permanently -- and until the end of time as far as anyone is concerned. That data will never *ever* be removed. Actually, the 5 Tier 1 sites will also have a full copy of the data among themselves.
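To turn the 300 MB/s rate into a yearly tape volume you need to know how many seconds per year the machine actually delivers collisions, which the comment doesn't say. The sketch below assumes the common high-energy-physics rule of thumb of about 10^7 seconds of data-taking per year and counts one extra full copy spread across the Tier 1 sites; both are assumptions, so treat the output as an order-of-magnitude estimate only. It lands in the same range as the ~15 PB/year quoted elsewhere in the thread for the LHC as a whole.

```python
STORED_BYTES_PER_S = 300 * 10**6   # ATLAS rate to permanent storage, from the comment
LIVE_SECONDS_PER_YEAR = 10**7      # ASSUMPTION: rule-of-thumb beam time per year
FULL_COPIES = 2                    # Tier 0 plus one copy spread over the Tier 1 sites

per_year = STORED_BYTES_PER_S * LIVE_SECONDS_PER_YEAR   # bytes per year at Tier 0

print(f"{per_year / 10**15:.1f} PB/year at Tier 0")
print(f"{per_year * FULL_COPIES / 10**15:.1f} PB/year including the Tier 1 copy")
```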

        Which brings me to my point that CERN storage is technically not a SAN (Storage Area Network)... (My IT buddies are insisting on this one.) I am told that CERN storage counts as a NAS (Network Attached Storage). But I am going to alert them to this thread and will let them elaborate on that one!

    • You would have been real upset before San Francisco was taken off the list.
    • by bockelboy (824282)
      The CMS detector will take data at 8GB/s at turn on (that's gigabytes, not gigabits). This will be filtered and a few percent will be saved.

      CASTOR's (the CERN data store) current stats are here:

      http://castor.web.cern.ch/castor/ [web.cern.ch]

      About 8 PB of files. If I recall correctly, there's around 500TB of online disk space and 10-30PB of tape storage (some of it is getting phased out).

      FNAL has a similar setup, except with a storage manager called dCache. There is no use of protocols like iSCSI or Fibre Channel over
    • by bushki3 (1025263) on Friday September 21, 2007 @09:41AM (#20695477)
      from TFA

      "We at Byte and Switch are on the trail of the world's biggest SAN, and this article reveals our initial findings."

      and this

      "Again, this list is meant to be a starting place for further endeavor. If you've got a big SAN story to tell, we'd love to hear it."

      oh, and this too

      "we present five of the world's biggest SANs:"

      notice how everything in TFA clearly says this is not THE 5 BIGGEST SANs in the world but the 5 largest they have found SO FAR.

      I know -- I must be new here, but I'm getting there. I didn't read the whole article, just a few sentences from the first page.
  • by Centurix (249778) <centurix@NoSPam.gmail.com> on Friday September 21, 2007 @03:35AM (#20693469) Homepage
    that all the disks are formatted FAT32...
  • by Joce640k (829181) on Friday September 21, 2007 @03:44AM (#20693517) Homepage
    14PB for 170k employees isn't so much - 83 gigabytes per person.
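The per-person figure is a single division. Assuming decimal petabytes and gigabytes (the article doesn't say which), it works out to a little over 82 GB per head, in line with the ~83 GB quoted. The function name is illustrative.

```python
PETABYTE = 10**15   # decimal units (assumption)
GIGABYTE = 10**9


def per_person_gb(total_pb: float, people: int) -> float:
    """Average storage per person if the total were split evenly."""
    return total_pb * PETABYTE / people / GIGABYTE


print(f"{per_person_gb(14, 170_000):.1f} GB per person")   # 82.4 GB per person
```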

    If you add up the total disk space in an average office you'll get more than that. If I add up all my external disks, etc. I've got more than a terabyte on my desktop.

    (And yes it's true, data does grow to fit the available space)
    • Re: (Score:3, Informative)

      by Enderandrew (866215)
      When I generate ghost images for the PCs here at work, the average desktop user goes through about 4 gigs here, if that. 83 gigs per person is quite a bit.

      I'm also curious about Google and the like. Do they not disclose their storage?
      • I didn't say the disks were full, just that the storage available per person in the average office is more than that.

        Does the whole 14PB go in a single room? That might be impressive.

      • Re: (Score:3, Informative)

        by StarfishOne (756076)
        "I'm also curious about Google and the like. Do they not disclose their storage?"

        To a certain extent they have disclosed some numbers in a paper about their distributed storage system called "BigTable". The title of the paper is "Bigtable: A Distributed Storage System for Structured Data" and it can be found right here [google.com].

        Some numbers can be found on page 11:
        Project and Table size in TB:

        Crawl: 800
        Crawl: 50
        Google Analytics: 20
        Google Analytics: 200 (Raw click table)
        Google Base: 2
        Google Earth: 0.5
        G

        • Re: (Score:3, Funny)

          by somersault (912633)
          I hope they filter all those clicks before they dump them in the landfill. Can you imagine the mess 200 terabytes of raw clicks would make?
      • by asserted (818761)
        > I'm also curious about Google and the like. Do they not disclose their storage?

        there's much more than that, but GFS is no SAN at all. google can do better than that, and does.
        GFS is all about cheap storage, lots of it. and yes, 14 P is basically nothing, in google terms.
        what the article is really about is "who wasted more money on over-priced enterprisey SAN crap".
    • Re: (Score:3, Funny)

      by commlinx (1068272)

      14PB for 170k employees isn't so much - 83 gigabytes per person. If you add up the total disk space in an average office you'll get more than that. If I add up all my external disks, etc. I've got more than a terabyte on my desktop.

      You'd find a lot of the 83GB on a typical office PC is crap you're not going to put in a SAN; my boot drive without data has 50GB used, and other than the pain in the arse of re-installing, I couldn't give a toss if I lost all that "data". Yes, I've got a TB of storage too, but subtract p0rn, DVDs and other contents that would get me sacked if I worked in a corporate environment, subtract the large amount of reference material (that would be shared between users in a corporate environment) and all my original

    • Re: (Score:1, Insightful)

      1GB of Natalie Portman and hot grits should be enough for anyone
    • 14PB for 170k employees isn't so much - 83 gigabytes per person.
      Sorry, this is completely naive. It's a misunderstanding of what an average is.

      Each employee is NOT getting 83GB of space on the SAN. They might get a few GB for email. That space is used to store accounts, general business stuff, personal information, credit reports, market information, simulations, etc., primarily for data mining. Then of course it's replicated to several locations.
       
  • At Last! (Score:5, Funny)

    by Zymergy (803632) on Friday September 21, 2007 @03:46AM (#20693529)
    Someone can install a FULL install of Windows Vista!
  • by rm999 (775449) on Friday September 21, 2007 @03:49AM (#20693545)
    SAN = Storage area network
  • by Joce640k (829181) on Friday September 21, 2007 @03:49AM (#20693549) Homepage
    ...and why does the article say "Pbytes", "Tbytes", etc.

    The abbreviated units are "PB" and "TB".

    See: http://en.wikipedia.org/wiki/Petabyte [wikipedia.org]

  • by Jah-Wren Ryel (80510) on Friday September 21, 2007 @03:50AM (#20693551)

    "ByteandSwitch is searching for the World's Biggest SANs, and has compiled a list of five candidates whose networks support 10+ Petabytes of active storage.
    What? That's nothing. I've got 100 petabytes just for my pr0n collection!
    • Re: (Score:1, Funny)

      by Anonymous Coward
      yeah I've had a look. How you managed to find so much midget pr0n is beyond me...
    • by Lxy (80823)
      Wow... that's a lot of petafiles *rimshot*
  • Sooo... (Score:2, Funny)

    by Tastecicles (1153671)
    My home entertainment server at 3.3TB RAID6 isn't even in the running then?

    Bugger.
    • haha, I remember it was only a few short years ago I was hot shit because my personal file server had 500 gig of storage in it. most storage in a single computer of anyone I knew. I had a gig of RAM and one of the fancy P4 processors with twice the cache of all the wussy ones. man was I cool. of course now I am older and don't have nearly the discretionary computer money I used to, and I am still running that exact same hardware having wet dreams about joining the 64-bit real dual core revolution.
      • by EmagGeek (574360)
        I am running way behind! My home video server only has 1920GB of storage via 8x320GB drives in a RAID-5... but I know what you mean. In college, I put four 1.6GB drives in my box and did a Linux stripe to 6.4GB, and that was hot shit back then. We stored lots of CDs in that newfangled MP3 format on there.
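The usable space of these home arrays follows the standard RAID rules: a stripe (RAID-0) keeps every drive's capacity, RAID-5 gives up one drive to parity and RAID-6 two, and hot spares hold no data. This sketch ignores filesystem overhead and the GB-vs-GiB difference. A plain 8 x 320GB RAID-5 would give 2240GB, so the quoted 1920GB matches, for example, a seven-drive RAID-5 plus one hot spare; that configuration is an assumption, not something the comment states. The function name is illustrative.

```python
# Drives' worth of capacity consumed by parity at each RAID level.
PARITY_DRIVES = {0: 0, 5: 1, 6: 2}


def usable_capacity(n_drives: int, drive_gb: float, level: int, hot_spares: int = 0) -> float:
    """Usable capacity of one RAID set of identical drives; hot spares hold no data."""
    if level not in PARITY_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    active = n_drives - hot_spares
    if active <= PARITY_DRIVES[level]:
        raise ValueError("not enough drives for this RAID level")
    return (active - PARITY_DRIVES[level]) * drive_gb


print(usable_capacity(4, 1.6, level=0))                 # 6.4  (college stripe)
print(usable_capacity(8, 320, level=5))                 # 2240 (plain 8-drive RAID-5)
print(usable_capacity(8, 320, level=5, hot_spares=1))   # 1920 (7-drive RAID-5 + spare)
```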
    • You mean your porn server?
    • Re:Sooo... (Score:4, Funny)

      by Chineseyes (691744) on Friday September 21, 2007 @11:20AM (#20696767)
      Unfortunately your "entertainment server" is not among the winners for biggest SANs but it IS in the running for "copyright offender of the year". Our lawyers will be contacting you with your prize soon. Sincerely, Steven Marks Executive Vice President and General Counsel, RIAA
  • by Duncan3 (10537) on Friday September 21, 2007 @04:21AM (#20693709) Homepage
    Kinda like saying the world's fastestest runner that likes Swiss cheese best. This isn't a list of fastest, largest, most used, etc. Just some PR spin for SANs. Nothing wrong with that, but still.

  • At least according to this article [techtarget.com].

    It's planned to go running next spring, but it's already at 3.5 PB. (The old LEP collider working in the same tunnel produced quite a bit of data too).
  • by chris_sawtell (10326) on Friday September 21, 2007 @04:52AM (#20693819) Journal
    Google [google.com], the WayBack Machine [google.com], to say nothing of the 1.5 million machine bot-net [informationweek.com] we've been hearing about recently.
  • From my experience most FTSE 100 companies in the UK have multi-petabytes of storage, so I'm assuming that the article is referring to a single consolidated SAN and not disparate SAN islands. Although it is interesting to examine the limits of scalability for such an environment on theoretical grounds, a more interesting question would be to understand the reasons why organisations would want to consolidate such vast quantities of data within a single SAN system.

    Surely there are other important considera
    • Zoning, Virtual SANs (VSANs), and Inter-VSAN Routing (IVR) solve that problem. And why would you need more than a dual fabric setup? Do you like buying expensive HBAs for your servers? Just what kind of hostile environment are you deploying into?

      Do tell. I'm curious.
      • by kefa (640985)
        These are all logical methods of isolation and do not enable you to escape the impact of physical infrastructure changes. Suppose you need to replace your SAN infrastructure to upgrade - do you really want your entire infrastructure to be dependent on a single fabric while you are carrying out your changes? Likewise, I would never want to be dependent on only two copies of your data. If you are working on one copy, do you really only want to be protected by a single remaining copy? I wouldn't necessari
        • Re: (Score:3, Informative)

          by statemachine (840641)
          And to that I ask: How paranoid are you, and how much money do you have?

          You also talk about copies of data as if a disk went bad, you'd lose the data. These storage arrays have multiple redundancies (RAIDs of VDisks which are RAIDs themselves) as well as having live replication capability to remote sites -- at which point you likely have a copy (or copies) of an entire datacenter in a different geographic location that is running as a hot spare.

          Within a datacenter, you would not have more than dual fabrics.
          • by Bandman (86149)
            Aside from working in a facility with a pre-existing setup like this, how does one go about educating themselves on how SANs of this magnitude work?
        • by afidel (530433)
        Huh? Unless you have unwisely maxed out your switches before planning the upgrade, you would simply interconnect the new physical switches into the fabric through ISLs and then move hosts over to the new distribution-layer switches. You would be vulnerable for the length of time it takes to move a cable from one distribution switch to the new one, and only for that single host, which is most likely a member of a cluster. As far as the two copies comment, no, you would likely have very frequent snapshots within
  • A few years ago, I remember reading an article about the IRS (the government tax division) that had seven or eight regional data centers around the U.S. -- each with many petabytes of storage to store current and historical tax data on hundreds of millions of Americans, corporations, etc. I can't find the article now but it seems like *it* should make the list... maybe even top the list.

  • by rjamestaylor (117847) <rjamestaylor@gmail.com> on Friday September 21, 2007 @07:52AM (#20694557) Journal
    Recently I found a discount online web hosting company
    with an unlikely name that offers a scalable,
    distributable SAN, called an HDSAN
    (High Density Storage Area Network),
    for its customers:

    SlumLordHosting.com [slumlordhosting.com]
  • I have no idea how much disk space my firm has, but I did hear an apocryphal tale of installing multiple truckfuls of disks every week pretty much indefinitely (now, of course, older smaller disks are also being removed, but even if it's one for one the service life of enterprise disks means the total is continuously growing). But the firm and the total space can't be disclosed. I'm not trying to make any claims - it could well be smaller than the five mentioned, but the point is nobody knows. I'm sure lots
  • What I want to know (Score:3, Interesting)

    by teslatug (543527) on Friday September 21, 2007 @10:18AM (#20695919)
    How do they do backups (especially online ones) and restores?
    • by demi (17616) *

      I suspect in at least some of these cases, the SAN is not continuous, and they're actually doing backups to the SAN. Part of the storage mentioned will be the online portion of a hierarchical storage manager or similar.

    • by Phishcast (673016)
      There are many ways to do online backups of large amounts of storage. One common way is with storage based snapshots. You quiesce your application (i.e. put your database into hot backup mode), take your logical snapshot, and present that snapshot to a backup server. To the backup server, it looks like you're backing up local disk, but in actuality you're getting a point in time copy of your application/database.
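A minimal sketch of the quiesce, snapshot, back-up-from-snapshot sequence described above. All class and function names are hypothetical. Copy-on-write is assumed as the snapshot mechanism (arrays use several techniques, and the comment doesn't say which one), and "quiescing" is modelled simply as pausing writes while the snapshot is taken. The point it shows: production keeps writing after the snapshot, while the backup server still reads a consistent point-in-time copy.

```python
class Snapshot:
    """Point-in-time view of a Volume, kept with copy-on-write (toy model)."""

    def __init__(self, volume: "Volume"):
        self._volume = volume
        self._blocks = set(volume.blocks)   # blocks that existed at snapshot time
        self._preserved: dict = {}          # original contents of blocks changed since

    def preserve(self, block: int) -> None:
        # Called before a write: save the old contents the first time a block changes.
        if block in self._blocks and block not in self._preserved:
            self._preserved[block] = self._volume.blocks[block]

    def read_all(self) -> dict:
        """What the backup server sees: the volume as of the snapshot."""
        return {b: self._preserved.get(b, self._volume.blocks[b]) for b in sorted(self._blocks)}


class Volume:
    def __init__(self):
        self.blocks: dict = {}
        self.quiesced = False
        self._snapshots: list = []

    def write(self, block: int, data: bytes) -> None:
        if self.quiesced:
            raise RuntimeError("application is quiesced; writes are paused")
        for snap in self._snapshots:
            snap.preserve(block)
        self.blocks[block] = data

    def take_snapshot(self) -> Snapshot:
        snap = Snapshot(self)
        self._snapshots.append(snap)
        return snap


def snapshot_for_backup(volume: Volume) -> Snapshot:
    volume.quiesced = True          # e.g. put the database into hot backup mode
    try:
        return volume.take_snapshot()
    finally:
        volume.quiesced = False     # the application resumes right away


vol = Volume()
vol.write(0, b"ledger-v1")
vol.write(1, b"index-v1")
snap = snapshot_for_backup(vol)
vol.write(0, b"ledger-v2")          # production keeps writing after the snapshot
print(snap.read_all())              # {0: b'ledger-v1', 1: b'index-v1'}
```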

    • You can also be doing scheduled replication at the disk array level - from array to array, and back up from the secondary storage. Also, you have dedicated network interfaces for any backup to tape that isn't going exclusively over the storage network. Finally, there are some really massive tape libraries with robot loaders (that move quickly enough to kill anyone inside the unit when it comes on - detecting debris / obstacles becomes an important task). There are tape library systems that can support numbe
  • Back in 1998, the amount of storage they had was pretty impressive too. It took rooms and rooms to do it, but if I'm remembering this correctly they had about 5 terabytes of disk online and close to 5 petabytes of tape robots. It was a pretty slick automated system- everyone had accounts on a main fileserver, and files that had not been accessed in a certain amount of time were written to tape. If you were to try to grab a file that was on tape, it would fire up the robot, transfer it off the tape, and g
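The disk-plus-tape arrangement described here (and the "online portion of a hierarchical storage manager" mentioned above) boils down to two rules: migrate files that haven't been touched for a while from disk to tape, and recall a file from tape when someone asks for it. The toy model below is a hypothetical sketch of just those two rules, with an assumed 30-day idle threshold and an explicit `now` timestamp so it is deterministic. Real HSMs typically leave a stub on disk and keep more than one tape copy, which this omits.

```python
class TieredStore:
    """Toy hierarchical storage manager: idle files move from disk to tape."""

    def __init__(self, max_idle_seconds: float):
        self.max_idle = max_idle_seconds
        self.disk: dict = {}
        self.tape: dict = {}
        self.last_access: dict = {}

    def write(self, name: str, data: bytes, now: float) -> None:
        self.disk[name] = data
        self.tape.pop(name, None)          # a rewrite makes any tape copy stale
        self.last_access[name] = now

    def migrate(self, now: float) -> list:
        """Move files not accessed within max_idle from disk to tape."""
        moved = [n for n in self.disk if now - self.last_access[n] > self.max_idle]
        for name in moved:
            self.tape[name] = self.disk.pop(name)
        return moved

    def read(self, name: str, now: float) -> bytes:
        if name not in self.disk:
            # Recall from tape; in a real system this is where the robot mounts a cartridge.
            self.disk[name] = self.tape.pop(name)
        self.last_access[name] = now
        return self.disk[name]


DAY = 86_400
store = TieredStore(max_idle_seconds=30 * DAY)   # assumed 30-day threshold
store.write("old.dat", b"...", now=0)
store.write("new.dat", b"...", now=40 * DAY)
print(store.migrate(now=45 * DAY))                # ['old.dat']
```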

