World's Five Biggest SANs 161
An anonymous reader writes "ByteandSwitch is searching for the World's Biggest SANs, and has compiled a list of 5 candidates whose networks support 10+ Petabytes of active storage. Leading the list is JPMorgan Chase, which uses a mix of IBM and Sun equipment to deliver 14 Pbytes for 170k employees. Also on the list are the U.S. DoD, which uses 700 Fibre Channel switches, NASA, the San Diego Supercomputer Center (it's got 18 Pbytes of tape! storage), and Lawrence Livermore."
Not so accurate (Score:4, Informative)
Shouldn't this be written somewhere? (Score:5, Informative)
...and why does the article say "Pbytes", "Tbytes" (Score:3, Informative)
The abbreviated units are "PB" and "TB".
See: http://en.wikipedia.org/wiki/Petabyte [wikipedia.org]
Re:14Pb for 170k employees... (Score:3, Informative)
I'm also curious about Google and the like. Do they not disclose their storage?
Re:Shouldn't this be written somewhere? (Score:3, Informative)
Re:Very U.S. Centric... (Score:3, Informative)
Re:14Pb for 170k employees... (Score:3, Informative)
To a certain extent, they have disclosed some numbers in a paper about their distributed storage system called "BigTable". The title of the paper is "Bigtable: A Distributed Storage System for Structured Data" and it can be found right here [google.com].
Some numbers can be found on page 11:
Project and Table size in TB:
Crawl: 800
Crawl: 50
Google Analytics: 20
Google Analytics: 200 (Raw click table)
Google Base: 2
Google Earth: 0.5
Google Earth: 70
Orkut: 9
Personalized Search: 4
Total so far: 1,155.5 TB
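As a quick sanity check (my own, not from the paper), the quoted per-table sizes do add up to the stated total:

```python
# Per-table sizes in TB, as quoted above from the Bigtable paper (page 11).
# The labels in parentheses are mine, just to tell duplicate project names apart.
sizes_tb = [
    ("Crawl", 800),
    ("Crawl (second table)", 50),
    ("Google Analytics", 20),
    ("Google Analytics (raw click table)", 200),
    ("Google Base", 2),
    ("Google Earth", 0.5),
    ("Google Earth (second table)", 70),
    ("Orkut", 9),
    ("Personalized Search", 4),
]
total_tb = sum(size for _, size in sizes_tb)
print(total_tb)  # 1155.5 -- a bit over 1 PB across the listed tables
```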
It's a very interesting paper to read. It's one of the many papers [google.com] Google has put online.
Re:security, resilience, risk, etc (Score:3, Informative)
You also talk about copies of data as if a disk went bad, you'd lose the data. These storage arrays have multiple redundancies (RAIDs of VDisks which are RAIDs themselves) as well as having live replication capability to remote sites -- at which point you likely have a copy (or copies) of an entire datacenter in a different geographic location that is running as a hot spare.
Within a datacenter, you would not have more than dual fabrics. Your fabrics' switches will also be redundantly connected within themselves. And if you're killing an entire fabric with an upgrade, you're doing it wrong.
You'll also have service contracts with lockers of disks, switches, linecards, etc., *on site* with field technicians from the vendors on-call 24/7.
Fibre Channel installations are not like some small company's Ethernet LAN.
Re:... That we know about (Score:1, Informative)
Re:14Pb for 170k employees... (Score:4, Informative)
Re:Very U.S. Centric... (Score:2, Informative)
Still, the data acquisition and storage system is impressive. Most of the storage will be distributed over different sites, so I don't know if there will be a huge central storage system.
Re:Very U.S. Centric... (Score:5, Informative)
I'll talk about one of the experiments, ATLAS. Yes, we "generate" petabytes of data per day. It's rather easy to calculate, actually. One collision in the detector can be compressed down to about 2 MB of raw data -- after lots of zero-suppression and smart storage of bits from a detector that has ~100 million channels worth of readout information.
There are ~30 million collisions a second -- the LHC machine runs at 40 MHz but has a "gap" in its beam structure.
Multiplying: 2 * 10^6 * 30 * 10^6 = 6 * 10^13 bytes per second. So ATLAS "produces" 1 petabyte of information in about 17 seconds!! :)
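The back-of-the-envelope rate above can be checked in a couple of lines (decimal petabytes assumed):

```python
# ATLAS raw data rate, using the figures quoted above.
event_size = 2e6         # ~2 MB of compressed raw data per collision
collision_rate = 30e6    # ~30 million collisions per second

bytes_per_second = event_size * collision_rate   # 6e13 B/s
petabyte = 1e15                                  # decimal petabyte

seconds_per_pb = petabyte / bytes_per_second
print(round(seconds_per_pb, 1))  # ~16.7 s to "produce" one petabyte
```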
But ATLAS is limited to storing about ~300 MB per second. This limit comes from how fast you can write things to storage. Remember, there are 4 LHC experiments after all, and ATLAS gets its fair share of the storage capacity.
Which means that out of 30 million collisions per second, ATLAS can only store 150 collisions per second... which, it turns out, is just fine!! The *interesting* physics only happens **very** rarely -- due to the nature of *weak* interactions. At the LHC, we are no longer interested in the atom falling apart and spilling its guts (quarks and gluons) out. We are interested in rare processes such as dark-matter candidates, the Higgs, or top-antitop production (which will dominate the 150 Hz, btw), and other interesting and rare things. In most of the 30 million collisions, the protons just spill their guts out and nothing rare occurs.

The job of the trigger of ATLAS (and any other LHC experiment, for that matter) is to find those *interesting* 150 events out of 30 million every second -- and do this in real time, and without a glitch. ATLAS uses about ~2000 computers to do this real-time data reduction and processing... CMS uses more, I believe.
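The 150 Hz figure and the implied trigger rejection follow directly from the quoted numbers; a quick sketch:

```python
# Trigger selectivity implied by the numbers quoted above.
storage_bandwidth = 300e6   # ~300 MB/s that ATLAS can write to storage
event_size = 2e6            # ~2 MB per collision
collision_rate = 30e6       # ~30 million collisions per second

events_stored_per_second = storage_bandwidth / event_size   # 150 Hz
rejection_factor = collision_rate / events_stored_per_second

# The trigger must throw away all but 1 in 200,000 collisions, in real time.
print(int(events_stored_per_second), int(rejection_factor))  # 150 200000
```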
In the end, we get 300 MB/second worth of raw data, and that's stored on tape at Tier 0 at CERN permanently -- until the end of time, as far as anyone is concerned. That data will never *ever* be removed. Actually, the 5 Tier 1 sites will also hold a full copy of the data among themselves.
Which brings me to my point that CERN storage is technically not a SAN (Storage Area Network)... (My IT buddies insist on this one.) I am told that CERN storage counts as a NAS (Network Attached Storage). But I am going to alert them to this thread and will let them elaborate on that one!
Re:Pebibytes (Score:2, Informative)
Re:... That we know about (Score:3, Informative)
Re:Very U.S. Centric... (Score:3, Informative)
Can't imagine why you write this as AC, but ok...
Answer: "That's only 300 MB/s, 24/7, for more than half a year, just for writing the raw data to storage. Then there are the other three experiments with the same amount of data; actually, one of them does 1.2 GB/s of raw data. The data ends up on disk first with an aggregate write speed of ~1.5 GB/s (let's not exaggerate). The data is read immediately from disk again to be written to tape (our final storage media), so ~1.5 GB/s of reads... Then, all this data is being exported to external computing centers pretty much immediately too (multiple copies, etc., so the aggregate is much higher than 1.5 GB/s), so we get ~3 GB/s of reads just from this data export (it can, potentially, be a lot more -- we already have a total of 120 Gbit/s of network connectivity to those sites).

So, we are already at ~6 GB/s of I/O and nobody has even had a look into the data itself!! If we talk data analysis, we talk about repeated reprocessing runs over the entire collection of raw data in order to "create" the data format that physicists can more easily use for their analysis, and we talk about several thousand people accessing all the accumulated data in a perfectly random way... Mind you, we keep all the raw data active, so in 10 years there will be at least 100 PB, probably more like 150 PB, maybe even 200 PB of active storage. The current estimate for the I/O caused by the data analysis is on the order of 50 GB/s (big B)."
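A rough tally of the I/O figures quoted above; the ~200 running days standing in for "more than half a year" is my assumption, not a number from the post:

```python
# Aggregate I/O at CERN Tier 0, per the figures quoted above.
disk_writes = 1.5e9    # ~1.5 GB/s aggregate raw-data writes to disk
tape_reads = 1.5e9     # the same data read back for migration to tape
export_reads = 3.0e9   # ~3 GB/s of reads for export to external centers

aggregate_io = disk_writes + tape_reads + export_reads
print(aggregate_io / 1e9)  # ~6 GB/s before anyone looks at the data

# Raw-data volume for one 300 MB/s experiment over an assumed ~200 running days:
seconds = 200 * 86400
print(round(300e6 * seconds / 1e15, 1))  # ~5.2 PB of raw data per year
```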