World's Five Biggest SANs 161
An anonymous reader writes "ByteandSwitch is searching for the World's Biggest SANs, and has compiled a list of 5 candidates with networks supporting 10+ Petabytes of active storage. Leading the list is JPMorgan Chase, which uses a mix of IBM and Sun equipment to deliver 14 Pbytes for 170k employees. Also on the list are the U.S. DoD, which uses 700 Fibre Channel switches, NASA, the San Diego Supercomputer Center (it's got 18 Pbytes of tape! storage), and Lawrence Livermore."
... That we know about (Score:5, Insightful)
Re: (Score:1, Informative)
Re:... That we know about (Score:5, Funny)
Re:... That we know about (Score:5, Funny)
Re: (Score:3, Interesting)
It is interesting when you get into storage of this size. I remember sitting in a meeting where we discussed a storage cabinet we were ordering. The raw size of the cabinet was 150TB, but formatted it would be 100TB.
Re: (Score:3, Informative)
Not so accurate (Score:4, Informative)
Details? (Score:4, Funny)
Re: (Score:2)
So to save your reputation you'll have to spill your beans, so to speak.
Re:Not so accurate (Score:5, Insightful)
Exactly why shouldn't you have end-users plugged into a SAN? I run a SAN, and I find that diskless workstations PXE booting off gigabit iSCSI storage are a huge improvement over having local disk. For more or less exactly those reasons: performance, redundancy, flexibility, growth and sharing. Not to mention data consolidation and savings from less wasted local storage.
I suspect the idea that SANs are for servers is mostly spread by overcharging SAN vendors who don't want their profit margins eroded by inexpensive consumer devices. In fact, I'd say consumer storage is rapidly progressing beyond the server side and is these days the main driver behind storage expansion; I certainly know my home storage needs expand faster than those of the vast majority of the servers I admin (yes, there are the we-want-to-simulate-the-atoms-in-the-ocean exceptions, but most business application servers use less storage than you can get in an MP3 player).
Re:Not so accurate (Score:5, Insightful)
Not everyone needs a SAN for storage, but using a SAN is a very sound decision for those that need the capabilities it provides. A SAN is not just a buzzword, although I do not doubt some people bought them without understanding what they were getting and why.
Re:Not so accurate (Score:5, Funny)
I sense the little counter at NSA/"homeland security" clicking up -- Internet chatter about a possible attack just increased! A few more like that, and the terror alert will go up!! Geez people, watch what you type!
NSA SAN (Score:2)
Re: (Score:3, Funny)
Re: (Score:3, Insightful)
Yes they do.
I am migrating our support-call, issue-tracking, and RMA database to a new server. We take a good number of calls a year and have almost six years of data on the server. The dump file is only 16 megabytes. Most business data is still text, and text just doesn't eat up that much space.
For home use and workstations, doesn't NAS make more sense than a SAN? I am on a small network so we only use NAS for
Re: (Score:2)
Yes and no. Shared storage such as home directories, shared files, etc, are better on a NAS.
The advantage of SAN connections for the desktop lies in the OS and paging space area, for speed critical applications, and more in the area of maintenance and support than in convenience. Getting rid of the local disk gives you the ability to do things like migrate clients to new op
Re:Not so accurate (Score:5, Insightful)
Go to a law firm and ask them about their document management systems or their litigation support applications. Go to a bank and ask them about their financial records. What about email archives for compliance? Size up the disk space utilization and I think you will see many application servers with significantly more storage than an MP3 player. Point taken, SANs can be used at the desktop level. But I partially wonder why? Wouldn't it be better to synchronize users' data folders with shares on a server that is diskless to the SAN? Why waste all that 'spensive storage just to make workstations diskless? Unless you are using a Compellent SAN or some SAN that is running a deduplication engine on the fly, you're stuck storing an OS install for each workstation.
Besides this, I've always felt that the big advantage of a SAN is the ability to replicate an entire environment to another site in case of disaster. SANs are really utilized to the max in enterprise environments where these features are necessary for successful business operations.
Re: (Score:2)
Good examples of business uses, but still comparatively small (depending on company size). Compare with PVR storage, mpeg files, etc.
There are, of course, larger storage needs for some applications, particularly in large companies, but I see more and more cases of formerly huge applications where I had a terabyte storage ten years ago and the disk arrays were awe inspiring (and cost a fortune). Now the same systems will have a terabyte and a half storage, and there
Re: (Score:2)
Good call on the overall disk space savings. I wasn't really thinking about the unneeded hard disk space on the client boxes. It is definitely an interesting potential configuration... especially if you could pipe a deduplication engine in front of the disk array. Granted you would take a performance hit, but you could save a great deal of space.
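The deduplication idea mentioned above boils down to content-addressed storage: hash each block, and keep only one copy per unique hash. A toy sketch of that technique (the function and data here are hypothetical, not any vendor's actual engine):

```python
import hashlib

def dedup_store(blocks, store=None):
    """Toy block-level deduplication: keep one copy per unique chunk,
    reference duplicates by their SHA-256 digest."""
    store = {} if store is None else store   # digest -> chunk bytes
    refs = []                                # per-block references
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)      # payload stored only once
        refs.append(digest)
    return store, refs

# Two "workstation images" that share their OS blocks:
os_block = b"\x00" * 4096                    # identical OS data
image_a = [os_block, os_block, b"user-a-data"]
image_b = [os_block, b"user-b-data"]
store, _ = dedup_store(image_a)
store, _ = dedup_store(image_b, store)
print(len(store))  # 5 logical blocks collapse into 3 unique chunks
```

Real arrays dedup at a fixed chunk size and keep reference counts so chunks can be freed, but the space saving for near-identical OS images comes from exactly this collapsing of duplicate blocks.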
As far as the overall decrease of disk necessity in the enterprise, I'm not so sure I agree. Our email environment is about 3TB local, with another .5 - 1.5TB
Re:Not so accurate (Score:4, Insightful)
It's more that it's inaccurate, or maybe the result of shallow research, or at the very least oversimplified.
Re:Not so accurate (Score:5, Funny)
Very U.S. Centric... (Score:5, Insightful)
If I remember correctly, these guys will generate petabytes of data per day when that monster particle accelerator goes online in a few months...
Re:Very U.S. Centric... (Score:5, Funny)
Let's hope CERN's data can be zipped... (Score:5, Funny)
Remember when you got your first copy of Napster and ADSL? That's how serious...!
Re: (Score:3, Informative)
Re: (Score:2)
Re: (Score:3, Interesting)
Now assuming the phone calls are made over POTS, the bitrate from the sensors should be...20 * 6*10^9 * 1220bps...
Re: (Score:2, Informative)
Still, the data acquisition and storage system is impressive. Most of the storage will be distributed over diffe
Re:Very U.S. Centric... (Score:5, Informative)
I'll talk about one of the experiments, ATLAS. Yes, we "generate" petabytes of data per day. It's rather easy to calculate, actually. One collision in the detector can be compressed down to about 2MB of raw data -- after lots of zero-suppression and smart storage of bits from a detector that has ~100 million channels worth of readout information.
There are ~30 million collisions a second -- the LHC machine runs at 40MHz but has a "gap" in its beam structure.
Multiplying: 2 * 10^6 * 30 * 10^6 = 6 * 10^13 bytes per second. So ATLAS "produces" 1 petabyte of information in about 17 seconds!! :)
But ATLAS is limited to storing about ~300 MB per second. This is the limit coming from how fast you can store things. Remember, there are 4 LHC experiments after all, and ATLAS gets its fair share of the storage capability.
Which means that out of 30 million collisions per second, ATLAS can only store 150 collisions per second.... which it turns out is just fine!! The *interesting* physics only happens **very** rarely -- due to the nature of *weak* interactions. At the LHC, we are no longer interested in the atom falling apart and spitting its guts (quarks and gluons) out. We are interested in rare processes such as dark-matter candidates, or Higgs, or top-top production (which will dominate the 150Hz, btw) and other interesting and rare things. In most of the 30 million collisions, the protons just spit their guts out and nothing *rare* occurs. The job of the trigger of ATLAS (and any other LHC experiment, for that matter) is to find those *interesting* 150 events out of 30 million every second -- and to do this in real time, and without a glitch. ATLAS uses about ~2000 computers to do this real-time data reduction and processing... CMS uses more, I believe.
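The arithmetic in this comment is quick to check; a minimal sketch using the figures as quoted (decimal units assumed):

```python
EVENT_SIZE = 2e6          # ~2 MB of compressed raw data per collision
COLLISION_RATE = 30e6     # ~30 million collisions per second
STORE_BANDWIDTH = 300e6   # ~300 MB/s that ATLAS can actually write out

raw_rate = EVENT_SIZE * COLLISION_RATE       # bytes/second "produced"
print(raw_rate)                              # 6e13 B/s
print(1e15 / raw_rate)                       # ~17 s per petabyte produced
stored_per_sec = STORE_BANDWIDTH / EVENT_SIZE
print(stored_per_sec)                        # 150 events/s kept by the trigger
print(COLLISION_RATE / stored_per_sec)       # trigger keeps 1 in 200,000
```

So the trigger's rejection factor is roughly 200,000:1, which is why the real-time selection is the hard part.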
In the end, we get 300 MB/second worth of raw data and that's stored on tape at Tier 0 at CERN permanently -- and until the end of time as far as anyone is concerned. That data will never *ever* be removed. Actually the 5 Tier 1 sites will also have a full-copy of the data among themselves.
Which brings me to my point that CERN storage is technically not a SAN (Storage Area Network)... (My IT buddies insist on this one.) I am told that CERN storage counts as NAS (Network Attached Storage). But I am going to alert them to this thread and will let them elaborate on that one!
Re: (Score:3, Informative)
Can't imagine why you write this as AC, but ok...
Answer: "That's only 300MB/s 24/7 for more than half a year for writing the raw data to storage. Then there are the other three experiments with the same amount of data, actually one of them does 1.2GB/s of raw data. The data ends up on disk first with an aggregate write speed of ~1.5GB/s (let's not exaggerate). The data is read immediately from disk again to be written to tape (our final storage media), so ~1.5GB/s reads ... Then, all this data is being ex
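Taken at face value, even the lowest rate quoted here adds up to petabytes per experiment over a running year; a rough check (decimal units, and ~180 running days is my own assumption):

```python
rate = 300e6              # bytes/second of raw data, as quoted
seconds = 180 * 86400     # roughly half a year of running
total = rate * seconds
print(total / 1e15)       # ~4.7 PB from one experiment alone
```

At the 1.2GB/s quoted for the other experiment, the same half-year works out to nearly 19 PB, which is why the aggregate numbers get so large.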
Re: (Score:2)
Re: (Score:2)
CASTOR's (the CERN data store) current stats are here:
http://castor.web.cern.ch/castor/ [web.cern.ch]
About 8 PB of files. If I recall correctly, there's around 500TB of online disk space and 10-30PB of tape storage (some of it is getting phased out).
FNAL has a similar setup, except with a storage manager called dCache. There is no use of protocols like iSCSI or Fiber Channel over
Re:Very U.S. Centric... (Score:4, Insightful)
"We at Byte and Switch are on the trail of the world's biggest SAN, and this article reveals our initial findings."
and this
"Again, this list is meant to be a starting place for further endeavor. If you've got a big SAN story to tell, we'd love to hear it."
oh, and this too
"we present five of the world's biggest SANs:"
notice how everything in TFA clearly says this is not THE 5 BIGGEST SANs in the world, but the 5 largest they have found SO FAR.
I know -- I must be new here, but I'm getting there. I didn't read the whole article, just a few sentences from the first page.
The big surprise is (Score:5, Funny)
Re: (Score:3, Funny)
Re: (Score:2)
14Pb for 170k employees... (Score:5, Insightful)
If you add up the total disk space in an average office you'll get more than that. If I add up all my external disks, etc. I've got more than a terabyte on my desktop.
(And yes it's true, data does grow to fit the available space)
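The per-employee figure behind this comparison is quick to work out; a minimal sketch assuming decimal (SI) units:

```python
total_bytes = 14e15        # 14 PB, as quoted for JPMorgan Chase
employees = 170_000
per_employee = total_bytes / employees
print(per_employee / 1e9)  # ~82 GB per employee
```

Which is indeed less than a single modern desktop drive, though, as replies below note, SAN capacity is not actually allocated per person.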
Re: (Score:3, Informative)
I'm also curious about Google and the like. Do they not disclose their storage?
I didn't say the disks were full... (Score:2)
Does the whole 14Pb go in a single room? That might be impressive.
Re: (Score:2)
Re: (Score:3, Informative)
To a certain extent they have disclosed some numbers, in a paper about their distributed storage system called "BigTable". The title of the paper is "Bigtable: A Distributed Storage System for Structured Data" and it can be found right here [google.com].
Some numbers can be found on page 11:
Project and Table size in TB:
Crawl: 800
Crawl: 50
Google Analytics: 20
Google Analytics: 200 (Raw click table)
Google Base: 2
Google Earth: 0.5
G
Re: (Score:3, Funny)
Re: (Score:1)
there's much more than that, but GFS is no SAN at all. google can do better than that, and does.
GFS is all about cheap storage, lots of it. and yes, 14 P is basically nothing, in google terms.
what article really is about is "who wasted more money on over-priced enterprisey SAN crap".
Re: (Score:3, Funny)
14Pb for 170k employees isn't so much - 83 gigabytes per person. If you add up the total disk space in an average office you'll get more than that. If I add up all my external disks, etc. I've got more than a terabyte on my desktop.
You'd find a lot of the 83GB on a typical office PC is crap you're not going to put on a SAN; my boot drive without data has 50GB used, and other than the pain in the arse of re-installing, I couldn't give a toss if I lost all that "data". Yes, I've got a TB of storage too, but subtract p0rn, DVDs and other content that would get me sacked if I worked in a corporate environment, subtract the large amount of reference material (that would be shared between users in a corporate environment) and all my original
Re: (Score:1, Insightful)
Re: (Score:2)
Bad way of putting it. (Score:1, Redundant)
Each employee is NOT getting 83GB of space on the SAN. They might get a few GB for email. That space is used to store accounts, general business stuff, personal information, credit reports, market information, simulations, etc., primarily for data mining. Then of course it's replicated to several locations.
Re:14Pb for 170k employees... (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Grandparent is correct.
At Last! (Score:5, Funny)
Shouldn't this be written somewhere? (Score:5, Informative)
Re:Shouldn't this be written somewhere? (Score:5, Funny)
Re: (Score:3, Informative)
Re: (Score:2)
...and why does the article say "Pbytes", "Tbytes" (Score:3, Informative)
The abbreviated units are "PB" and "TB".
See: http://en.wikipedia.org/wiki/Petabyte [wikipedia.org]
Re:...and why does the article say "Pbytes", "Tbyt (Score:5, Insightful)
To avoid misunderstandings, 4 additional bytes (B) don't seem that much of a price.
Re: (Score:2)
Sorry, had to say it... but are you really sure that "ytes" is being represented on my machine in bytes? My browser might actually be translating it into a unicode representation (or some other storage) that uses at least 2 bytes per character. =-)
Re:...and why does the article say "Pbytes", "Tbyt (Score:2)
That's nothing... (Score:4, Funny)
Re: (Score:1, Funny)
Re: (Score:2)
Sooo... (Score:2, Funny)
Bugger.
Re: (Score:1)
Re: (Score:1)
Re: (Score:1)
Re:Sooo... (Score:4, Funny)
Just SANs... so what? (Score:5, Interesting)
Large Hadron Collider (CERN) will need 15 PB/yr (Score:2)
It's planned to start running next spring, but it's already at 3.5 PB. (The old LEP collider, which ran in the same tunnel, produced quite a bit of data too.)
They forgot about ... (Score:3, Interesting)
Re: (Score:1)
security, resilience, risk, etc (Score:1)
Surely there are other important considera
Re: (Score:2)
Do tell. I'm curious.
Re: (Score:1)
Re: (Score:3, Informative)
You also talk about copies of data as if, when a disk went bad, you'd lose the data. These storage arrays have multiple redundancies (RAIDs of VDisks which are RAIDs themselves) as well as live replication capability to remote sites -- at which point you likely have a copy (or copies) of an entire datacenter in a different geographic location that is running as a hot spare.
Within a datacenter, you would not have more than dual fabrics.
Re: (Score:2)
Re: (Score:2)
But you're just picking and choosing and making up strawmen now. If you have three fabrics, and one goes down because you just botched the upgrade, and the other goes down because a circuit breaker blows, well, now you're back to one fabric and vulnerable again. You're also totally flipping insane if you're worried about this, because the power company can
Re: (Score:2)
SAN islands can be dual fabric, as well as single fabric. I'm not sure you're grokking my posts.
Re: (Score:2)
Re: (Score:2)
Fabric merges and other configuration commands can have their potential disruptions limited by using zoning and VSANs. But no one can help you if you decide to log on and reboot the switch.
And yes I do know about those multi-vendor switch fabrics. While no one vendor seems to follow the standards the sam
Internal Revenue Service (Score:2)
A few years ago, I remember reading an article about the IRS (the government tax division) that had seven or eight regional data centers around the U.S. -- each with many petabytes of storage to store current and historical tax data on hundreds of millions of Americans, corporations, etc. I can't find the article now but it seems like *it* should make the list... maybe even top the list.
Discount Web host with scalable SAN (Score:4, Funny)
with an unlikely name that offers a scalable,
distributable SAN, called an HDSAN
(High Density Storage Area Network),
for its customers:
This site is unbelievable ... (Score:2)
heeeeellllooooo 1990! We're back!
Largest DISCLOSED SANs (Score:2)
Re: (Score:2)
Is your firm Goldman Sachs as per your resume on your website?
Not since they let me go in 1998
Re: (Score:2)
know who you are and what firm you're talking about. If you pre-declared your PHP variables or turned off notices in php.ini you'd cut your disk use by 2/3rds.
I've never used PHP personally in my life.
5 biggest sans? (Score:2)
What I want to know (Score:3, Interesting)
Re: (Score:2)
I suspect in at least some of these cases, the SAN is not continuous, and they're actually doing backups to the SAN. Part of the storage mentioned will be the online portion of a hierarchical storage manager or similar.
Re: (Score:2)
Re: (Score:2)
I used to work at the NASA facility mentioned.... (Score:2)
Pronunciation is the key (Score:5, Funny)
We're talking about Petabytes, not Pedobytes.
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
I think we need to coin a few new terms (Score:3, Funny)
Pr0ntab: A score, equal to the amount of time in tenths of a minute, that elapses from the moment a news article is posted to the first comment relating said article to a person's porn collection or viewing habits.
Pr0ntible: The statistical likelihood that any given article will have a low Pr0ntab score, where 1.0 is the highest score, and 0 the lowest.
Pr0ntabulary: A time sensitive, categorical table of subject matter, where each category is assigned a Pr0ntible, and said table is organized in descend
Re: (Score:2, Informative)