
News for nerds, stuff that matters

ZFS, the Last Word in File Systems?

Posted by CmdrTaco on Thu Sep 16, 2004 11:06 AM
from the no-point-discussing-I-guess dept.
guigouz writes "Sun is carrying a feature story about its new ZFS File System - ZFS, the dynamic new file system in Sun's Solaris 10 Operating System (Solaris OS), will make you forget everything you thought you knew about file systems. ZFS will be available on all Solaris 10 OS-supported platforms, and all existing applications will run with it. Moreover, ZFS complements Sun's storage management portfolio, including the Sun StorEdge QFS software, which is ideal for sharing business data."
564 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • billion billion? (Score:5, Funny)

    by michael path (94586) * on Thursday September 16 2004, @11:07AM (#10267136)
    (http://www.indeterminism.org/ | Last Journal: Wednesday May 05 2004, @10:46AM)
    From the article:

    Unlimited scalability
    As the world's first 128-bit file system, ZFS offers 16 billion billion times the capacity of 32- or 64-bit systems.

    Microsoft immediately countered by saying WinFS [microsoft.com] will now support "twelveteen million billion times" as much storage as Sun's ZFS, and is "a bazillion times" more secure.

    When reached for comment, Sun CEO Scott McNealy [sun.com] replied "neener neener". Microsoft CEO Steve Ballmer [microsoft.com] responded by putting gum in Sun President Jonathan Schwartz [sun.com]'s hair.
  • Open source (Score:4, Informative)

    by Splinton (528692) * on Thursday September 16 2004, @11:07AM (#10267137)
    (http://www.traveljury.com/)

    And it looks like it's going to be opensourced along with most of Solaris 10!

    Presumably a 32-bit machine will be able to handle a 128-bit file system, just as Solaris 10 itself currently targets (at most) 64-bit hardware.
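    A narrower machine copes with wider numbers by doing multi-word arithmetic, carrying from one machine word into the next. Here is a minimal sketch, assuming a 128-bit block number stored as four 32-bit words (least significant first). The helper names are made up for illustration and have nothing to do with Sun's code:

```python
MASK32 = 0xFFFFFFFF  # one 32-bit machine word

def to_limbs(value: int) -> list[int]:
    """Split a 128-bit value into four 32-bit words, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(4)]

def from_limbs(limbs: list[int]) -> int:
    """Reassemble 32-bit words (least significant first) into one integer."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

def add128(a: list[int], b: list[int]) -> list[int]:
    """Add two 128-bit numbers one 32-bit word at a time, wrapping modulo 2**128."""
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry          # never wider than 33 bits
        result.append(total & MASK32)  # keep the low 32 bits in this word
        carry = total >> 32            # pass the overflow bit to the next word
    return result

print(hex(from_limbs(add128(to_limbs(2**64 - 1), to_limbs(1)))))  # 0x10000000000000000
```

    The same carry loop works for any width. On a 32-bit CPU, 128-bit addresses therefore cost some extra instructions per address calculation, but they do not impose a hard limit.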

    • Re:Open source by i621148 (Score:2) Thursday September 16 2004, @11:15AM
      • Re:Open source (Score:4, Insightful)

        by tolan-b (230077) on Thursday September 16 2004, @11:18AM (#10267288)
        I suspect that whatever open source license Sun release Solaris under, they'll be careful to make sure it's incompatible with the GPL.
        • Re:Open source by dbIII (Score:2) Thursday September 16 2004, @09:33PM
          • Re:Open source by tolan-b (Score:1) Sunday September 19 2004, @10:44AM
      • Re:Open source by TheHonestTruth (Score:1) Thursday September 16 2004, @11:21AM
    • Re:Open source (Score:5, Interesting)

      by balster neb (645686) on Thursday September 16 2004, @11:18AM (#10267283)
      Yes, it does look like it would be open-sourced as part of Solaris 10 (it was mentioned as one of the major new features).

      Assuming Solaris 10 will be true open source (not like Microsoft's "shared source") as well as GPL-compatible, would I be able to use ZFS on my GNU/Linux desktop? Will ZFS be a viable alternative to ext3 and ReiserFS? Or is the overhead too big?
      • Re:Open source by Anonymous Coward (Score:1) Thursday September 16 2004, @12:44PM
      • Re:Open source by VitaminB52 (Score:1) Friday September 17 2004, @10:05AM
    • Re:Open source (Score:5, Funny)

      by CrkHead (27176) on Thursday September 16 2004, @11:22AM (#10267348)
      It looks like Microsoft may have its new WinFS after all...
    • Re:Open source by lamber45 (Score:1) Thursday September 16 2004, @11:27AM
      • Re:Open source by Anonymous Writer (Score:2) Thursday September 16 2004, @12:19PM
    • Ah, but... by abb3w (Score:2) Thursday September 16 2004, @11:33AM
    • Patents and other Bad Signs. (Score:5, Interesting)

      by twitter (104583) on Thursday September 16 2004, @11:37AM (#10267540)
      (http://lists.clickers.org/linuxsig/index.html | Last Journal: Friday November 23, @08:40PM)
      Opensource is useless when it's patent encumbered. While it's nice that the details will be available, it sucks to think that I can't use them except to serve Sun for the next 17 years. Such disclosure, of course, is what the patent system is supposed to provide but does not. What the patent is providing is ownership of ideas. How obvious those ideas are, and whether there's prior art, is impossible to say from the linked puff piece.

      This article is shocking. I'm used to much less hype and far more technical details from Sun. Software patents and bullshit are not what I expect when I follow a link to them.

      I don't like any of this.

      • Re:Patents and other Bad Signs. by LurkerXXX (Score:1) Thursday September 16 2004, @11:56AM
      • Re:Patents and other Bad Signs. (Score:4, Interesting)

        by Anonymous Writer (746272) on Thursday September 16 2004, @12:32PM (#10268260)

        Opensource is useless when it's patent encumbered.

        The GPL [gnu.org] states the following...

        Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

        I thought that if the patent holder distributes patented material under the GPL, it is a declaration that the holder has relinquished control over the patented material for as long as it is applied under the GPL.

    • Re:Open source (Score:5, Interesting)

      by GileadGreene (539584) on Thursday September 16 2004, @11:50AM (#10267708)
      (http://www.google.com/search?q=gilead+greene)
      From the article:
      More important, ZFS is endian-neutral. You can easily move disks from a SPARC server to an x86 server. Neither architecture pays a byte-swapping tax due to Sun's patent-pending "adaptive endian-ness" technology, which is unique to ZFS. [emphasis mine]
      So while it might be open-sourced, you're not likely to see it migrating to Linux or the BSDs any time soon.
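      The article does not explain how "adaptive endian-ness" works. One well-known way to avoid a byte-swapping tax on both architectures is "reader makes right". Each block is written in the writer's native byte order together with a flag, and a reader swaps bytes only when that flag differs from its own order. The sketch below is a minimal version of that idea. Its block layout (one flag byte followed by unsigned 64-bit integers) and flag values are assumptions, not Sun's on-disk format:

```python
import struct
import sys

# Hypothetical flag values stored in the first byte of each block.
LITTLE, BIG = 0, 1

def write_block(values: list[int], byteorder: str = sys.byteorder) -> bytes:
    """Pack unsigned 64-bit ints in the writer's own byte order, tagged with a flag."""
    flag = LITTLE if byteorder == "little" else BIG
    fmt = "<" if flag == LITTLE else ">"
    return bytes([flag]) + struct.pack(f"{fmt}{len(values)}Q", *values)

def read_block(block: bytes, byteorder: str = sys.byteorder) -> tuple[list[int], bool]:
    """Decode a block using the byte order recorded in its flag.

    Returns the values and whether a byte swap was needed, i.e. whether the
    writer's byte order differed from the reader's.
    """
    flag, payload = block[0], block[1:]
    fmt = "<" if flag == LITTLE else ">"
    values = list(struct.unpack(f"{fmt}{len(payload) // 8}Q", payload))
    reader_flag = LITTLE if byteorder == "little" else BIG
    return values, flag != reader_flag

# Written on a big-endian SPARC box, read on a little-endian x86 box.
block = write_block([1, 2**40], byteorder="big")
print(read_block(block, byteorder="little"))  # ([1, 1099511627776], True)
```

      Under this scheme the writer never pays for a swap. A reader pays only when disks have actually moved between architectures, which is presumably what the "adaptive" label refers to.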
      • Re:Open source by Minwee (Score:2) Thursday September 16 2004, @12:22PM
        • Re:Open source by Cryptnotic (Score:2) Thursday September 16 2004, @06:04PM
    • Re:Open source by Sipos (Score:1) Thursday September 16 2004, @02:31PM
  • Out of letters. (Score:3, Funny)

    by Kenja (541830) on Thursday September 16 2004, @11:08AM (#10267142)
    Of course ZFS is the last word in file systems. I mean, what can come after zed?
  • Two things... (Score:5, Insightful)

    by rincebrain (776480) on Thursday September 16 2004, @11:08AM (#10267148)
    1) Even Sun has succumbed to recursive acronyms, now.

    2) Is it just me, or is the post surprisingly bereft of unique details? I mean, integration with all existing applications is rather assumed, given that it's a file system and all...
    • Re:Two things... (Score:5, Funny)

      by InadequateCamel (515839) on Thursday September 16 2004, @11:34AM (#10267504)
      I especially liked:
      "Neither architecture pays a byte-swapping tax due to Sun's patent-pending "adaptive endian-ness" technology"

      Adaptive endian-ness? What a stupid thing to include in a press release...there has to be a better way to say that.

      Just announced by Sun:
      "ANMF, our new file system (Ambiguous Nomenclature FS) will be filled with file cataloguing technology stuff that allows faster-ish operations that result in application goodness".
    • Re:Two things... by Florian Weimer (Score:2) Thursday September 16 2004, @12:13PM
    • Re:Two things... by null etc. (Score:1) Thursday September 16 2004, @12:27PM
    • Re:Two things... by Roadkills-R-Us (Score:3) Thursday September 16 2004, @01:34PM
    • A third one ... by DarkMan (Score:2) Thursday September 16 2004, @05:51PM
    • Re:Two things... by viktor (Score:3) Friday September 17 2004, @06:08AM
  • Hmf. (Score:5, Insightful)

    by BJH (11355) on Thursday September 16 2004, @11:09AM (#10267155)
    Logically, the next question is if ZFS' 128 bits is enough. According to Bonwick, it has to be. "Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."

    So, what was the point of creating a 128-bit filesystem?

    -1, Marketing Hype.

    *Yawn*
    • Re:Hmf. (Score:5, Informative)

      by Kenja (541830) on Thursday September 16 2004, @11:10AM (#10267168)
      "So, what was the point of creating a 128-bit filesystem?"

      Getting rid of file/drive size limitations for the foreseeable future?

      • 64 bits is awfully big already (Score:5, Informative)

        by pslam (97660) on Thursday September 16 2004, @11:30AM (#10267456)
        (Last Journal: Tuesday September 23 2003, @03:15PM)
        Getting rid of file/drive size limitations for the foreseeable future?

        It would take over 500 years to fill a 64-bit filesystem written at 1GB/sec (and of course 500 years to read it back again). 64 bits is already an impossibly large figure. There's absolutely nothing special or clever whatsoever about doubling the size of your pointers aside from using up more disk space for all the metadata.

        64 bits is enough for today's filesystems in much the same way that 256-bit AES is enough for today's encryption - there are far bigger things that will require complete system changes than that so-called "limit". I suspect a better filesystem will come along well before those 500 years are up... I agree with grandparent:

        -1, Marketing Hype.
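        The 500-year figure holds up. Here is a quick check, assuming "64-bit" means 2^64 addressable bytes and 1 GB/s means 10^9 bytes per second. The same function also covers faster links, such as a 1 TB/s interconnect:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_fill(address_bits: int, bytes_per_second: float) -> float:
    """Years needed to write every byte of a byte-addressed space of the given width."""
    return 2**address_bits / bytes_per_second / SECONDS_PER_YEAR

print(f"{years_to_fill(64, 1e9):.0f} years at 1 GB/s")    # 585 years at 1 GB/s
print(f"{years_to_fill(64, 1e12):.2f} years at 1 TB/s")   # 0.58 years at 1 TB/s
```

        Each extra address bit doubles the time, so a 128-bit space at these rates is out of reach by a factor of 2^64.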

        • Re:64 bits is awfully big already by T3kno (Score:2) Thursday September 16 2004, @11:38AM
          • Re:64 bits is awfully big already (Score:5, Insightful)

            by pslam (97660) on Thursday September 16 2004, @11:54AM (#10267764)
            (Last Journal: Tuesday September 23 2003, @03:15PM)
            Yeah, it's probably marketing hype now, but in 5 years, what about 10? Just because we can't do it now doesn't mean that we should stop progress.

            No, precisely because we can't do it now, and for the very predictable future, we shouldn't be wasting all that disk space, access and CPU time for a boundary that no production system is likely to ever reach before they get upgraded. That's just practicality.

            Seagate apparently sold 18.3 million desktop drives last year. Assuming they're all about 120GB (which is generous of me), that would be about 17.6*10^18 bits. Guess what, that's 2^64 bits. Yes, you would have to buy every single desktop hard drive Seagate shipped in the last year to have the capacity to fill a 64 bit filesystem. And find space for 18 million drives. And a power station to deliver the several hundred megawatts you'd need.

            Even at 2 times drive capacity growth per year, that's still a ridiculously unattainable figure. In 14 years' time you'd only need to buy 1000 drives (which are now 2000TB each). But 14 years is a geological time scale when it comes to computers. You'd have wasted 14 years of CPU time and disk space devoted to those extra 64 bits.

            If you still think 64 bits isn't enough, how about 96 bits? It would take 46 years before hard disks were big and cheap enough so you could fill the filesystem by buying 1000 of them. But no, they chose 128 bits because it sounded good.
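            The arithmetic in the last three paragraphs can be reproduced with the sketch below. It uses only the post's own assumptions: 18.3 million drives, 120 GB each, capacity doubling yearly, and a fleet of 1000 drives. Note that the post compares against 2^64 *bits*. A byte-addressed 64-bit space holds 2^64 bytes, eight times as much, which only makes 64 bits harder to fill.

```python
import math

DRIVES_SOLD = 18.3e6      # Seagate desktop drives in one year, per the post
DRIVE_BYTES = 120e9       # assumed 120 GB per drive
FLEET = 1000              # "you'd only need to buy 1000 drives"

total_bits = DRIVES_SOLD * DRIVE_BYTES * 8
print(f"{total_bits:.3g} bits shipped vs 2^64 = {2**64:.3g}")

def years_until(address_bits: int, fleet: int = FLEET) -> float:
    """Years of yearly capacity doubling until `fleet` drives hold 2**address_bits bits."""
    bits_per_drive_today = DRIVE_BYTES * 8
    bits_needed_per_drive = 2**address_bits / fleet
    return math.log2(bits_needed_per_drive / bits_per_drive_today)

print(f"64-bit: {years_until(64):.1f} years, 96-bit: {years_until(96):.1f} years")
```

            Each extra address bit adds exactly one year of doubling. That is why the 96-bit case lands 32 years after the 64-bit one, at roughly 14 and 46 years.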

        • Re:64 bits is awfully big already (Score:5, Insightful)

          by Jeff DeMaagd (2015) on Thursday September 16 2004, @11:47AM (#10267677)
          (http://www.demaagd.com/ | Last Journal: Sunday October 27 2002, @06:53PM)
          It would take over 500 years to fill a 64 bit filesystem written at 1GB/sec (and of course 500 years to read it back again).

          One product can already transfer a terabyte per second, so that would cut the transfer down to half a year. And I imagine that transfer rate will continue to increase.

          I don't see how one would necessarily argue against such a thing for products that will go into cluster and supercomputer use. I say you might as well get the bugs out while you can, so that once the 65th bit is needed, the supercomputer suppliers are ready.

          http://www.sc-conference.org/sc2004/storcloud.html
        • Re:64 bits is awfully big already (Score:4, Interesting)

          by Too Much Noise (755847) on Thursday September 16 2004, @11:49AM (#10267695)
          (Last Journal: Tuesday May 18 2004, @11:49PM)
          The funny thing is, by the time a 128-bit FS is really needed, any patents Sun has on ZFS will have expired. So whatever that day's Open Source OS of choice is, it will at least support ZFS (and probably that era's 128-bit incarnations of several of today's FSes).

          Somehow, an alternate history where the 80286 was 64-bit instead of 16-bit (with everything else staying the same) comes to mind when reading Sun's marketing on this.
        • Re:64 bits is awfully big already by Geisel (Score:1) Thursday September 16 2004, @12:46PM
        • Re:64 bits is awfully big already (Score:4, Interesting)

          by laird (2705) <lairdp@ELIOTgmail.com minus poet> on Thursday September 16 2004, @12:53PM (#10268543)
          (Last Journal: Monday April 07 2003, @07:39AM)
          "It would take over 500 years to fill a 64 bit filesystem written at 1GB/sec"

          This is about the same argument as IPv6 addressing: it's expensive to change the size of the address space, so make it absurdly large because bits of address space are cheap, you enable some interesting unforeseen applications, and you put off a forced migration.

          While I agree that 128-bit block addressing is overkill for a single computer, once you're going to expand past a 64-bit filesystem, there's not much point in going smaller than a 128-bit filesystem. It's not like you'd save money making it an 80-bit filesystem.

          As to your point about the speed of a hard drive vs. the addressable space in the filesystem, keep in mind that filesystems can be much larger than disks. For example, it's not that unusual (in cooler UNIX environments) for everyone in a company to work in one large distributed filesystem, which may run across hundreds or thousands of hard drives. Now imagine a building full of people working with very large files (e.g. video production) where you could easily accumulate terabytes of data. Wouldn't it be nice to manage your online, nearline, and offline storage as a single, extremely large filesystem? Or, for real blue-sky thinking, imagine that everyone on the planet uses a single shared, distributed filesystem for everything. Wouldn't it be cool to address _everything_ using a single, consistent scheme no matter where you are? Cool, eh?
        • Re:64 bits is awfully big already by TrogL (Score:1) Thursday September 16 2004, @02:48PM
        • Re:64 bits is awfully big already by Black Art (Score:2) Thursday September 16 2004, @09:52PM
      • Re:Hmf. by georgewilliamherbert (Score:2) Thursday September 16 2004, @07:49PM
    • Re:Hmf. by grub (Score:2) Thursday September 16 2004, @11:11AM
    • Re:Hmf. by Gentoo Fan (Score:2) Thursday September 16 2004, @11:14AM
      • Re:Hmf. by gl4ss (Score:2) Thursday September 16 2004, @11:17AM
        • Re:Hmf. by Gentoo Fan (Score:2) Thursday September 16 2004, @11:21AM
          • Re:Hmf. by IWannaBeAnAC (Score:2) Thursday September 16 2004, @11:36AM
            • Re:Hmf. by Gentoo Fan (Score:3) Thursday September 16 2004, @11:41AM
          • Re:Hmf. by arose (Score:2) Thursday September 16 2004, @08:05PM
        • Re:Hmf. by jellomizer (Score:2) Thursday September 16 2004, @11:42AM
          • Re:Hmf. by gl4ss (Score:2) Thursday September 16 2004, @12:21PM
      • Re:Hmf. by macsuibhne (Score:1) Thursday September 16 2004, @01:13PM
    • Re:Hmf. (Score:5, Interesting)

      by elmegil (12001) on Thursday September 16 2004, @11:21AM (#10267335)
      (http://slashdot.org/ | Last Journal: Wednesday March 07 2007, @09:12PM)
      "You'll never need more than 640K of memory". The point would be to be ready as storage densities increase. In the last 8 years we've gone from a terabyte filling a room to a terabyte on a desktop, and I'm sure there are more density breakthroughs coming.

      It's your density, Luke.

      • Re:Hmf. by BJH (Score:2) Thursday September 16 2004, @11:51AM
      • Re:Hmf. by backslashdot (Score:2) Thursday September 16 2004, @12:04PM
        • Re:Hmf. by Anonymous Coward (Score:1) Thursday September 16 2004, @02:03PM
      • Re:Hmf. by pslam (Score:1) Thursday September 16 2004, @12:13PM
      • Re:Hmf. by julesh (Score:2) Thursday September 16 2004, @03:47PM
    • Re:Hmf. by Nos. (Score:1) Thursday September 16 2004, @11:23AM
    • Re:Hmf. by sylvester (Score:2) Thursday September 16 2004, @11:30AM
    • Re:Hmf. by DjReagan (Score:2) Thursday September 16 2004, @11:35AM
      • Re:Hmf. by Minna Kirai (Score:1) Thursday September 16 2004, @03:10PM
    • Re:Hmf. by i_r_sensitive (Score:2) Thursday September 16 2004, @11:39AM
    • Re:Hmf. by Mik3D (Score:1) Thursday September 16 2004, @12:01PM
    • Re:Hmf. by Tenareth (Score:2) Thursday September 16 2004, @12:35PM
    • Re:Hmf. by PaSTE (Score:3) Thursday September 16 2004, @12:53PM
    • Re:Hmf. by Billly Gates (Score:2) Thursday September 16 2004, @03:57PM
  • Unlimited scalability (Score:3, Insightful)

    by Anonymous Coward on Thursday September 16 2004, @11:10AM (#10267162)
    Unlimited scalability
    As the world's first 128-bit file system, ZFS offers 16 billion billion times the capacity of 32- or 64-bit systems.
    But the last time I checked, 16 billion billion is still less than infinity.
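    The "16 billion billion" figure is the ratio between 128-bit and 64-bit address spaces, which is 2^64. It is exact only if a "billion" is read as 2^30. In decimal terms the ratio is about 18.4 billion billion. Against a 32-bit system the ratio is 2^96, vastly larger. A quick check:

```python
ratio = 2**128 // 2**64          # how many times larger a 128-bit space is than a 64-bit one
binary_billion = 2**30           # 1,073,741,824

print(ratio == 16 * binary_billion**2)  # True: exactly 16 "binary" billion billion
print(f"{ratio / 1e18:.1f}")            # 18.4 billion billion in decimal terms
print(2**128 // 2**32 == 2**96)         # True: against 32 bits the ratio is 2**96
```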
  • Cool but.... (Score:3, Interesting)

    by otis wildflower (4889) on Thursday September 16 2004, @11:11AM (#10267179)
    ... it took them long enough.

    Perhaps they had to rewrite an LVM from scratch in order to opensource it?
  • Open Source by blahbooboo2 (Score:1) Thursday September 16 2004, @11:12AM
  • What is their disk allocation scheme? (Score:4, Informative)

    by grunt107 (739510) on Thursday September 16 2004, @11:12AM (#10267196)
    Having a global pool does lessen maintenance/support, but what method are they using to place data on the disks?

    Frequently accessed data needs to be spread out over all the disks for the fastest access, so does that mean Sun has FS files/tables that track usage and reposition data based on that?
    • Re:What is their disk allocation scheme? by Bobo_The_Boinger (Score:2) Thursday September 16 2004, @11:17AM
    • Re:What is their disk allocation scheme? by dan14807 (Score:1) Thursday September 16 2004, @12:24PM
    • by majid (306017) on Thursday September 16 2004, @12:30PM (#10268230)
      (http://www.majid.info/)
      I was in a chat session with their engineers yesterday. It looks like they have adaptive disk scheduling algorithms to balance the load across the drives (e.g. if a drive is faster than others, it will get correspondingly more I/O). The scheduler also tries to balance I/O among processes and filesystems sharing the data pool.
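      The load balancing described here, where a faster drive gets correspondingly more I/O, can be illustrated with a proportional split. This is a hypothetical sketch, not the real ZFS scheduler, whose details aren't given here. It assumes each drive's recent throughput is already measured and divides a batch of block writes in proportion to it. Largest-remainder rounding makes the counts add up exactly.

```python
def split_writes(n_blocks: int, throughput: dict[str, float]) -> dict[str, int]:
    """Divide n_blocks among drives in proportion to their measured throughput."""
    total = sum(throughput.values())
    exact = {d: n_blocks * t / total for d, t in throughput.items()}
    counts = {d: int(x) for d, x in exact.items()}     # round everyone down first
    leftover = n_blocks - sum(counts.values())
    # Hand the remaining blocks to the drives with the largest fractional shares.
    by_remainder = sorted(exact, key=lambda d: exact[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1
    return counts

print(split_writes(100, {"disk0": 80.0, "disk1": 40.0, "disk2": 40.0}))
# {'disk0': 50, 'disk1': 25, 'disk2': 25}
```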

      This is a good thing - queueing theory shows a single unified pool has better performance than several smaller ones. People who try to tune databases by dedicating drives to redo logs don't usually realize what they are doing is counterproductive - they optimize locally for one area, at the expense of global throughput for the entire system.
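      The queueing-theory point can be made concrete with the textbook M/M/c model, which assumes Poisson arrivals and exponential service times; real disks only approximate this. The sketch below uses the Erlang C formula to compare one shared queue feeding four drives against four drives that each get a fixed quarter of the traffic. The arrival and service rates are made-up illustrative numbers.

```python
import math

def erlang_c(c: int, a: float) -> float:
    """Probability that a request must wait in an M/M/c queue with offered load a = lam/mu."""
    top = a**c / math.factorial(c) * c / (c - a)
    bottom = sum(a**k / math.factorial(k) for k in range(c)) + top
    return top / bottom

def mean_response_time(c: int, lam: float, mu: float) -> float:
    """Mean time in system (queueing + service) for M/M/c, in seconds."""
    if lam >= c * mu:
        raise ValueError("unstable queue: lam must be below c * mu")
    wait = erlang_c(c, lam / mu) / (c * mu - lam)
    return wait + 1 / mu

MU = 100.0    # requests/s one drive can serve (illustrative)
LAM = 320.0   # total requests/s offered to four drives (illustrative)

pooled = mean_response_time(4, LAM, MU)       # one queue, four drives
split = mean_response_time(1, LAM / 4, MU)    # four separate single-drive queues
print(f"pooled: {pooled * 1000:.1f} ms, split: {split * 1000:.1f} ms")
```

      With these numbers the pooled configuration answers in roughly a third of the time. No drive sits idle while another has a backlog, which is the intuition behind the post's claim.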

      ZFS uses copy-on-write (a modified block is written wherever the disk head happens to be, not where the old one used to be). This means writes are sequential (as with all journaled filesystems), and since the old block is still on disk (until it is garbage collected), this gives the ability to take snapshots, something that is vital for making coherent backups now that nightly maintenance windows are mostly history. This also leads to file fragmentation, so having enough RAM for a good buffer cache helps.
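      The copy-on-write idea above, and why it makes snapshots cheap, can be shown with a toy model. This is not ZFS's on-disk design. An append-only list stands in for the disk, a dict maps logical block numbers to physical locations, and there is no garbage collection.

```python
from typing import Optional

class CowStore:
    """Toy copy-on-write block store: live data is never overwritten in place."""

    def __init__(self) -> None:
        self.disk: list[bytes] = []                   # physical blocks, append-only
        self.live: dict[int, int] = {}                # logical block -> physical index
        self.snapshots: dict[str, dict[int, int]] = {}

    def write(self, lbn: int, data: bytes) -> None:
        self.disk.append(data)                        # new copy goes at the end (sequential)
        self.live[lbn] = len(self.disk) - 1           # repoint; the old block stays on "disk"

    def snapshot(self, name: str) -> None:
        # Freezing the map is enough because referenced blocks are never modified.
        self.snapshots[name] = dict(self.live)

    def read(self, lbn: int, snap: Optional[str] = None) -> bytes:
        block_map = self.live if snap is None else self.snapshots[snap]
        return self.disk[block_map[lbn]]

store = CowStore()
store.write(0, b"v1")
store.snapshot("nightly")
store.write(0, b"v2")
print(store.read(0), store.read(0, "nightly"))  # b'v2' b'v1'
```

      Copying the whole map makes this snapshot O(n). A real implementation would share a tree of block pointers so that a snapshot costs almost nothing. It would also need the garbage collector mentioned above to reclaim blocks that no snapshot references any more.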

      Because the scheduler works best if it has full visibility of every physical disk, rather than dealing with an abstract LUN on a hardware RAID, they actually recommend ZFS be hosted on a JBOD array (just a bunch of disks, no RAID) and have the RAID be done in software by ZFS. Since the RAID is integrated with the filesystem, they have the scope for optimizations that is not available if you have a filesystem trying to optimize on one side and a RAID controller (or separate LVM software) on the other side. Network Appliance does something like this with their WAFL network filesystem to offer decent performance despite the overhead of NFS.

      With modern, fast CPUs, software RAID can easily outperform hardware RAID. It is quite common for optimizations like hardware RAID made at a certain time to become counterproductive as technology advances and the assumptions behind the premature optimization are no longer valid. A long time ago, IBM offloaded some database access code in its mainframe disk controllers. It used to be a speed boost, but as the mainframe CPU speeds improved (and the feature was retained for backward compatibility), it ended up being 10 times slower than the alternative approach.
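      Here is an example of the kind of RAID work that moves into software: single-parity (RAID-5-style) reconstruction with XOR. It is a generic sketch, not ZFS's own layout. The parity block is the XOR of the data blocks in a stripe, so any one lost block can be rebuilt from the survivors.

```python
from functools import reduce
from typing import Optional

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Data blocks followed by one parity block (their XOR)."""
    return data_blocks + [xor_blocks(data_blocks)]

def rebuild(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Recover the single missing block (marked None) by XOR-ing the survivors."""
    missing = stripe.index(None)
    repaired = list(stripe)
    repaired[missing] = xor_blocks([b for b in stripe if b is not None])
    return repaired

stripe = make_stripe([b"ABCD", b"EFGH", b"IJKL"])
damaged = list(stripe)
damaged[1] = None                      # one drive in the stripe has failed
print(rebuild(damaged) == stripe)      # True
```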
  • There already is a ZFS. (Score:5, Informative)

    by TheLoneGundam (615596) on Thursday September 16 2004, @11:13AM (#10267203)
    (Last Journal: Friday May 16 2003, @01:55PM)
    IBM has ZFS on their z/OS Unix Systems Services (POSIX interfaces on z/OS) component. ZFS was developed to provide improvements over the HFS (Hierarchical File System) that they ship with the OS.
  • Provided computer applications have been exhausted by achesloc (Score:1) Thursday September 16 2004, @11:13AM
  • will it be open source by joshtimmons (Score:2) Thursday September 16 2004, @11:13AM
  • UFS2/SU (Score:3, Interesting)

    by FullMetalAlchemist (811118) on Thursday September 16 2004, @11:15AM (#10267239)
    I'm really happy with UFS2/SU, and have been more than happy with the original UFS in general since 1994 when I first started off with NetBSD.
    But with ZFS, maybe we have finally found a FS worth replacing it with. I sure look forward to trying Solaris 10, though I'm sure I will find that SunOS has a better feel to it, like always.

    Maybe DragonFly BSD will be the one to do this; FreeBSD is generally more resistant to radical changes, and for good reason: you don't get that stability by accident.
  • You just got to love the headline... by Chainsaw (Score:2) Thursday September 16 2004, @11:15AM
  • If it's the last word (Score:3, Funny)

    by kick_in_the_eye (539123) on Thursday September 16 2004, @11:15AM (#10267248)
    (http://azp80.no-ip.info/)
    If it's the last word, why are we even talking about it?
  • by Ewan (5533) <ewan@nOspam.bcs.org> on Thursday September 16 2004, @11:16AM (#10267252)
    (http://ewan.to/ | Last Journal: Monday September 04 2006, @04:49PM)
    Reading the article, all I see is Sun saying how bad their old stuff was, e.g.:

    Consider this case: To create a pool, to create three file systems, and then to grow the pool--5 logical steps--5 simple ZFS commands are required, as opposed to 28 steps with a traditional file system and volume manager.

    and
    Moreover, these commands are all constant-time and complete in just a few seconds. Traditional file systems and volumes often take hours to configure. In the case above, ZFS reduces the time required to complete the tasks from 40 minutes to under 10 seconds.


    Compared to AIX or HP-UX, 28 steps is shockingly bad; both have had much simpler logical volume management for several versions now (AIX for 5 years or more? Certainly as long as I have used it). The existing Solaris 9 logical volume infrastructure is years behind the competition; this is bringing it up to date, but not putting it far ahead.

    Ewan
  • What sort of crap is this? (Score:4, Interesting)

    by LowneWulf (210110) on Thursday September 16 2004, @11:16AM (#10267260)
    COME ON! It may be a slow day, but how is this news? There's only one link, and it's to Sun's marketing info.

    Can someone please provide a link to some technical details other than it being 128-bit? What does this file system actually do that is even remotely special? What's under the covers? And, more importantly, does it actually work as described?

    -1, Uninformative
  • That's a lot of storage (Score:5, Funny)

    by Gentoo Fan (643403) on Thursday September 16 2004, @11:17AM (#10267268)
    (http://www.gentoo.org/)
    But of course you'll still have to have your boot image within the first 1024 cylinders.
  • Oh wow! (Score:3, Funny)

    Does this mean the absolutely awful Disksuite/Solaris Volume Manager is finally, mercifully, dead, too?

    I'll do a dance of utter joy if so. Disksuite is 10 pounds of shit in a 5 pound bag.

    - A.P.
    • Re:Oh wow! by elmegil (Score:2) Thursday September 16 2004, @11:24AM
      • Re:Oh wow! by Wakko Warner (Score:3) Thursday September 16 2004, @12:17PM
        • Re:Oh wow! by elmegil (Score:3) Thursday September 16 2004, @01:50PM
          • Re:Oh wow! by elmegil (Score:1) Thursday September 16 2004, @09:36PM
  • Another quote to cherish (Score:4, Insightful)

    by AsciiNaut (630729) on Thursday September 16 2004, @11:18AM (#10267287)
    I broke the habit of a lunchtime and RTFA. According to Jeff Bonwick, the chief architect of ZFS, "populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."

    Who else instantly thought of, "640 K ought to be enough for anybody", uttered by the chief architect of twenty years of chaos?

    • Get your quotes right. by argent (Score:2) Thursday September 16 2004, @11:44AM
    • Re:Another quote to cherish by dragonp12 (Score:1) Thursday September 16 2004, @12:01PM
    • Re:Another quote to cherish by Shamashmuddamiq (Score:2) Thursday September 16 2004, @12:05PM
    • Re:Another quote to cherish (Score:5, Informative)

      by mdmarkus (522132) on Thursday September 16 2004, @12:23PM (#10268130)
      From Bruce Schneier in Applied Cryptography:

      Thermodynamic Limitations

      One of the consequences of the second law of thermodynamics is that a certain amount of energy is necessary to represent information. To record a single bit by changing the state of a system requires an amount of energy no less than kT, where T is the absolute temperature of the system and k is the Boltzmann constant. (Stick with me; the physics lesson is almost over.)

      Given that k = 1.38x10^-16 erg/Kelvin, and that the ambient temperature of the universe is 3.2K, an ideal computer running at 3.2K would consume 4.4x10^-16 ergs every time it set or cleared a bit. To run a computer any colder than the cosmic background radiation would require extra energy to run a heat pump.

      Now, the annual energy output of our sun is about 1.21x10^41 ergs. This is enough to power about 2.7x10^56 single bit changes on our ideal computer; enough changes to put a 187-bit counter through all of its values. If we built a Dyson sphere around the sun and captured all of its energy for 32 years, without any loss, we could power a computer to count up to 2^192. Of course it wouldn't have the energy left over to perform any useful calculations with this counter.

      But that's just one star, and a measly one at that. A typical supernova releases something like 10^51 ergs. (About a hundred times as much energy would be released in the form of neutrinos, but let them go for now.) If all of the energy could be channeled into a single orgy of computation, a 219-bit counter could be cycled through all of its states.

      These numbers have nothing to do with the technology of the devices; they are the maximums that thermodynamics will allow. And they strongly imply that brute-force attacks against 256-bit keys will be infeasible until computers are built from something other than matter and occupy something other than space.
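      Schneier's numbers are easy to reproduce. The short sketch below uses the constants he quotes and follows his simplification that each counter step costs one bit change:

```python
import math

K_BOLTZMANN = 1.38e-16                        # erg/K, as quoted
T_BACKGROUND = 3.2                            # K, as quoted
ENERGY_PER_BIT = K_BOLTZMANN * T_BACKGROUND   # about 4.4e-16 erg per bit change

def counter_bits(energy_ergs: float) -> float:
    """Width of the largest counter the given energy could step through all its states."""
    return math.log2(energy_ergs / ENERGY_PER_BIT)

print(f"{counter_bits(1.21e41):.1f}")       # one year of the sun's output
print(f"{counter_bits(32 * 1.21e41):.1f}")  # 32 years inside a Dyson sphere
print(f"{counter_bits(1e51):.1f}")          # one supernova
```

      The first two results come out at about 187 and 192 bits, as quoted. The supernova case lands near 220 bits, slightly above the 219 in the text, so that figure is, if anything, conservative.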
    • Re:Another quote to cherish by Dracolytch (Score:3) Thursday September 16 2004, @12:39PM
    • Re:Another quote to cherish by Dracolytch (Score:2) Thursday September 16 2004, @12:43PM
    • Re:Another quote to cherish by zx75 (Score:2) Thursday September 16 2004, @01:23PM
  • by perseguidor (777194) on Thursday September 16 2004, @11:18AM (#10267294)

    With traditional volumes, storage is fragmented and stranded. With ZFS' common storage pool, there are no partitions to manage. The combined I/O bandwidth of all of the devices in a storage pool is always available to each file system.


    So far it does sound just like RAID, but:


    When one copy is damaged, ZFS detects it via the checksum and uses another copy to repair it.

    No competing product can do this. Traditional mirrors can only handle total failure of a device. They don't have checksums, so they have no idea when a device returns bad data. So even though mirrors replicate data, they have no way to take advantage of it.


    I guess I just don't get it; I know they are talking about logical corruption and not a physical failure, but this is kind of like RAID with something like SMART, or isn't it?

    And what kinds of corruption can there be? Journaling filesystems already work well for write errors and such, or so I thought.

    I know the architecture seems innovative and different (at least for me), but is there really new functionality?

    Sorry if I seem ignorant this time. I don't know if I was able to get my point across; the things this filesystem does, wouldn't they be better left on a different layer?
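    The mechanism in the quoted Sun text can be sketched as follows. SHA-256 stands in for whatever checksum ZFS actually uses. The expected checksum is assumed to be stored somewhere other than the data block, so a drive that silently returns bad data cannot also corrupt the value it is checked against. Without that independent checksum, a mirror holding two differing copies cannot tell which one is right. With it, the mirror can pick the good copy and repair the bad one.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def read_mirrored(copies: list[bytearray], expected: bytes) -> bytes:
    """Return a copy matching the expected checksum and rewrite any copy that doesn't."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good                  # self-heal: overwrite the damaged copy
    return good

block = b"payroll records"
expected = checksum(block)               # kept apart from the data itself
mirror = [bytearray(block), bytearray(block)]
mirror[0][0] ^= 0xFF                     # silent corruption: no I/O error is reported
print(read_mirrored(mirror, expected))   # b'payroll records'
```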
  • And this new billion billion ZFS will cost... by neuro.slug (Score:1) Thursday September 16 2004, @11:19AM
  • fileless systems by Doc Ruby (Score:2) Thursday September 16 2004, @11:19AM
    • Re:fileless systems by gorbachev (Score:3) Thursday September 16 2004, @11:26AM
    • Re:fileless systems by Helen O'Boyle (Score:1) Thursday September 16 2004, @11:55AM
      • Re:fileless systems (Score:5, Interesting)

        by Tony (765) on Thursday September 16 2004, @12:49PM (#10268498)
        (http://zoeshire.com/ | Last Journal: Thursday October 31 2002, @05:12PM)
        After years of everyone saying that the relational model was the answer to all data organization needs... the hierarchical model reappeared in the form of XML, and people realized that it is convenient to organize some types of data hierarchically.

        Convenient, and flawed.

        XML isn't designed to handle changing data. It's designed to be a data markup language, which indicates it's used for presenting data, not managing data.

        So far, the relational model is the best mathematically-rigorous method of managing sets of data. There are many advantages to hierarchical data representation, but for manipulation, the relational still trumps.

        Do I want to use SQL to access my files? Not if I don't have to. There are perhaps better methods, even some transparent methods.

        But, do I want to continue to self-organize my data? Hell, no! There's just too much information stored on my computer, and on my network, these days. And, considering that much of my data has multiple relationships, the hierarchical model is growing a bit long in the tooth. Many of my documents belong in multiple hierarchies.

        But, there might be a real solution soon:

        Gnome Storage [gnome.org] looks to be a good first step.
      • Re:fileless systems by Doc Ruby (Score:2) Thursday September 16 2004, @01:33PM
  • by kcbrown (7426) <slashdot@sysexperts.com> on Thursday September 16 2004, @11:19AM (#10267312)
    ...and that I haven't seen in any file system announced to date, is a way of bundling multiple filesystem operations into a single atomic transaction that can be rolled back. This would clearly require the addition of four system calls (one to begin a transaction, one to commit it, one to roll it back, and one to set the default action, commit or rollback, on exit).

    Such a feature would rock, because it would be possible to make things like installers completely atomic: interrupt the installer process and the whole thing rolls back.
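POSIX has no such system calls, so here is a user-space Python sketch of the idea under heavy assumptions: the `FsTransaction` class is hypothetical, it only covers whole-file writes made through its own `write_file()` method, and "rollback" means restoring backup copies rather than anything atomic at the kernel level. The `commit_on_exit` flag plays the role of the proposed default action on exit.

```python
import os
import shutil
import tempfile


class FsTransaction:
    """Hypothetical user-space emulation of begin/commit/rollback for file writes.

    Not a real kernel API: only files written through write_file() are tracked.
    """

    def __init__(self, commit_on_exit: bool = False):
        self.commit_on_exit = commit_on_exit          # default action on exit
        self._backup_dir = tempfile.mkdtemp(prefix="fstx-")
        self._undo = []                               # (path, backup path or None)

    def write_file(self, path: str, data: bytes) -> None:
        if os.path.exists(path):
            # Keep a copy of the previous contents so they can be restored.
            backup = os.path.join(self._backup_dir, str(len(self._undo)))
            shutil.copy2(path, backup)
            self._undo.append((path, backup))
        else:
            self._undo.append((path, None))           # file is new: undo = delete
        with open(path, "wb") as f:
            f.write(data)

    def commit(self) -> None:
        self._undo.clear()
        shutil.rmtree(self._backup_dir, ignore_errors=True)

    def rollback(self) -> None:
        # Undo newest-first so the oldest saved state of each file wins.
        for path, backup in reversed(self._undo):
            if backup is None:
                if os.path.exists(path):
                    os.remove(path)
            else:
                shutil.copy2(backup, path)
        self.commit()                                 # discard the undo log

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None and self.commit_on_exit:
            self.commit()
        else:
            self.rollback()
        return False                                  # re-raise any exception
```

An exception inside the `with` block (an interrupted installer, say) rolls every tracked write back. A real kernel-level version would also have to survive a power failure mid-transaction, which this sketch does not attempt.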

  • Apparently... (Score:5, Funny)

    by qtone42 (741822) on Thursday September 16 2004, @11:20AM (#10267320)
    ... ZFS will also make you forget everything you knew about English grammar.

    "We've rethought everything and rearchitected it," says Jeff Bonwick

    Rearchitected? WTF? Howsaboot "Redesigned?"

    I'm still wrapping my brain around "adaptive endian-ness" as well.

    --QTone
  • There be Marketting here! by grasshoppa (Score:1) Thursday September 16 2004, @11:22AM
  • Sounds really nice (Score:5, Informative)

    by mveloso (325617) on Thursday September 16 2004, @11:25AM (#10267381)
    Looks like Sun went out and redid their filesystem based on the performance characteristics of machines today, instead of machines of yesteryear.

    Some highlights, for those that don't (or won't) RTA:

    * Data integrity. Apparently it uses file checksums to error-correct files, so files will never be corrupted. About time someone did this.

    * Snapshots, like netapp?

    * Transactional nature/copy-on-write

    * Auto-striping

    * Really, Really Large volume support

    All of this leads to speed and reliability. There's a lot of other stuff (varying block sizes, write queueing, stride stuff which I haven't heard about in years), but all of it serves the goals above.

    Oh, and they simplified their admin too.

    It's hard to make a filesystem look exciting. Most of the time it just works, until it fails. The data checksum stuff looks interesting, in that they built error correction into the FS (like CDs and RAID but better hopefully).

    It might also do away with the idea of "space free on a volume," since the marketing implies that each FS grows/shrinks dynamically, pulling storage out of the pool as needed.

    Any users want to chime in?
  • Patent-pending adaptive endianness? (Score:4, Insightful)

    by yeremein (678037) on Thursday September 16 2004, @11:25AM (#10267387)
    ZFS is supported on both SPARC and x86 platforms. More important, ZFS is endian-neutral. You can easily move disks from a SPARC server to an x86 server. Neither architecture pays a byte-swapping tax due to Sun's patent-pending "adaptive endian-ness" technology, which is unique to ZFS.
    Bleh. How expensive is it to byte-swap anyway? Compared with checking whether the number you're looking at is already the right endianness? Just store everything big-endian; x86 systems can swap it in a single instruction anyway. It's not like all data needs to be byte-swapped anyway, just metadata. I can't imagine the penalty would come even close to the amount of time spent doing their integrity checksums anyway.

    Looks to me like nothing more than an excuse to put up a patent tollbooth for anyone who wants to implement ZFS.
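To make the trade-off being argued about concrete, here is a generic Python sketch of a "write in native order, swap only on mismatch" scheme: the writer stores a magic number in its own byte order, and the reader tries its native order first and falls back to the swapped one. This is not Sun's patent-pending technique, whose details aren't given here; `MAGIC`, `write_header`, and `read_header` are hypothetical.

```python
import struct
import sys

MAGIC = 0x1A2B3C4D  # hypothetical magic number; reads differently when byte-swapped


def write_header(fields, byteorder=sys.byteorder) -> bytes:
    """Pack a magic number and uint64 fields in the writer's byte order (no swap)."""
    prefix = "<" if byteorder == "little" else ">"
    return struct.pack(prefix + "I" + "Q" * len(fields), MAGIC, *fields)


def read_header(buf: bytes, nfields: int) -> list:
    """Detect the writer's byte order from the magic number and decode accordingly."""
    native = "<" if sys.byteorder == "little" else ">"
    swapped = ">" if native == "<" else "<"
    # Try native order first: if the writer had the same endianness, nothing is swapped.
    for prefix in (native, swapped):
        (magic,) = struct.unpack_from(prefix + "I", buf)
        if magic == MAGIC:
            return list(struct.unpack_from(prefix + "Q" * nfields, buf, 4))
    raise ValueError("unrecognized magic number")
```

The commenter's alternative would simply hard-code the `">"` (big-endian) prefix on both sides and always swap on little-endian machines.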
  • Curious points (Score:4, Interesting)

    by tod_miller (792541) on Thursday September 16 2004, @11:25AM (#10267392)
    (Last Journal: Wednesday January 26 2005, @05:18AM)
    "Sun's patent-pending "adaptive endian-ness" technology"

    OK, that aside. First 128-bit file system, and get this: a transactional object model.

    I think this means it is optimistic, but they figure it has blazing-fast performance, so who am I to argue? I'm fed up with killing this indexing garbage on the work machine. Bloody Microsoft: I disabled it and everything, and every full moon it seems to come out and graze on my HDD platter.

    From the MS article: This perfect storm is comprised of three forces joining together: hardware advancements, leaps in the amount of digitally born data, and the explosion of schemas and standards in information management.

    Then I started to suspect they would rant about Moore's law, and sure e-bloody-nough:

    Everyone knows Moore's law--the number of transistors on a chip doubles every 18 months. What a lot of people forget is that network bandwidth and storage technologies are growing at an even faster pace than Moore's law would suggest.

    That is like saying, everyone knows the number 9 bus comes at half 3 on Wednesdays, but no one expects 3 taxis sat there doing nothing at half past 3 on a Tuesday.

    Can we put this madness to rest? Ok back to the articles.

    erm... lost track now....
  • huh? by helmespc (Score:1) Thursday September 16 2004, @11:26AM
  • crash every 5 minutes? by Anonymous Coward (Score:1) Thursday September 16 2004, @11:27AM
  • Forgot (Score:3, Funny)

    I was going to respond to the article, but I forgot everything I know about file systems.
    • Re:Forgot by arendjr (Score:1) Thursday September 16 2004, @12:23PM
    • .ac? by boomgopher (Score:1) Thursday September 16 2004, @12:37PM
  • Hmmm....Linux SAN killer? by Chuck Bucket (Score:2) Thursday September 16 2004, @11:35AM
  • Shared data pools... (Score:4, Interesting)

    by vspazv (578657) on Thursday September 16 2004, @11:37AM (#10267539)
    So what are the chances that someone could accidentally wipe the shared data pool for an entire company and how hard is recovery on a volume striped across a few hundred hard drives?
  • Hmmm...the last word? by Chuck Bucket (Score:2) Thursday September 16 2004, @11:39AM
  • so how long until... by t35t0r (Score:1) Thursday September 16 2004, @11:39AM
  • The Sun photo says it all.. by chill (Score:2) Thursday September 16 2004, @11:40AM
  • Solaris or Linux? by logicnazi (Score:2) Thursday September 16 2004, @11:40AM
    • ZFS by BJH (Score:2) Thursday September 16 2004, @12:05PM
    • Re:Solaris or Linux? by logicnazi (Score:1) Thursday September 16 2004, @03:07PM
  • Last Word? (Score:5, Funny)

    by tedgyz (515156) * on Thursday September 16 2004, @11:40AM (#10267583)
    (http://roostme.com/)
    It's a 128-bit filesystem, so doesn't that make it the last 8 words?
  • Some snippets from the article (Score:3, Informative)

    by ChrisRijk (1818) on Thursday September 16 2004, @11:41AM (#10267596)
    ZFS achieves its impressive performance through a number of techniques:
    * Dynamic striping across all devices to maximize throughput
    * Copy-on-write design makes most disk writes sequential
    * Multiple block sizes, automatically chosen to match workload
    * Explicit I/O priority with deadline scheduling
    * Globally optimal I/O sorting and aggregation
    * Multiple independent prefetch streams with automatic length and stride detection
    * Unlimited, instantaneous read/write snapshots
    * Parallel, constant-time directory operations
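"Stride detection" is the least self-explanatory item on that list. Here is a toy Python sketch of the general idea, assuming a single stream of block numbers and a fixed prefetch depth (the article doesn't say how ZFS tracks multiple streams or picks the length): once the same nonzero gap between consecutive reads has been seen `confirm` times in a row, the next `depth` blocks along that stride are suggested for prefetch. `StridePrefetcher` is a hypothetical name.

```python
class StridePrefetcher:
    """Toy stride detector for one sequential stream (hypothetical sketch)."""

    def __init__(self, depth: int = 4, confirm: int = 2):
        self.depth = depth        # how many blocks ahead to prefetch
        self.confirm = confirm    # consecutive matching strides needed first
        self.last = None          # previous block number read
        self.stride = None        # current candidate stride
        self.hits = 0             # how many times in a row it has been seen

    def access(self, block: int) -> list:
        """Record a read of `block`; return block numbers worth prefetching."""
        if self.last is not None:
            stride = block - self.last
            if stride != 0 and stride == self.stride:
                self.hits += 1
            else:
                self.stride, self.hits = stride, 1   # start tracking a new pattern
        self.last = block
        if self.stride and self.hits >= self.confirm:
            return [block + self.stride * i for i in range(1, self.depth + 1)]
        return []
```

Reading blocks 0, 10, 20 with `depth=3` yields no prefetch for the first two reads and then `[30, 40, 50]`; a read that breaks the pattern resets the detector.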


    ZFS has some similarities to NetApp's WAFL in that it uses "copy on write".

    One of the fun things with ZFS is that it automatically stripes across all the storage in your pool. Disk size doesn't matter - it's all used. This even works across SCSI and IDE.
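A toy Python sketch of striping across devices of unequal size, assuming a simple "most free space wins" policy (the thread doesn't describe ZFS's actual allocation policy): each new block goes to whichever device currently has the most free blocks, so a large disk and a small one both end up fully used, instead of classic fixed-width striping where every member is effectively cut down to the size of the smallest. The device names and `stripe_allocate` are hypothetical.

```python
def stripe_allocate(free: dict, nblocks: int) -> list:
    """Place nblocks across devices, always choosing the one with the most free space.

    free -- device name -> free blocks (hypothetical units); updated in place
    Returns the device chosen for each block, in allocation order.
    """
    placement = []
    for _ in range(nblocks):
        dev = max(free, key=lambda d: free[d])
        if free[dev] == 0:
            raise RuntimeError("pool is full")
        free[dev] -= 1
        placement.append(dev)
    return placement
```

With `{"scsi0": 4, "ide0": 2}`, six allocations use every block on both devices, and writes interleave across them once their free space evens out.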

    One of the important things is that volume management isn't a separate feature. Effectively, all the current limitations of volume managers are blown away:

    Just as it dramatically eases the suffering of system administrators, ZFS offers relief for your company's bottom line. Because ZFS is built on top of virtual storage pools (unlike traditional file systems that require a separate volume manager), creating and deleting file systems is much less complex. Not only does this eliminate the need to pay for volume manager licenses and allow for single support contracts, it lowers administration costs and increases storage utilization.

    ZFS appears to applications as a standard POSIX file system--no porting is required. But to administrators, it presents a pooled storage model that eliminates the antique concept of volumes, as well as all of the related partition management, provisioning, and file system sizing problems. Thousands--even millions--of file systems can all draw from ZFS' common storage pool, each one consuming only as much space as it needs. The combined I/O bandwidth of all of the devices in that storage pool is always available to each file system.
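A toy Python model of the pooled-storage idea in that quote: creating a filesystem reserves nothing, every filesystem draws from one shared free-space figure, and space released by one becomes available to all of them. `StoragePool` and its methods are hypothetical, and sizes are plain integers.

```python
class StoragePool:
    """Toy pooled-storage model: filesystems draw space from one pool on demand."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = {}  # filesystem name -> space currently used

    def create_fs(self, name: str) -> None:
        # Nothing is reserved up front: there is no partition or volume size to pick.
        self.used.setdefault(name, 0)

    def free_space(self) -> int:
        # Free space belongs to the pool, not to any one filesystem.
        return self.capacity - sum(self.used.values())

    def write(self, name: str, amount: int) -> None:
        if amount > self.free_space():
            raise OSError("pool is out of space")
        self.used[name] += amount

    def release(self, name: str, amount: int) -> None:
        # Freed space goes straight back to the pool for every filesystem.
        self.used[name] -= min(amount, self.used[name])
```

In this model there is no per-filesystem "space free on a volume" to manage; the only limit is the pool's total capacity.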


    This is also part of the stuff making admin and configuration far far simpler. The thing I like is that it should be far harder to go wrong with ZFS (not available in Solaris Express yet so I haven't seen this for myself).

    The very high degree of reliability as standard is very welcome too:

    Data can be corrupted in a number of ways, such as a system error or an unexpected power outage, but ZFS removes this fear of the unknown. ZFS prevents data corruption by keeping data self-consistent at all times. All operations are transactional. This not only maintains consistency but also removes almost all of the constraints on I/O order and allows changes to succeed or fail as a whole.

    All operations are also copy-on-write. Live data is never overwritten. ZFS writes data to a new block before changing the data pointers and committing the write. Copy-on-write provides several benefits:

    * Always-valid on-disk state
    * Consistent, reliable backups
    * Data rollback to known point in time
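A toy Python sketch of the copy-on-write sequence described in that quote (write the new block first, then change the pointer), assuming a one-level "tree" where a single root block maps names to data blocks; a real filesystem would copy a whole path of indirect blocks up to the top. It also shows why snapshots can be instantaneous: a snapshot just keeps a reference to an old root. `CowStore` is hypothetical, and it never frees old blocks.

```python
import itertools
from typing import Optional


class CowStore:
    """Toy copy-on-write store: a live block is never modified in place."""

    def __init__(self):
        self.blocks = {}                     # address -> contents; never overwritten
        self._addrs = itertools.count()
        self.root = self._alloc({})          # root block: file name -> data address
        self.snapshots = {}                  # snapshot label -> saved root address

    def _alloc(self, contents) -> int:
        addr = next(self._addrs)
        self.blocks[addr] = contents
        return addr

    def write(self, name: str, data: bytes) -> None:
        data_addr = self._alloc(data)               # 1. new data goes to a fresh block
        table = dict(self.blocks[self.root])        # 2. copy the parent, don't edit it
        table[name] = data_addr
        new_root = self._alloc(table)
        # 3. Commit by switching one pointer. A crash before this line leaves
        #    the old root, and everything it points to, intact.
        self.root = new_root

    def read(self, name: str, snapshot: Optional[str] = None) -> bytes:
        root = self.root if snapshot is None else self.snapshots[snapshot]
        return self.blocks[self.blocks[root][name]]

    def snapshot(self, label: str) -> None:
        self.snapshots[label] = self.root           # constant time: keep the old root

    def rollback(self, label: str) -> None:
        self.root = self.snapshots[label]
```

After `write("a", b"v1")`, `snapshot("before")`, and `write("a", b"v2")`, the live tree reads `b"v2"` while the snapshot still reads `b"v1"`, and `rollback("before")` restores it with one pointer change.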

    "We validate the entire I/O stack, start to finish, no guesswork involved. It's all provable data integrity," says Bonwick.

    Administrators will never again have to run laborious recovery procedures, such as fsck, even if the system is shut down in an unclean fashion. In fact, Solaris Kernel engineers Bill Moore and Matt Ahrens have subjected ZFS to more than a million forced, violent crashes in the course of their testing. Not once has ZFS lost data integrity or leaked a single block.


    For more technical info see Matt Ahrens's [sun.com] and Val Henson's [sun.com] blogs - since they're among the engineers who worked on it.
    • Hmmm.. by pVoid (Score:2) Thursday September 16 2004, @03:52PM
      • Re:Hmmm.. by Chainsaw (Score:2) Friday September 17 2004, @08:18AM
  • Solaris still works by SpaghettiPattern (Score:1) Thursday September 16 2004, @11:45AM
  • Actually, Novell already made ZFS... (Score:3, Informative)

    by thehunger (549253) on Thursday September 16 2004, @11:45AM (#10267661)
    The codename for the first generation of Novell's current filesystem was ZFS. Why? Because it was supposed to be "the last, or final word" in file systems.

    Novell now calls it Novell Storage System (I think it used to be NetWare Storage System).

    Apart from the obvious fact that SUN didn't manage to be very original in naming their filesystem, it's noteworthy that Novell is porting their ZFS, now NSS, to Linux. It'll be part of Novell Open Enterprise Server, on both Linux and NetWare kernels.

    Off the top of my head, here are some features of NSS that SUN needs to exceed to qualify for a new "final word..":

    - Background compression
    - Fast on-demand decompression
    - Transactions
    - Pluggable Name spaces
    - Pluggable protocols (ie. http, nfs, etc)
    - Advanced Access control model with inheritance, rights filters, etc. integrated with directory service (duh!)
    - Quotas on user, group, directory level
    - 64-bit (ok, SUN obviously got that one)
    - Mini-volumes
    - Journaled
    - Etc.

    Oh well, I won't bother continuing, but it's worth looking out for NSS. Hopefully Novell will open source it and not make it exclusive to their distros.
  • by anzha (138288) on Thursday September 16 2004, @11:50AM (#10267710)
    (http://thedragonstales.blogspot.com/ | Last Journal: Friday October 05, @02:38PM)

    Right now there are a lot of file systems that do something not all that different from what Sun is proposing. The project [nersc.gov] I am on is evaluating them as we speak for a center-wide filesystem. I've had the fun (no sarcasm, honestly) of setting up a number of different ones and helping to run benchmarks and tests against each. All of them have strengths. Every single one of them has some nasty weaknesses.

    If you are looking for an open source cluster file system, Lustre [lustre.org] is what you want. It's supported by LLNL, PNNL, and the main writers at ClusterFS Inc [clusterfs.com]. It's a network-based cluster FS; we've been using it over GigE. However, we've found that there needs to be a 3:1 ratio of data servers to clients. We have only used one metadata server. Failover isn't the greatest. Quotas don't exist. It also makes kernel modifications (some good, some bad), effectively a mild fork of the Linux kernel (they put them into the newer kernels every so often). It only runs on Linux. Getting it to run on anything else looks... scary.

    GPFS [ibm.com] runs on AIX and Linux, even sharing the same storage. It runs and is pretty stable. It has the option to run in a SAN mode or as a network-based FS. In the latter form, it even does local discovery of disks via labels, so that if a client can see the disks locally it will read and write to them via FC rather than through the server. It is, however, a balkanized mess. It requires a lot more work to bring up and run: there is an awful lot of software to configure to get it to run (re: RSCT. If you haven't had the joys of HATS and HAGS, count yourself very, very lucky).

    ADIC's StorNext [adic.com] software is another option. This one is good if you are interested in ease of installation, maintenance, and very, very fast speeds (damn near line speed on Fibre Channel). I have set this one up for sharing disks in less than two hours from first install to getting numerous assorted nodes of different OSes to play together (Solaris, AIX, Linux). It freakin' runs on virtually everything from Crays to Linux to Windows. Its issues seem to be scaling (right now it doesn't go past 256 clients) and some nontrivial locking issues (writing to the same block from multiple clients, and parallel I/O to the same file from multiple clients if you change the file size).

    There are some others that are not as mature. Among them are Ibrix [ibrix.com], Panasas [panasas.com], GFS [redhat.com], and IBM's SANFS [ibm.com]. All of them are interesting or promising. Only SANFS looks like it runs on more than Linux at this point, though. Our requirements for the project I am on are to share the same FS and storage instance among disparate client OSes simultaneously. This might not be the same for others, and these might be worth a look. Lustre dodges this because it's open source and they're interested in porting.

  • The proof is in the pudding (Score:4, Informative)

    by melted (227442) on Thursday September 16 2004, @11:55AM (#10267777)
    (http://slashdot.org/)
    As someone who's been involved with performance/stress optimizations I can tell you that for each situation you can carefully put together two types of tests: one which proves that there's a problem, another that proves the problem doesn't exist.

    The proof is in the pudding. Let Sun release it and administrators use it for a year or two, then we'll see if it's good enough. Right now I'm having doubts it's as good as they want you to believe.
  • Boil the oceans, eh? (Score:3, Funny)

    by Hukui (809694) on Thursday September 16 2004, @12:06PM (#10267904)
    Logically, the next question is if ZFS' 128 bits is enough. According to Bonwick, it has to be. "Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."

    Well... I never really liked the oceans anyways. They were always so wet.
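The raw numbers behind the 128-bit claims are easy to check in Python. This covers only the address-space arithmetic, not Bonwick's physics argument. One plausible reading of Sun's "16 billion billion times" is 2^64 = 16 × 2^60, counting a "billion" as 2^30.

```python
# Address-space arithmetic only; nothing here models physical storage limits.
cap_64 = 2 ** 64     # distinct addresses with 64 bits
cap_128 = 2 ** 128   # distinct addresses with 128 bits

ratio = cap_128 // cap_64
print(ratio)                    # 18446744073709551616 (that is, 2**64)
print(ratio == 16 * 2 ** 60)    # True: 16 x (2**30)**2
print(f"{cap_128:.3e}")         # 3.403e+38
```

In decimal terms the ratio is about 1.8 × 10^19, so "16 billion billion" only works out exactly with binary billions.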

  • ZFS == Zun File System by chiph (Score:1) Thursday September 16 2004, @12:07PM
  • Zombo File System? by SunBug (Score:1) Thursday September 16 2004, @12:10PM
  • Links to real information by bartash (Score:2) Thursday September 16 2004, @12:15PM
  • Tru64 AdvFs by bubba_ry (Score:1) Thursday September 16 2004, @12:15PM
  • It must be! by ZipR (Score:1) Thursday September 16 2004, @12:19PM
  • It starts with Z by bcarl314 (Score:1) Thursday September 16 2004, @12:20PM
  • Well, Linux Won't Have It Any Time Soon by Master of Transhuman (Score:2) Thursday September 16 2004, @12:21PM
  • White Papers by dTb (Score:2) Thursday September 16 2004, @12:22PM
  • Hard drive as WMD? by Larthallor (Score:2) Thursday September 16 2004, @12:24PM
  • Next FS will be... by faccenda (Score:1) Thursday September 16 2004, @12:29PM
  • Way too big by null etc. (Score:1) Thursday September 16 2004, @12:33PM
  • LOC by pulse2600 (Score:1) Thursday September 16 2004, @12:33PM
  • Easy upgrades by dTb (Score:2) Thursday September 16 2004, @12:33PM
  • It is quite obvious why the space is needed by brunes69 (Score:2) Thursday September 16 2004, @12:35PM
  • 80 percent? by radarsat1 (Score:1) Thursday September 16 2004, @12:39PM
  • you had me until you boiled the oceans by blair1q (Score:2) Thursday September 16 2004, @12:54PM
  • Why only 128 bits? by Mr. McGibby (Score:2) Thursday September 16 2004, @12:58PM
  • UFS is a dog, I hope this is better but... by dbuttric (Score:1) Thursday September 16 2004, @01:20PM
  • Help! Where am I? Who am I? by Orthogonal Jones (Score:1) Thursday September 16 2004, @01:27PM
  • It's still a limitation by erroneus (Score:2) Thursday September 16 2004, @01:39PM
  • This is awesome news for the 5 people still using by MichaelPenne (Score:1) Thursday September 16 2004, @02:19PM
  • Change management features by JPyObjC Dude (Score:1) Thursday September 16 2004, @02:21PM
  • YADB by leandrod (Score:2) Thursday September 16 2004, @02:32PM
  • Another file system? by fred3666 (Score:1) Thursday September 16 2004, @02:42PM
  • Holy crap by Tim Browse (Score:2) Thursday September 16 2004, @05:03PM
  • Ahem... by Jozer99 (Score:1) Thursday September 16 2004, @06:10PM
  • More technical information on ZFS by ahrens (Score:2) Friday September 17 2004, @12:52AM
  • ZFS? by goober1473 (Score:1) Friday September 17 2004, @02:54AM
  • Re:not alphabetically (Score:5, Funny)

    by laird (2705) <lairdp@ELIOTgmail.com minus poet> on Thursday September 16 2004, @11:13AM (#10267210)
    (Last Journal: Monday April 07 2003, @07:39AM)
    Nah, the ultimate filesystem has to be xyzzyfs! Your data magically appears... :-)
  • Re:rearchitected (Score:4, Funny)

    by sheriff_p (138609) on Thursday September 16 2004, @11:15AM (#10267242)
    Sadly Google returns no hits for rearchistrated
  • Re:The thing about that.. by robslimo (Score:2) Thursday September 16 2004, @11:19AM
  • Re:At the risk of appearing to be an idiot... by ewanrg (Score:2) Thursday September 16 2004, @11:44AM
  • Re:The thing about that.. by WindBourne (Score:2) Thursday September 16 2004, @12:04PM
  • Re:The thing about that.. by Xardion (Score:1) Thursday September 16 2004, @12:22PM
  • It's solving additional problems by ColourlessGreenIdeas (Score:2) Thursday September 16 2004, @12:24PM
  • Re:yes but... by mek2600 (Score:1) Thursday September 16 2004, @01:19PM