Open Source Moving in on the Data Storage World

Posted by ScuttleMonkey on Wed Apr 26, 2006 04:52 PM
from the sowing-data-seeds dept.
pararox writes "The data storage and backup world is one of stagnant technologies and cronyism. A neat little open source project, called Cleversafe, is trying to dispel that notion. Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines."
  • by Ohreally_factor (593551) on Wednesday April 26 2006, @04:54PM (#15208120)
    (Last Journal: Sunday November 27 2005, @02:29PM)
    The data storage and backup world is one of stagnant technologies and cronyism.
  • Editors, please note! (Score:5, Informative)

    by Anonymous Coward on Wednesday April 26 2006, @04:57PM (#15208129)
    Editors please note!

    Editors, please note that there is some incorrect information in this post. Firstly, the original concept of the IDA was designed by Shamir of RSA fame, not Rabin.

    Also note that the Cleversafe IDA is a custom algorithm, and is only similar to Shamir's initial concept.
  • Backup for Backuper? (Score:3, Interesting)

    by foundme (897346) on Wednesday April 26 2006, @04:57PM (#15208130)
    (http://xccr.com/)
    I can't find this in the FAQ -- is there a "creator/seeder" in the whole process? Which means a particular group of slices can only be unlocked by a particular seeder created by Turbo IDA.

    If there is a creator/seeder, then we are still burdened by having to keep this seeder safe so that we can retrieve the distributed slices.

    If there is no creator/seeder, is this safe enough so that people cannot patch slices together by way of trial-and-error?
  • by gasmonso (929871) on Wednesday April 26 2006, @05:00PM (#15208157)
    (http://religiousfreaks.com/)

    At work we're looking into this to store critical data on our intranet, which spans several states and facilities. Looks great, but only time will tell.

    I seem to remember a project months ago that was going to use P2P to back up your data on other P2P users' computers, which to me sounds quite insane. Anyone know if this is related?

    http://religiousfreaks.com/ [religiousfreaks.com]
  • The 'R' stands for Rivest, not Rabin (Score:5, Informative)

    by Durindana (442090) on Wednesday April 26 2006, @05:01PM (#15208162)


    While Michael Rabin was inventor of the Rabin cryptosystem [wikipedia.org] in 1979, it was Ronald Rivest, Adi Shamir and Len Adleman behind RSA [wikipedia.org] two years earlier.
  • Think RAID5, only way better (Score:5, Interesting)

    Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data.

    It seems like this can be tuned to provide varying levels of fault tolerance. According to the abstract (I don't have an ACM web account, and I couldn't find the full text), it seems like I can take a file and make it so that any four chunks can be used to rebuild the file. I can then take those chunks and distribute them eight times to different machines. Thus, five of the eight machines would have to be rendered inoperable before I were unable to retrieve my data.

    If I understand it correctly, then this is really slick.
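The arithmetic in the comment above can be checked with a small sketch. It assumes one chunk per machine and that any `needed` surviving chunks suffice, as the abstract is described; the function names are made up for illustration.

```python
from itertools import combinations

def recoverable(total_chunks, needed, failed):
    """True if enough chunks survive after the given machines fail."""
    return total_chunks - len(failed) >= needed

def min_failures_to_lose_data(total_chunks, needed):
    """Smallest number of failed machines that makes recovery impossible."""
    machines = range(total_chunks)
    for count in range(total_chunks + 1):
        # Try every possible set of `count` failed machines.
        if any(not recoverable(total_chunks, needed, failed)
               for failed in combinations(machines, count)):
            return count
    return None  # only reached when needed == 0

print(min_failures_to_lose_data(8, 4))  # prints 5
```

With eight machines and a four-chunk threshold, any four failures are survivable and the fifth is the one that loses the file, which matches the reasoning above.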

  • stagnant?? (Score:4, Insightful)

    by Phredward (254393) on Wednesday April 26 2006, @05:09PM (#15208218)
    Companies are crying out for new storage solutions all the time. If the answer is slow in coming, it is not due to "cronyism" and "stagnation". Rather, the causes include the facts that distributed storage is hard, and people don't like losing their data.
  • oh yea (Score:1)

    by dingDaShan (818817) on Wednesday April 26 2006, @05:14PM (#15208246)
    Since all we need is a majority of files, it's a realtime compression scheme of 51%. ------ That's what I would do. You do whatever you want.
  • Rar + Par + BitTorrent? (Score:5, Interesting)

    by DigitalRaptor (815681) on Wednesday April 26 2006, @05:17PM (#15208272)
    (http://brianallen.isagenix.com/)
    This sounds like Rar, Par, and BitTorrent got merged in some freak transporter accident...

    Par files (for use with QuickPar, etc) are great, saving all sorts of extra posting on binary newsgroups.

  • Not a new idea (Score:5, Informative)

    by D3viL (814681) on Wednesday April 26 2006, @05:18PM (#15208278)
    So it's sort of like parchive http://parchive.sourceforge.net/ [sourceforge.net], which is software that splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data.
  • Sourceforge page (Score:1, Informative)

    by Anonymous Coward on Wednesday April 26 2006, @05:19PM (#15208286)
    Well, their webserver seems like it's been smoked, here's a link to their sourceforge page, where you can grab the actual software:

    http://it.slashdot.org/it/06/04/26/2039224.shtml [slashdot.org]
  • You mean Shamir, not Rabin (Score:5, Interesting)

    by Anonymous Coward on Wednesday April 26 2006, @05:20PM (#15208296)
    While R in RSA stands for Ron Rivest, it is Adi Shamir (S of RSA) you have in mind. He came up with a wonderful secret sharing scheme which allows a bunch of folks or computers to keep pieces of secret in such a way that no N of them have any idea what the secret is, even if they collude. OTOH N+1 of them can easily figure out the secret. RSA can help you keep important secrets safe this way: if the owner is OK, the secret cannot be recreated; if the owner quits or dies, all-important secret holders can recover his password and unencrypt critical company data. And if a couple of them cannot participate, you still can get your secret back.

    Even more amazingly, Shamir's secret sharing scheme allows computing math functions, such as digital signatures, without ever recovering secret keys. This is called threshold cryptography; some of you may be interested to learn about its many wonders. Shamir rocks and so does threshold crypto!
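For readers who want to see the threshold property described above, here is a minimal sketch of Shamir-style secret sharing over a prime field. It is an illustration only, not Cleversafe's algorithm. The prime, the function names, and the parameters are assumptions chosen for the example; `threshold` plays the role of N+1 in the comment.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative choice

def make_shares(secret, threshold, num_shares, rng=random.SystemRandom()):
    """Split `secret` into points on a random polynomial of degree threshold-1."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0; needs at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+)
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, threshold=3, num_shares=5)
assert recover_secret(shares[:3]) == 123456789
assert recover_secret(shares[2:]) == 123456789
```

Any three of the five shares rebuild the secret. With only two shares, every candidate secret remains equally consistent with what the holders know, which is the collusion resistance the comment describes.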
  • innovation (Score:2, Interesting)

    by Ajehals (947354) <andyhalsall.ictsc@com> on Wednesday April 26 2006, @05:24PM (#15208320)
    (http://www.ictsc.com/ | Last Journal: Saturday December 09 2006, @10:15PM)
    Any innovation (if that's what this is - no doubt it will turn out to be something that someone else thought of in the 80's..) is welcomed in this area.

    Maybe one day vendors will stop pushing overly expensive and utterly bland storage solutions. For example, the last time I had a meeting about storage, the product was: 2x servers and 2x disk arrays with possible storage of a little under 2TB (using 24 80GB SCSI HDDs) with RAID 5. Oh, and the storage was presented to the OS as 4 x 500GB drives (some proprietary thing). All in at a cool £27,000 (and that was before the license for CIFS). Guess how it was billed: innovative... It's a joke. So the solution? In the meantime, lots of SATA drives and file replication. Eventually? Maybe we can make use of all that storage that sits on every machine on the LAN and is never used...

    • Storage should be Boring! (Score:5, Insightful)

      by stereoroid (234317) on Wednesday April 26 2006, @06:19PM (#15208614)
      (http://stereoroid.com/ | Last Journal: Wednesday August 07 2002, @05:45AM)
      One point that's been brought home to me in a very real way, in my position in senior support for one of the major storage system vendors: the hard disks themselves really do make a difference. SCSI disks are much more expensive because of their construction and the duty cycles they can sustain over long periods. You can NOT hammer a SATA disk 90% of the time, 24/7, and expect it to last the way an enterprise-class SCSI disk does. My company sells low-cost SATA disk systems too, and some customers find that the lower price is a false economy for what they need the system to do.

      I'm kinda missing the point of the "editorializing" in this article: when a storage system is doing its job, it IS boring. You put bytes in, assured they will be stored, and you get them out on demand. You want nothing "interesting" to happen to the data that your business is built on! Sure, the technology is stagnant, if that means customers can get access to the data, reliably, year after year. We Slashdotters are prepared to take "bleeding edge" risks that enterprise customers are not.
  • been done before (Score:5, Informative)

    by Splork (13498) on Wednesday April 26 2006, @05:26PM (#15208331)
    (http://electricrain.com/greg/)

    Related companies/projects happened in this order: MojoNation [archive.org] .. MNet [mnetproject.org] .. HiveCache [archive.org] .. AllMyData [allmydata.com]

    good luck!

  • by dfloyd888 (672421) * on Wednesday April 26 2006, @05:28PM (#15208342)
    In the early 90s, a company made a virtual file server for networked Macs. Each client Macintosh had a file on its hard drive, and when a request was made through the driver, a number of Macs were contacted, and files were read and written to in a fairly load balanced fashion. I'm pretty sure it used some decent (think single DES) encryption at the time too, so someone couldn't just dig through the server's file on their Mac's hard disk and glean important data. It also added some redundancy, so if a Mac or two wasn't up on the network, it wouldn't kill the virtual Appleshare folder.

    By chance, anyone remember this technology? I have no idea what happened to it, but it would be a blockbuster open source app if done today and made platform independent. If done right, one could create data brokerage houses, where people could buy and sell storage space, and also reliability, where space on a RAID or server array would be of higher value than space on a laptop that is rarely on the Internet.
  • by Nesetril (969734) on Wednesday April 26 2006, @05:32PM (#15208363)
    Generally speaking, the more copies of something you have floating around, the larger the probability they get into the wrong hands. So this whole redundancy thing is just going to be viewed as a huge security breach, and never really become popular...
  • Borg Technology (Score:5, Funny)

    When I read the statement: ...the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines.

    I was immediately visualizing a Borg Cube regenerating after a hit from the Enterprise.

    regardless, it sounds cool.

  • Link to pay-for-view contents (Score:4, Insightful)

    by andrew cooke (6522) <andrew@acooke.org> on Wednesday April 26 2006, @05:42PM (#15208416)
    (http://www.acooke.org/andrew)
    The most interesting link here is behind a pay-wall. Do the editors bother to follow the links in articles? Do they just assume we all have ACM access? Come on, this place used to be a bit better than this, didn't it?
  • New idea... NOT. (Score:5, Informative)

    by pedantic bore (740196) on Wednesday April 26 2006, @05:44PM (#15208428)
    Why does this remind me of something [harvard.edu]? It sounds like something I've heard [carleton.ca] about already [cmu.edu], more or less [gatech.edu].

    I just hope they don't patent it [uspto.gov]!

  • by Saturn49 (536831) on Wednesday April 26 2006, @06:01PM (#15208523)
    This can be done quite easily with Reed-Solomon coding. In fact, you don't need the majority of the nodes, but simply an arbitrary N set of nodes, with an arbitrary M nodes as redundancy. N=1 and M=1 is basically RAID1. N = n and M = 1 is simply RAID5, N=n and M=2 is RAID 6.

    In fact, I wrote a RSRaid driver for Linux for my thesis and did some performance testing on it. I'll save you the 30 pages and just tell you that the algorithm is far too CPU intensive to scale up very well for fileserver use (my original intent,) but I did conclude it could be used as a backup alternative to tape. Hmmmm.

    Direct Link [dyndns.org]
    Google Cache [72.14.203.104]
    Please forgive the double brackets, I fought with Word and lost.
    Contact me if you'd like to play with the code. I never did any reconstruction code, but the system did work in a degraded state, and was written for the Linux 2.6 kernel.
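The simplest case mentioned above, N data blocks plus M = 1 redundancy block (the RAID5-like case), can be sketched with plain XOR parity. This is not Reed-Solomon and not the thesis driver: handling M of 2 or more needs Galois-field arithmetic, which is left out here. The function names are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def add_parity(data_blocks):
    """Return the N data blocks followed by one parity block (M = 1)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(blocks, missing_index):
    """Rebuild the single missing block by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_blocks(survivors)

stripe = add_parity([b"AAAA", b"BBBB", b"CCCC"])
assert rebuild(stripe, 1) == b"BBBB"        # lost data block comes back
assert rebuild(stripe, 3) == stripe[3]      # lost parity block comes back
```

Because XOR is its own inverse, any one block (data or parity) is the XOR of all the others, so a single failure is always recoverable and a second one is not.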
  • by nurb432 (527695) on Wednesday April 26 2006, @06:17PM (#15208611)
    (http://slashdot.org/~nurb432/ | Last Journal: Friday August 27 2004, @03:24PM)
    As they appear to be toast now...

    And how can you say backing up to a *single* desktop pc is of any value?
  • Shameless plug... (Score:2)

    by richdun (672214) on Wednesday April 26 2006, @06:21PM (#15208632)
    ...for my alma mater.

    Cleversafe's headquarters are located at the new University Technology Park [university...gypark.com] at IIT...no, not that IIT, this one [iit.edu].
  • Par and Par2? (Score:1)

    by Anonymous Coward on Wednesday April 26 2006, @06:30PM (#15208682)
    Anyone who has used usenet in the last decade or so knows most binaries are split into multiple parts (RAR's now-a-days) with PAR and PAR2 recovery volumes. So instead of making this sound like an awesome new development, why not be honest about what it is: a slightly different application of a very old technology/algorithm.
  • RAID 5 at the File Level (Score:3, Interesting)

    by kbahey (102895) on Wednesday April 26 2006, @06:43PM (#15208745)
    (http://baheyeldin.com/)
    Slashdotted! Can't check the site contents or the wiki.

    From the summary : "the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data."

    So, basically it is like RAID 5 striping and parity [wikipedia.org] applied to the file level.

    Neat concept.
  • Notes from lead Cleversafe designer (Score:5, Informative)

    by mengland (78217) on Wednesday April 26 2006, @07:59PM (#15209129)
    (This is a repost from an earlier part of the thread so that I can get these comments on the toplevel.)

    Hello-

    I am the lead designer of the first Cleversafe dispersed-storage system (aka a grid-storage software system) and am one of the project's co-founders. The Cleversafe system never stores a complete copy of the data in any one place (or "grid node" in our terminology). At most 1/11th of the file data--we call these pieces file "slices"--is stored at any one grid node in a "scrambled" (i.e., non-contiguous), compressed, and encrypted/signed fashion. The grid _never_ stores more than one copy of the data on the grid, and that one copy is never stored all in the same place--it's dispersed using an optimized information-dispersal algorithm that we created but that has similar properties to the previously-published info-dispersal algorithms (IDAs).

    If a grid node and its associated content--i.e., the user's file slices on that node--are ever completely compromised (firewall comes down, all encryption and scrambling is cracked, etc), then the cracker acquires at most 1/11th (one-eleventh) of the user's data.

    Further, if just under half (up to 5 out of any 11) of the grid nodes are for any reason destroyed or otherwise unavailable, all of the user's data is still accessible. This is done by generating a "coded" file slice for every data slice that we store on the node, and regenerating missing file slices from down nodes by pumping the available data and coded slices through our info-dispersal algorithms (which are all open-sourced, by the way) that are executed on the client side or when the grid "self heals" for destroyed nodes.

    The system can also be implemented in a cost-effective fashion. The grid system can sustain so many concurrent, per-node outages that the availability/uptime requirements for each node are minimal. Also, the grid-node servers need not support much processing capability, for the client offloads much of the work from the servers.

    We feel this system provides a powerful combination of reliability, scalability, economy, and security.

    The hardest part of the design, imo, is to be able to reliably track all of these file slices across a large and heterogeneous set of grid-node machines housing these info-dispersed file slices. We designed the grid meta-data system from the ground up to do this and to be capacity-expandable, performance-scalable, and easily serviceable. More details for the open-source flavor of the grid-software design can be found here:
    http://wiki.cleversafe.org/Grid_Design [cleversafe.org]

    There's much more that I can say about this system; I plan to add additional comments to this thread as more questions and comments arise. I'm sure there are new comments I have yet to read, for they're coming in pretty quickly...

    I also encourage further discussion at our newly-created web forums: http://forums.cleversafe.org/ [cleversafe.org]
    Mailing lists (that will be synchronized with the web forums) will also be available at cleversafe.org in the near future.

    -Matt
    Cleversafe project lead
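To put a rough number on the claim that the grid tolerates many concurrent node outages, the sketch below computes the chance that enough nodes are reachable. It assumes an 11-node grid that needs any 6 nodes (the figures given by the designer in this thread) and, as a simplifying assumption not stated in the post, that nodes fail independently with the same probability. The function name and the sample probabilities are hypothetical.

```python
from math import comb

def availability(nodes, threshold, node_up_probability):
    """Probability that at least `threshold` of `nodes` are reachable,
    assuming independent nodes with identical uptime (binomial model)."""
    p = node_up_probability
    return sum(comb(nodes, up) * p**up * (1 - p)**(nodes - up)
               for up in range(threshold, nodes + 1))

for p in (0.5, 0.9, 0.99):
    print(p, availability(11, 6, p))
```

Under this model even modest per-node uptime gives high overall availability, which is the point about minimal per-node uptime requirements. Correlated failures (shared power, network, or site) would make the real figure lower.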
  • by PMoonlite (11151) on Wednesday April 26 2006, @08:21PM (#15209236)
    Then with my 11 GMail accounts I get something like 10GB of free, secure, offsite data backup!
  • 3Par (Score:1)

    by slashpot (11017) on Wednesday April 26 2006, @08:31PM (#15209288)
    3Par InServe is built on Linux.
    We have a couple in a closet at work.

    www.3par.com

  • Correction!!! (Score:1)

    by b4704084 (871482) on Wednesday April 26 2006, @09:25PM (#15209516)
    Facts:
    1) Information dispersal algorithm (IDA) was invented by Professor Michael Rabin at Harvard. IDA is an algorithm for distributed storage.

    2) R in RSA stands for Professor Ronald Rivest at MIT. This article has nothing to do with RSA.
  • by mengland (78217) on Wednesday April 26 2006, @09:37PM (#15209562)
    More notes on our IDAs compared with others:

    The Cleversafe information dispersal algorithms (IDAs) were designed to provide real-time performance with large amounts of data storage and retrieval (gigabytes, petabytes and above). Previous algorithms, like Rabin, Shamir and Reed-Solomon, are very effective at storing smaller amounts of data (kilobytes), but their computational overhead, which is proportional to the square of the data block size or greater, isn't well suited for quickly dispersing/restoring larger amounts of data. The Cleversafe algorithms encode AND decode data with a computational overhead that is linearly proportional to the size of the data blocks. Specifically, the Cleversafe encoding algorithms for an 11-node grid with a threshold level of 6 require 5 operations per byte to encode data. For decoding on this dispersed storage grid, the Cleversafe algorithms require 4 operations per byte to decode data greater than 99% of the time and no more than 13 operations per byte in rare cases.

    Another Cleversafe contributor, Chris Gladwin, developed our IDAs. For more info:

    http://wiki.cleversafe.org/Turbo_IDA_Technology [cleversafe.org]

    One can also read an Excel spreadsheet [cleversafe.org] (found in the above wiki page) and C++ source code [cleversafe.org] that represents the "guts" of our 11-Pillar IDA code module.

    For more info about Cleversafe contributors:

    http://wiki.cleversafe.org/Cleversafe_Contributors [cleversafe.org]

    You can see Chris and me at the bottom of the page, which is ordered with the most-recent contributor listed first.

    -Matt England

    ps: We are finishing up our project announcement at this week's MySQL User's Conference [mysqluc.com] where we drew significant interest. We have engaged some MySQL core developers regarding integrating their technology with ours.
  • by rickb928 (945187) on Wednesday April 26 2006, @10:30PM (#15209799)
    (http://www.cybernexus.net/)
    1. It does seem like RAID5+ at the file level. More redundant, so it seems cooler to me.

    2. Has anyone read an article on Google's file system? This sounds a lot like it. Multiple stripes, recovery with less than N-2 parts, and Google uses it to improve performance first, with copies worldwide more or less. I think the article was in Wired, but I'm too lazy to look it up.

    New? Maybe. Improved? Maybe. Cool? I want.

    rick
  • by l3v1 (787564) on Thursday April 27 2006, @01:54AM (#15210454)
    you backup into small slices, any majority of which can be used

    Ok, I'm numb in the morning, but what the hell does that mean ? ... I won't trust my data to something I don't even understand. You can say to RTFM, but hey, this is the first paragraph about the software, it should be catchy and clear.

  • questions (Score:1)

    I skimmed through the links in the post and did not find answers to following questions:

    What is the storage size to data size ratio? I am talking about practically meaningful numbers ensuring storage reliability comparable to the competitors.

    What is the storage reliability at a storage size to data size ratio comparable to the competitors?

    Theoretical estimates will suffice.

    PS I do not have access to the full text of linked ACM paper
  • by youta (900287) on Thursday April 27 2006, @09:43AM (#15212126)
    I believe Microsoft does the same things under the covers in their BitTorrent alternative, plus some consideration for locality.

    Article here [microsoft.com]
  • by Yankovic (97540) on Thursday April 27 2006, @12:30PM (#15213929)
    MS has a similar concept already going through deep testing.

    http://research.microsoft.com/sn/Farsite/ [microsoft.com]

    Pretty cool stuff, check this out:

    Our prototypical target is a large company or university, meaning an organization with around 10^5 machines, storing around 10^10 files, containing around 10^16 bytes of data. We assume that the machines are interconnected by a high-bandwidth, low-latency, switched network. Also, at least for our initial version, we are assuming no significant geographical differences among machines.

    Lots more questions answered on the FAQ: http://research.microsoft.com/sn/Farsite/faq.aspx [microsoft.com] ... here's the publication list back as far as 2000 as well: http://research.microsoft.com/sn/Farsite/publications.aspx [microsoft.com] (though obviously this is predated by some 11 years by the original paper)
  • by mengland (78217) on Thursday April 27 2006, @01:54PM (#15214857)
    From: http://research.microsoft.com/sn/Farsite/ [microsoft.com]

    It does this by distributing multiple encrypted replicas of each file among a set of client machines.

    Therein lies the key. There exist many systems that copy entire files (or sets of data) to many machines/nodes. I have been introduced to several references to many other projects that claim similar things with similar language to projects like Farsite.

    The Cleversafe system never stores an entire file (or data/file set) in any one place, encrypted or not. Only portions of any file (known as file "slices") are stored anywhere on the Cleversafe dispersed-storage grid. In our (Cleversafe's) opinion, this reduces complexity (by not having to synchronize multiple copies) and increases security and privacy (by never storing all of the data in one place), among other things.

    In short, some key differentiating questions I typically ask when investigating Cleversafe-competitive systems:

    Does the system...

    * ...store an entire file/data/content set (encrypted or otherwise) in one place?
    * ...make multiple copies of the data?

    If either answer is yes, then I tend to view the project as significantly different than the Cleversafe technology. I have found that full-replication-based methods in various forms are quite prevalent in many applications.

    -Matt
  • by mengland (78217) on Thursday April 27 2006, @02:23PM (#15215101)
    Taking an excerpt from a previous post I made on another sub-thread [slashdot.org]:

    I felt it worth noting at the top level of this thread:

    The Cleversafe meta-data system was designed with an attempt to be able to easily use any information-dispersal algorithm (IDA) available today (including the Reed-Solomon, Shamir, and current Cleversafe methods) or in the future. In fact, the current Cleversafe IDA represents a small part of the code; the vast majority of the code and development effort can be found in the meta-data-management system to track data slices from an unlimited number of files originating from an unlimited number of users and computing systems. This also needs to be done in a way such that the entire system can tolerate a tremendous number of concurrent failures from the underlying system components.

    Is Cleversafe the first one to design a "hyper-redundant," grid-like, meta-data-management system? No. However, we believe we are the only ones to have built such a robust system based on an IDA mechanism with absolutely no replication of the data--and therefore, we contend, much less complexity. Further, I believe that this reduced complexity (when compared with other distributed file/meta-data systems) enables many powerful features, including performance scalability and better human serviceability.

    Will the Cleversafe system prove to be uniquely valuable? We believe so. However, as at least one other post on this thread mentions: time will tell.

    It's also important for me to reiterate: I personally designed much of the current meta-data system, so I present an obviously-biased perspective.

    -Matt
  • by mengland (78217) on Thursday April 27 2006, @03:01PM (#15215367)
    Anonymous writer writes:
    Recovery volumes for various archival utilities have been around a long time. This is just the first time that I know of where they use the RSA algorithm instead of an older algorithm.

    To be clear:

    Dispersal is not encryption. (Cleversafe uses both.)

    While we (Cleversafe) do use public-private key methods to encrypt the data/content [cleversafe.org], this is still a separate operation from the data *dispersal* [cleversafe.org].

    Moreover, if the content encryption is somehow cracked/broken (and public-private key encryption can be broken), the cracker acquires at most 1/11th (in our current IDA scheme) of "scrambled"/non-contiguous data.

    This is the major reason why we feel that our system provides unique, security-and-privacy-based value over encryption-only based systems. If the encryption breaks, you still can't get the data. (And of course, we use the encryption mechanism, too.)

    Note that a different RSA key can be used to encrypt each file Slice (ie, for each Cleversafe "Pillar," as per our terminology [cleversafe.org] for our grid design [cleversafe.org]) such that if a cracker breaks one slice/Pillar's key, they still have to break the key for other Pillars (and there are 11 total Pillars in the current IDA scheme)...*in addition* to the "toplevel" key we use to encrypt the file before it's sliced/dispersed. Note: we don't have this post-dispersed-encryption feature in our current alpha4.1.3 code [sourceforge.net] (we only encrypt the toplevel file before it's compressed and dispersed), but we believe it will not be difficult to add.

    Also: we will be signing each slice as well, for data-integrity purposes to prevent both malicious and non-malicious data change/vandalism. This also will be a feature added in the near term.

    One can read more about the open-source flavor of the Cleversafe grid design [cleversafe.org].

    -Matt

    ps: I encourage interested parties to continue discussions at http://forums.cleversafe.org/ [cleversafe.org] (as well as on soon-to-be-available email lists that will be synchronized with these forums).
  • Re:Aimed at who ? (Score:1)

    by themoneyish (971138) on Thursday April 27 2006, @04:51AM (#15210788)
    I am one of the devs at Cleversafe. As you might notice, this is only an alpha release. We shall soon be adding windows binaries. :) Keep checking. Manish