One Way To Save Digital Archives From File Corruption 257
storagedude points out this article about one of the perils of digital storage, the author of which "says massive digital archives are threatened by simple bit errors that can render whole files useless. The article notes that analog pictures and film can degrade and still be usable; why can't the same be true of digital files? The solution proposed by the author: two headers and error correction code (ECC) in every file."
Too much reinvention (Score:5, Interesting)
If this type of thing is implemented at the file level, every application is going to have to do its own thing. That means too many implementations, most of which won't be very good or well tested. It also means application developers will be busy slogging through error-correction data in their files rather than the data they actually wanted to persist. I think the article offers a number of good ideas, but it would be better to do most of them at the filesystem level and perhaps some at the storage layer.
Also, if we can present the same logical file to the application on read, even if every 9th byte on disk is parity, that is a plus, because it means legacy apps can get the enhanced protection as well.
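A minimal sketch of the "every 9th byte is parity" idea, in Python (the function names and stripe width are my own, not from the article): plain XOR parity can rebuild one byte per stripe, but only when the storage layer can already tell you which byte is bad, as a failed sector read does.
```python
from functools import reduce

STRIPE = 8  # hypothetical stripe width: 8 data bytes followed by 1 parity byte

def add_parity(data: bytes) -> bytes:
    """Append one XOR parity byte after every 8 data bytes."""
    out = bytearray()
    for i in range(0, len(data), STRIPE):
        chunk = data[i:i + STRIPE]
        out += chunk
        out.append(reduce(lambda a, b: a ^ b, chunk, 0))
    return bytes(out)

def recover_byte(stripe: bytes, bad_index: int) -> int:
    """Rebuild the byte at bad_index from the other bytes of a 9-byte stripe.
    Only works when the bad position is known (an erasure), e.g. a failed sector read."""
    return reduce(lambda a, b: a ^ b,
                  (b for i, b in enumerate(stripe) if i != bad_index), 0)

encoded = add_parity(b"ARCHIVE!")            # one full stripe: 8 data bytes + 1 parity byte
assert recover_byte(encoded, 3) == ord("H")  # byte 3 rebuilt from the survivors
```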
Re: (Score:2)
Re:Too much reinvention (Score:5, Insightful)
Ahem. RAID anyone? ZFS? Btrfs? Hello?
Isn't this what filesystem devs have been concentrating on for about 5 years now?
Re:Too much reinvention (Score:4, Informative)
"says massive digital archives are threatened by simple bit errors that can render whole files useless.
Isn't this what filesystem devs have been concentrating on for about 5 years now?
Not just 5 years. ZFS's checksum on every data block and RAID-Z (which closes the RAID write hole) are innovative and obviously the next step in filesystem evolution. But attempts at redundancy aren't new. I'm surprised the article is discussing relatively low-tech, old-hat ideas such as two filesystem headers. Even DOS's FAT used this kind of brute-force redundancy by keeping two copies of the FAT. The Commodore Amiga's filesystem (OFS) did this better than Microsoft back in 1985 by having forward and backward links in every block, which made it possible to repair block-pointer damage by searching for a reference to the bad block in the preceding and following blocks.
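A toy illustration of the forward/backward-link idea (not the real Amiga on-disk layout; the class and function names are invented): if a block's forward pointer is damaged, the block that names it as a predecessor gives the pointer back.
```python
class Block:
    """Toy block with forward and backward links; illustration only, not the real OFS format."""
    def __init__(self, key, prev_key, next_key, payload=b""):
        self.key, self.prev_key, self.next_key, self.payload = key, prev_key, next_key, payload

def repair_next_pointer(blocks, damaged_key):
    """Recover the damaged block's forward link by finding whichever block
    names it as a predecessor."""
    for blk in blocks.values():
        if blk.prev_key == damaged_key:
            return blk.key
    return None  # it was the last block in the chain, or its successor is gone too

# Chain 10 -> 11 -> 12 whose middle forward link got corrupted on disk.
chain = {k: Block(k, p, n) for k, p, n in [(10, None, 11), (11, 10, 0xBAD), (12, 11, None)]}
chain[11].next_key = repair_next_pointer(chain, 11)
assert chain[11].next_key == 12
```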
And I suppose if ZFS doesn't catch on, 25 or 30 years from now Apple or Microsoft will finally come up with it and say, "Hey look what we invented!"
Re: (Score:2)
Not all RAID implementations check for errors. Some cheaper hardware will duplicate data or compute parity but never check for corruption; it only uses the duplicated data for recovery. Nothing says fun like rebuilding a RAID drive only to find your parity data was corrupt, which you won't discover until you try to use the new drive and weird things happen.
Will the corruption affect your FS, or will it affect your data? 8-ball says........
Re: (Score:2)
RAID is not backup or archive. If bit errors get written to a RAID-1 system, you now have them on both disks.
Using more complex RAID configs does not necessarily solve the problem.
With archiving, the problem becomes apparent after you pull out your media after 7 years of not using it. Is that parallel ATA hard drive you stored it on still good? Do you have a connection for it? What about those Zip and Jaz drives you used? Do you have a method of reading them? Are those DVDs and CDs still goo
Re: (Score:3, Interesting)
Re: (Score:3, Insightful)
I agree that filesystem-level error correction is a good idea. Having the option to specify ECC options for a given file or folder would be great functionality to have. The idea presented in this article, however, is that certain compressed formats don't need ECC for the entire file. Instead, as long as the headers are intact, a few bit errors here or there will result in only some distortion; not a big deal if it's just vacation photos/movies.
By only having ECC in the headers, you would save a good deal of storage
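A rough sketch of header-only protection, using a checksummed duplicate of the first kilobyte as a stand-in for real ECC; the header size, magic layout, and function names are invented for illustration.
```python
import struct, zlib

HDR_LEN = 1024   # hypothetical: protect only the first 1 KiB, where most format headers live

def wrap(payload: bytes) -> bytes:
    """Prefix two checksummed copies of the payload's first HDR_LEN bytes; the body stays unprotected."""
    header = payload[:HDR_LEN].ljust(HDR_LEN, b"\x00")   # pad so both copies have a fixed size
    copy = struct.pack("<I", zlib.crc32(header)) + header
    return copy + copy + payload

def unwrap(blob: bytes) -> bytes:
    """Return the payload, splicing in whichever stored header copy still checks out."""
    rec = 4 + HDR_LEN
    body = blob[2 * rec:]                                # the original file, header possibly rotted
    for off in (0, rec):                                 # primary copy first, then the backup
        crc = struct.unpack("<I", blob[off:off + 4])[0]
        header = blob[off + 4:off + rec]
        if zlib.crc32(header) == crc:
            return header[:len(body)] + body[HDR_LEN:]
    raise ValueError("both header copies are corrupt")

original = b"\xff\xd8\xff\xe0" + b"\x00" * 5000          # a fake JPEG-ish payload
blob = bytearray(wrap(original))
blob[10] ^= 0xFF                                         # rot a bit inside the first header copy
assert unwrap(bytes(blob)) == original                   # the backup copy saves the file
```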
Brave New World (Score:2)
Consumers should welcome file corruption; it's a chance to throw away those old files and buy some brand new ones instead
Actually, I would not be surprised if the media companies were busily trying to invent a self-corrupting DRM format to replace DVDs and suchlike.
Re: (Score:2)
> Actually, I would not be surprised if the media companies were
> busily trying to invent a self-corrupting DRM format to replace
> DVDs and suchlike.
Any physical medium is automatically self-corrupting. Leave a DVD on the back shelf of a car for a summer afternoon if you don't believe that.
Re:Too much reinvention (Score:4, Insightful)
If this type of thing is implemented at the file level, every application is going to have to do its own thing. That means too many implementations, most of which won't be very good or well tested. It also means application developers will be busy slogging through error-correction data in their files rather than the data they actually wanted to persist. I think the article offers a number of good ideas, but it would be better to do most of them at the filesystem level and perhaps some at the storage layer.
Also, if we can present the same logical file to the application on read, even if every 9th byte on disk is parity, that is a plus, because it means legacy apps can get the enhanced protection as well.
Precisely. This is what things like torrents, RAR files with recovery blocks, and filesystems like ZFS are for: so every app developer doesn't have to roll their own, badly.
Re:Too much reinvention (Score:5, Interesting)
Don't forget PAR2 [wikipedia.org]. I never burn a DVD without 10%-20% redundancy as par2 files. Even if the filesystem gets too damaged to read, I can usually dd the whole disk and let par2 recover the files.
Re: (Score:2)
A simple solution would be to add a file full of redundancy data alongside the original on the archival media. A simple application could be used to repair the file if it becomes damaged, or test it for damage before you go to use it, but the original format of the file remains unchanged, and your recovery system is file system agnostic.
Re: (Score:2)
How about an archive format, essentially a zip file, which contains additional headers for all of its contents? Something like a manifest would work well. It could easily be an XML file with the header information and other metadata about the contents of the archive. This way you get a good compromise between having to store an entire filesystem and the overhead of putting this information in each file.
Add another layer by using a striped data format with parity and you have the ability to reconstruct any
Re: (Score:2)
Also, if we can present the same logical file to the application on read, even if every 9th byte on disk is parity, that is a plus, because it means legacy apps can get the enhanced protection as well.
Not to nitpick, but I'm going to nitpick...
Parity bits don't allow you to correct an error, only detect that an error is present. This is useful for transfer protocols, where the system knows to ask for the data again, but the final result for an archived file is the same: file corrupt. With parity, though, you have an extra bit that can itself be corrupted.
In other words, parity is an error-check system; what we need is ECC (Error Check and Correct). A single parity bit is a 1-bit check, 0-bit correct. we need
Re: (Score:2)
Sorry, messed up my abbreviations.
We want Error Detect and Correct.
ECC is an Error Correcting Code.
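For the concrete difference, here is a textbook Hamming(7,4) code sketched in Python (nothing specific to any file format; the function names are mine): three parity bits over four data bits are enough to locate a single flipped bit, and therefore to flip it back.
```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Recompute the parity checks; the syndrome is the 1-based position of a single bad bit."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                                   # simulate one bit of rot
assert hamming74_correct(word) == [1, 0, 1, 1]
```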
Re: (Score:2)
>>>>> If this type of thing is implemented at the file level every application is going to have to do its own thing.
>>
>>Great. So add it to the system level.
Somebody has ADD and didn't bother to finish reading the *whole* paragraph. Quote: "It would be better to do most of them at the filesystem..."
Re: (Score:2)
Re:Too much reinvention (Score:5, Insightful)
Re: (Score:3, Funny)
Re: (Score:2)
You have just taken the hideous word 'boxen' to new heights!
True, though; ZFS is a bleedin' obvious implementation of this kind of thing
Re: (Score:2)
Not only does it contain ECC for each block, but the ECC for a block is not contained within the block itself, so you don't run into problems with self-consistent but still bad blocks.
Re: (Score:2)
I agree with you about ZFS. Getting Sun to dual-license it under the GPL would mean it becoming a de facto standard everywhere but Macs and Windows. However, we should also have ECC in the file format, so that regardless of what type of machine a file is stored on, there is a good chance of repairing it.
On an ISO basis, one can use a utility like DVDDisaster to append ECC info to the end of an ISO file before burning to CD or DVD. Since this is transparent to user action, the use of the added ECC only really
Re: (Score:2)
Actually, ZFS does not use ECC unless you are using RAID-Z. It does use checksums for all blocks, but those can only detect errors; if you're not using RAID-Z or a mirror, or don't have copies=(2|3), the error will be fatal, but at least you'll know about it.
Re: (Score:2)
Having ECC in metadata is fine, but in a perfect world applications wouldn't ignore ECC either, especially when a file is being used and modified. Instead, the app should update the ECC records either when any write is performed, or at least when the file is closed.
par files (Score:5, Informative)
include par2 files
Re: (Score:2)
I'm glad I didn't have to scroll down too far to see this. They're practically magic.
Re: (Score:2)
Or just use PAR for your archives (Score:2)
Done. +1 to the poster who said there is some round transportation implement being reinvented here.
It's that computer called the brain. (Score:5, Interesting)
>>>"...analog pictures and film can degrade and still be usable; why can't the same be true of digital files?"
The ear-eye-brain connection has ~500 million years of development behind it and has learned to filter out noise. If, for example, I'm listening to a radio, the hiss is mentally filtered out, or if I'm watching a VHS tape that has wrinkles, my brain can focus on the undamaged areas. In contrast, when a computer encounters noise or errors it panics and says, "I give up," and the digital radio or digital television goes blank.
What we need is a smarter computer that says, "I don't know what this is supposed to be, but here's my best guess," and displays the noise. Let the brain then take over and mentally remove the noise from the audio or image.
Re:It's that computer called the brain. (Score:5, Interesting)
P.S.
When I was looking for a digital-to-analog converter for my TV, I returned all the ones that displayed blank screens when the signal became weak. The one I eventually chose (x5) was the Channel Master unit. When the signal is weak it continues displaying a noisy image rather than going blank, or it reverts to "audio only" mode rather than going silent. It lets me continue watching programs rather than being completely cut off.
Re: (Score:3, Insightful)
Re: (Score:2)
>>>And how well did that work for your last corrupted text file?
Extremely well. I read all 7 Harry Potter books using a corrupted OCR scan. From time to time the words might read "Harry Rotter pinted his wand at the eneny," but I still understood what it was supposed to be. My brain filtered-out the errors. That corrupted version is better than the computer saying, "I give up" and displaying nothing at all.
.
>>>Say a random bit is missed and the whole file ends up shifted one to the lef
Re: (Score:2)
Sure, but you'd need to not compress anything and introduce redundancy. Most people prefer using efficient formats and doing error checks and corrections elsewhere (in the filesystem, by external correction codes - par2 springs to mind, etc).
Here's a simple image format that will survive bit flip style corruption:
Every pixel is stored with 96 bits: the first 32 are the width of the image, the next 32 are the height, the next 8 bits are the R, the next 8 bits are the G, the next 8 bits are the B, and the last
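A rough sketch of that kind of self-describing pixel record; the comment's final field is cut off, so a simple (and weak) per-pixel check byte stands in for it here, and all names are invented. Each 12-byte record can be decoded on its own, so a flipped bit ruins at most one pixel.
```python
import struct

def encode(width, height, pixels):
    """pixels: list of (r, g, b). Each pixel becomes a self-contained 12-byte record."""
    out = bytearray()
    for r, g, b in pixels:
        check = (width ^ height ^ r ^ g ^ b) & 0xFF   # stand-in for the truncated last field
        out += struct.pack(">IIBBBB", width, height, r, g, b, check)
    return bytes(out)

def decode(blob):
    """Decode whatever records survive; skip any 12-byte record whose check byte fails."""
    good, size = [], struct.calcsize(">IIBBBB")
    for off in range(0, len(blob) - size + 1, size):
        w, h, r, g, b, check = struct.unpack_from(">IIBBBB", blob, off)
        if (w ^ h ^ r ^ g ^ b) & 0xFF == check:
            good.append((r, g, b))
    return good

blob = bytearray(encode(2, 1, [(255, 0, 0), (0, 255, 0)]))
blob[20] ^= 0x40                              # flip a bit in the second pixel's R byte
assert decode(bytes(blob)) == [(255, 0, 0)]   # the damaged pixel is dropped, the rest survives
```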
Re:It's that computer called the brain. (Score:5, Interesting)
Audio CDs have always done this. Audio CDs are also uncompressed*.
The problem, I suspect, is that we have come to rely on a lot of data compression, particularly where video is concerned. I'm not saying this is the wrong choice, necessarily, because video can become ungodly huge without it (NTSC SD video -- 720 x 480 x 29.97 -- in the 4:2:2 colour space, 8 bits per pixel per plane, will consume 69.5 GiB an hour without compression), but maybe we didn't give enough thought to stream corruption.
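The figure checks out: at 8 bits per sample, 4:2:2 averages 2 bytes per pixel (full-resolution luma plus half-resolution Cb and Cr).
```python
# Back-of-the-envelope check of the ~69.5 GiB/hour figure for uncompressed 4:2:2 SD video.
width, height, fps = 720, 480, 29.97
bytes_per_pixel = 2                               # 1 byte luma + 1 byte of shared chroma per pixel
bytes_per_hour = width * height * bytes_per_pixel * fps * 3600
print(bytes_per_hour / 2**30)                     # ~69.45 GiB
```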
MiniDV video tape, when run in SD, uses no compression on the audio, and the video is only lightly compressed, using a DCT-based codec with no delta coding. In practical terms, this means that one corrupted frame of video doesn't cascade into future frames. If my camcorder gets a wrinkle in the tape, it will affect the frames recorded on the wrinkle and no others. It also makes a best-guess effort to reconstruct the frame. Reconstruction may not be impossible with denser codecs that do use delta coding and motion compensation (MPEG, DivX, etc.), but it is certainly made far more difficult.
Incidentally, even digital cinemas use compression. There is no delta coding, but the individual frames are compressed in a manner akin to JPEGs, and in most cinemas the audio is compressed using DTS or AC3 or one of their variants. The difference, of course, is that cinemas must provide a good presentation. If they fail to do so, people will stop coming. If the presentation isn't better than watching TV/DVD/BluRay at home, then why pay the $11?
(* I refer here to data compression, not dynamic range compression. Dynamic range compression is applied way too much in most audio media)
Re: (Score:2)
Re: (Score:2)
The theatres in our area don't have sticky floors! I feel cheated!
Re: (Score:2)
You're right about CDs always having contained this type of ECC mechanism. For that matter, you will see this type of ECC in radio-based communications infrastructure and in the data that gets written to hard disks too! In other words, all modern data storage devices (except maybe flash) contain ECC mechanisms that allow burst-error detection and correction.
So now we're talking about being doubly redundant. Put ECC on the ECC? I'm not sure that helps.
Consider - if a bit twiddles on a magnetic domain. It wi
Re: (Score:2)
Having dealt with corrupted DVDs, I do.
You could always put the redundancy data someplace else entirely, too. Realistically, this is an extension of the RAID concept.
Re: (Score:2)
Re: (Score:2)
If the presentation isn't better than watching TV/DVD/BluRay at home, then why pay the $11?
To get away from the kids? To make out in the back row?
Seriously, people mostly don't go to the movie theatre for the movies.
Also, VLC tends to make a best guess at the frame when there is corruption; it's usually a pretty bad guess, but it still tries.
Re: (Score:2)
This was extremely useful in the 1980s when certain television channels were available from Sweden and Holland but only in a scrambled form.
Re: (Score:2)
but how much of our ability to "read through" noise is because we know what data to expect in the first place?
I know when listening to a radio station that's barely coming in, many times it will sound like random noise, but at some point I will hear a certain note or something and suddenly I know what song it is. Now that I know what song it is, I have no problems actually "hearing" the song. I know what to expect from the song or have picked up on certain cues and I'm guessing that pattern recognition goes
Re: (Score:2)
That would be trivial to do if we were still using BMP and WAV files, where one bit = one speck of noise. But at the file/network level we use a ton of compression, and the result is that a bit error isn't a bit error in the human sense. Bits are part of a block, and other blocks depend on that block. One change and everything that comes after it, until the next key frame, changes. That means two vastly different results are suddenly only a bit apart.
Of course, no actual medium is that stable which is why we use erro
Re: (Score:2)
Hmmm. My DVD player just displays a blue screen when it encounters a corrupt MPEG-2 stream
Sun Microsystems..... zfs..... (Score:3, Insightful)
ZFS.
Next topic....
Re: (Score:2)
Once the files are written to CD or tape you lose the advantage that hardware or filesystem protection gave you
PAR2. Better?
Re: (Score:2)
Why? There's no reason a filesystem like ZFS can't be used on CD or tape and a lot of people do use them.
Even if you didn't want to do that, data CDs already carry sector-level error correction: ISO 9660 data is normally burned as CD-ROM Mode 1 sectors, which add roughly 288 bytes of EDC/ECC for every 2048-byte data block.
Re: (Score:2)
> The Linux powers-that-be are the ones that chose to distribute their product under a license that can't be mixed with others.
That was like... 25 years ago.
That excuse doesn't really work well for anything released recently.
No. It's the authors of newer works that choose not to "play nice".
What files does a single bit error destroy? (Score:3, Insightful)
Re:What files does a single bit error destroy? (Score:4, Informative)
Re: (Score:2, Funny)
Perhaps that is what the poster meant by "bad spot". If "Hitler" were altered to read as "Hatler", I'm pretty sure the meaning would still be clear from the context.
Godvin.
Re: (Score:2)
I actually checked to see if 'v' and 'w' were different by only a single bit. * facepalm *
Re: (Score:2)
People like you are worse than Hatter!
Re: (Score:2)
People like you are worse than matter!
Re: (Score:2)
Hrm.. You may have just explained the Nostradamus prediction of "Hister" instead of "Hitler". A single bit corruption in the "From the Future Media Stream" he was watching.
Yes indeed... I blame the Large Hadron Collider for this.
Re:What files does a single bit error destroy? (Score:5, Insightful)
Pretty much only prefix-code style compression schemes (Huffman, for one) will isolate errors to short segments, and then only if the compressor is not of the adaptive variety.
Re: (Score:2)
Re: (Score:2)
With that in mind, what program would be best to use? I take it WinRAR (which uses RAR and zip compression), or 7-Zip would be of no use...?
Re: (Score:2)
I'd venture to say TrueCrypt containers, when the corruption occurs at the place where they store the encrypted symmetric key. Depending on the size of said container, it could be the whole hard disk. :)
Re: (Score:2)
Fortunately, newer versions of TC have two headers, so if the main one is scrozzled, you can check the "use backup header embedded in the volume if available" under Mount Options and have a second chance of mounting the volume.
Re: (Score:2)
I've got a 10 gig .tar.bz2 file that I've only been partially able to recover due to a couple of bad blocks on a hard drive. I ran bzip2recover on it, which broke it into many, many pieces, and then put them back together into a partially recoverable tar file. Now I just can't figure out how to get past the corrupt pieces. :(
I've lost jpgs that way (Score:2)
Re: (Score:2)
With modern compression and encryption algorithms, a single flipped bit can mean a *lot* of downstream corruption, especially in video that uses deltas, or in chained encryption modes where earlier bits affect how later data is decrypted. A single flipped bit can render an encrypted file completely unusable.
Easy... (Score:3, Funny)
Don't save anything.
What about the "block errors"? (Score:5, Informative)
Of course this can be fixed by block redundancy (like RAID does), block-recovery checksums, or old-fashioned backups.
Re: (Score:2)
Re: (Score:2)
Have you ever wondered why the numbers in RAID levels are what they are? RAID-1 is level 1 because it's like the identity property; everything's the same (kind of like multiplying by 1). RAID-0 is level 0 because it is not redundant; i.e., there's zero redundancy (as in multiplying by 0). It's an array of inexpensive disks, sure, but calling RAID-0 a RAID is a misnomer at best. In fact, RAID-0 was not in the original RAID paper.
No one talking about data protection uses RAID-0, and it's therefore irrelevant
Re: (Score:2)
It means we need error correction at every level: error correction at the physical device (already in place, more or less) and error correction at the filesystem level (so even if a few blocks of a file are missing, the filesystem auto-corrects itself and still functions, up to some point of course).
Re:What about the "block errors"? (Score:4, Informative)
anyone know of the equivalent RAID model for things like tape?
Four tapes data, one tape PAR2.
About time (Score:3, Interesting)
I always create files with unique headers and consistent version numbering to allow for minor as well as major file-format changes. For storage/exchange purposes, I make the format expandable: each subfield/record has an individual header with a field type and a length indicator. Each field is terminated with a unique marker (two NULL bytes) to make the format resilient to errors in the headers, with possible resynchronisation through the markers. The format is in most situations backward compatible to a certain extent, as an old program can always ignore fields/subfields it does not understand in a newer-format file. If that is not an option, the major version number is incremented. This means that a version 2.11 program can read a version 2.34 file with only minor problems. It will not be able to write to that format, though. The same version 2.11 program would not be able to correctly read a version 3.01 file either.
I have not implemented ECC in the formats yet, but maybe the next time I do an overhaul
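A minimal sketch of that kind of tagged, length-prefixed layout; the magic string, version numbers, and field codes below are invented for illustration, and the resynchronisation scan is only hinted at in a comment.
```python
import struct

MAGIC, END = b"MYF1", b"\x00\x00"   # hypothetical magic string and per-field end marker

def write_file(fields, major=2, minor=34):
    """fields: list of (field_type, payload) pairs, each written as type + length + payload + END."""
    out = bytearray(MAGIC + bytes([major, minor]))
    for ftype, payload in fields:
        out += struct.pack(">HI", ftype, len(payload)) + payload + END
    return bytes(out)

def read_file(blob, known_types, supported_major=2):
    """Skip unknown field types; refuse only files with a newer major version."""
    if blob[:4] != MAGIC or blob[4] > supported_major:
        raise ValueError("unreadable file or newer major version")
    fields, off = [], 6
    while off < len(blob):
        ftype, length = struct.unpack_from(">HI", blob, off)
        payload = blob[off + 6:off + 6 + length]
        if ftype in known_types:             # an old reader just ignores fields it doesn't understand
            fields.append((ftype, payload))
        off += 6 + length + len(END)
        # if a length field were damaged, a reader could scan forward for END to resynchronise (not shown)
    return fields

blob = write_file([(1, b"title: holiday"), (2, b"some pixels"), (99, b"field a 2.11 reader ignores")])
assert read_file(blob, known_types={1, 2}) == [(1, b"title: holiday"), (2, b"some pixels")]
```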
Lossy (Score:2, Insightful)
Because all of those are compressed, and take up a tiny fraction of the space that a faithful digital recording of the information on a film reel would take up. If you want lossless-level data integrity, use lossless formats for your masters.
Do not compress! (Score:2, Interesting)
... Efficiency is the enemy of redundancy!
Old documents saved in "almost like ASCII" are still "readable". I once salvaged a document from some obscure ancient word processor by opening it in a text editor. I also found some "images" (more like icons) on the same disk (a copy of a floppy); even these I could "read" (by changing the page width of my text editor to fit the width of the uncompressed image).
As long as the storage space keeps growing...
Re: (Score:2)
Better than not compressing, is compressing and using the space you save for parity data.
Very, very old news.... (Score:3, Informative)
It has been done like that for decades. Look at what archival tape does, or DVDisaster, or modern HDDs.
Also, this does not solve the problem, it just defers it. Why is this news?
Also, Bittorrent (Score:5, Informative)
I know it's not a perfect example, but just one way of looking at it.
Re: (Score:2)
Re: (Score:2)
Wait, I know! We'll make a torrent for the
Re: (Score:2)
Re: (Score:2)
Quickpar... (Score:2)
Quite frankly, data is so duplicated today that bit rot is not really an issue if you know what tools to use, especially if you use tools like QuickPar, which can handle bad blocks, on important data.
Much data is easily duplicated; the data you want to save, if it is important, should be backed up with care.
Even though much of the data I download is easily downloaded again, the stuff I want to keep I QuickPar the archives and burn to disc, and for really important data that is irreplaceable I make multiple copies.
http://ww [quickpar.co.uk]
Parchive: Parity Archive Volume Set (Score:5, Interesting)
It basically allows you to create an archive that's somewhat larger, but contains enough parity that you can have XX% corruption and still 'unzip.'
"The original idea behind this project was to provide a tool to apply the data-recovery capability concepts of RAID-like systems to the posting and recovery of multi-part archives on Usenet. We accomplished that goal." [http://parchive.sourceforge.net/]
KPH
Solution: (Score:3, Insightful)
Re: (Score:3, Funny)
II ccaann ssuuggeesstt eevveenn bbeetteerr iiddeeaa.
Re: (Score:2)
New versions of ISO, ZIP and Truecrypt for this? (Score:2)
If done in an open-standards way, I could be somewhat confident of support in many years' time, when I may need to read the archives. Obviously, backwards compatibility with earlier ISO/file formats would be a plus.
Re: (Score:2)
Film and digital (Score:3, Interesting)
ZFS (Score:2)
Just use ZFS; it's already been done.
kthxbai.
Cloud computing provides an opportunity (Score:4, Funny)
As we're on the cusp of moving much of our data to the cloud, we've got the perfect opportunity to improve the resilience of information storage for a lot of people at the same time.
Forward-error correction instead (Score:2)
I believe that forward error correction (FEC) is an even better model. It's already used for error-free transmission of data over error-prone radio links, and on Usenet via the PAR format; what better way to preserve data than with FEC?
Save your really precious files as Parchive files (PAR and PAR2). You can spread them over several discs, or put several of the files on just one disc.
It's one thing to detect errors, but it's a wholly different universe when you can also correct them.
http://en.wikipedia.org/wiki [wikipedia.org]
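PAR2 itself uses Reed-Solomon coding over many recovery blocks; as a much simpler stand-in for the same idea, here is the one-parity-block version (function names are mine): split the data into equal-sized chunks, keep one extra XOR chunk, and any single lost chunk can be rebuilt.
```python
from functools import reduce

def make_parity(chunks):
    """XOR all equally sized chunks together into one recovery chunk."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks))

def rebuild(chunks, parity, missing_index):
    """Recover the one chunk that was lost, given every other chunk plus the parity chunk."""
    survivors = [c for i, c in enumerate(chunks) if i != missing_index] + [parity]
    return make_parity(survivors)

data = [b"four", b"data", b"tape", b"sets"]   # "four tapes data, one tape parity"
par = make_parity(data)
assert rebuild(data, par, 2) == b"tape"       # the lost third chunk comes back
```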
Linearity is the real problem (Score:2, Insightful)
If I wanted to make a fail-proof image, I would split it into squares of, say, 9 (3x3) pixels, and then put only the central pixel (every 5th px) values in the byte stream. Once that is done, repeat that for s
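The 3x3-square scheme is essentially interleaving; here is a one-dimensional sketch of the same trick (not the poster's exact layout, and the names are mine): spread neighbouring samples far apart in the stream so a burst of damage lands on samples that are far apart in the image and can be interpolated from their neighbours.
```python
def interleave(pixels, stride=9):
    """Emit every 9th pixel first, then every 9th starting at offset 1, and so on."""
    return [pixels[i] for phase in range(stride) for i in range(phase, len(pixels), stride)]

def deinterleave(stream, stride=9):
    """Invert interleave() by writing the stream back in the same visiting order."""
    out, pos = [None] * len(stream), 0
    for phase in range(stride):
        for i in range(phase, len(stream), stride):
            out[i] = stream[pos]
            pos += 1
    return out

pixels = list(range(27))
assert deinterleave(interleave(pixels)) == pixels
```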
Re: (Score:2)
I know I could still read the book even if every fifth letter was replaced by an incorrect one.
That would depend on the book. Your brain could probably error-correct a novel easily enough under those conditions (especially if it was every fifth character, and not random characters at a rate of 20%). But I doubt anyone could follow, say, a math textbook with that many errors.
zfec, Tahoe-LAFS (Score:2)
zfec is much, much faster than par2: http://allmydata.org/trac/zfec
Tahoe-LAFS uses zfec, encryption, integrity checking based on SHA-256, digital signatures based on RSA, and peer-to-peer networking to take a bunch of hard disks and make them into a single virtual hard disk which is extremely robust: http://allmydata.org/trac/tahoe
Clay Tablets (Score:2)
Just resign yourself to the fact that the Code of Hammurabi [wikipedia.org] will outlive your pr0n.
Three Headers, Not Two (Score:2)
> The solution proposed by the author: two headers and error correction code (ECC) in every file."
When there are two possibilities, which one do you choose? Three allows the software to hold a vote among the headers and ignore or correct the loser (assuming that there IS one, of course).
Also, keeping the headers in text, rather than using complicated encoding schemes to save space where it doesn't much matter, is probably a good idea as well. Semantic sugar is your friend here.
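A trivial sketch of the voting idea, assuming the three stored header copies are the same length and that a three-way disagreement falls back to the first copy.
```python
def vote_headers(h1: bytes, h2: bytes, h3: bytes) -> bytes:
    """Byte-wise majority vote across three stored header copies."""
    out = bytearray()
    for a, b, c in zip(h1, h2, h3):
        out.append(a if a == b or a == c else (b if b == c else a))
    return bytes(out)

good = b"FORMAT v1.2 width=640 height=480"
assert vote_headers(good,
                    b"FxRMAT v1.2 width=640 height=480",   # copy with one rotted byte
                    b"FORMAT v1.2 width=640 heigXt=480") == good
```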
Re: (Score:3, Insightful)
Because no one has yet managed to pull things from this theoretical "historical" layer without at least something like an electron microscope costing tens or hundreds of thousands, thousands of hours of skilled *manual* work, and having to crack the damn hard drive open and destroy it (if at all). I believe there is still a challenge going around with a hard drive that was "zeroed" quite simply, and if anyone can recover the password in the single file that was on it before it was zeroed, they can get a
Re: (Score:2, Funny)
Re: (Score:2)
That would, like, totally be possible! Except of course that you would probably find there are a _LOT_ of files of 20GB or less that would give you those exact same hashes. Some of those files will be a valid Blu-ray file. But apart from that, way to go!
It'd be interesting if someone were to do the math on this one; I think you'd be disappointed.
Re: (Score:3, Informative)
Asking for a definition of ecc [google.co.uk] turns it up, so it's obviously not that uncommon. And as we're talking about data corruption, it's the obvious one.
Most IT techs would recognise the term from "ECC RAM", which is RAM capable of correcting bit errors and is often required by server motherboards.