One Way To Save Digital Archives From File Corruption

storagedude points out this article about one of the perils of digital storage, the author of which "says massive digital archives are threatened by simple bit errors that can render whole files useless. The article notes that analog pictures and film can degrade and still be usable; why can't the same be true of digital files? The solution proposed by the author: two headers and error correction code (ECC) in every file."
  • by HKcastaway ( 985110 ) on Friday December 04, 2009 @09:01AM (#30322728)

    ZFS.

    Next topic....

  • by jmitchel!jmitchel.co ( 254506 ) on Friday December 04, 2009 @09:02AM (#30322742)
    What files does a single bit error irretrievably destroy? Obviously it may cause problems, even very annoying problems, when you go to use the file. But unless that one bit is in a really bad spot, that information is pretty recoverable.
  • Lossy (Score:2, Insightful)

    by FlyingBishop ( 1293238 ) on Friday December 04, 2009 @09:13AM (#30322826)

    The participant asked why digital file formats (jpg, mpeg-3, mpeg-4, jpeg2000, and so on) can't allow the same degradation and remain viewable.

    Because all of those are compressed, and take up a tiny fraction of the space that a faithful digital recording of the information on a film reel would take up. If you want lossless-level data integrity, use lossless formats for your masters.

  • by paradxum ( 67051 ) on Friday December 04, 2009 @09:14AM (#30322838)
    It already exists; it's called ZFS on Solaris boxen. Every block is checksummed, so the filesystem can detect bad data on each read, repair it from a redundant copy, and generally flag a failing disk. This truly is the filesystem every other one is playing catch-up with.
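
    A toy Python sketch of that self-healing idea (this is not ZFS code; the class and names are made up for illustration): every block gets a checksum and a second copy, and a read falls back to whichever copy still verifies.

        import hashlib

        class MirroredStore:
            """Toy block store: each block is checksummed and written to two 'disks'."""
            def __init__(self):
                self.disks = [{}, {}]

            def write(self, key, data):
                digest = hashlib.sha256(data).digest()
                for disk in self.disks:
                    disk[key] = (digest, data)

            def read(self, key):
                for disk in self.disks:
                    digest, data = disk[key]
                    if hashlib.sha256(data).digest() == digest:
                        return data            # first copy that passes its checksum wins
                raise IOError("both copies of block %r failed their checksum" % key)

        store = MirroredStore()
        store.write("block0", b"archive payload")
        digest, _ = store.disks[0]["block0"]
        store.disks[0]["block0"] = (digest, b"archive pAyload")   # simulate silent corruption
        assert store.read("block0") == b"archive payload"          # healed from the good copy
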
  • by Rockoon ( 1252108 ) on Friday December 04, 2009 @09:21AM (#30322872)
    Most modern compression formats will not tolerate any errors. With LZ, a single bit error can propagate across a long stretch of the uncompressed output, while with arithmetic coding everything in the file after the flipped bit is completely unrecoverable.

    Pretty much only the prefix-code style compression schemes (Huffman, for one) isolate errors to short segments, and then only if the compressor is not of the adaptive variety.
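
    A quick Python illustration of that fragility, using the standard zlib module (DEFLATE, i.e. LZ77 plus Huffman coding): flip one bit in the middle of the stream and the rest typically cannot be decoded at all.

        import zlib

        original = b"the quick brown fox jumps over the lazy dog " * 100
        compressed = bytearray(zlib.compress(original))
        compressed[len(compressed) // 2] ^= 0x01   # flip a single bit mid-stream

        try:
            zlib.decompress(bytes(compressed))
        except zlib.error as exc:
            print("decompression failed:", exc)    # one flipped bit loses the whole stream
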
  • Solution: (Score:3, Insightful)

    by Lord Lode ( 1290856 ) on Friday December 04, 2009 @09:36AM (#30322960)
    Just don't compress anything. If a bit flips in an uncompressed bitmap or a plain .txt file, you lose at most one pixel or one letter.
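
    The same single-bit flip in uncompressed data, for contrast (plain Python, nothing format-specific): exactly one character is damaged and everything else survives.

        text = bytearray(b"the quick brown fox jumps over the lazy dog")
        text[4] ^= 0x08                      # flip one bit in the fifth byte
        print(text.decode("ascii", errors="replace"))
        # prints "the yuick brown fox ..." : one letter wrong, the rest intact
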
  • by Anonymous Coward on Friday December 04, 2009 @09:44AM (#30323010)
    ...two tapes.
  • by MrNaz ( 730548 ) * on Friday December 04, 2009 @09:45AM (#30323018) Homepage

    Ahem. RAID anyone? ZFS? Btrfs? Hello?

    Isn't this what filesystem devs have been concentrating on for about 5 years now?

  • by Interoperable ( 1651953 ) on Friday December 04, 2009 @09:52AM (#30323076)

    I agree that filesystem-level error correction is a good idea. Having the option to specify ECC settings for a given file or folder would be great functionality to have. The idea presented in the article, however, is that certain compressed formats don't need ECC over the entire file: as long as the headers are intact, a few flipped bits here and there cause only some distortion, which is no big deal if it's just vacation photos or movies.

    By having ECC only in the headers, you would save a good deal of storage space and processing time. It wouldn't need to be supported in every application either, just in the codecs, and individual codecs could add it fairly easily as they release new versions, which wouldn't be backward compatible anyway, so no new problem is introduced. I think it's a good idea: it would keep media readable with very little overhead, at the cost of a few odd pixels during playback of a corrupted file.
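
    A minimal sketch of what such a two-header container could look like (the layout, slot size, and names here are invented for illustration; a real format would add proper ECC such as Reed-Solomon on top of the CRCs). The header is written into two fixed-size slots, each protected by a CRC32, so the reader can fall back to whichever copy survives.

        import struct, zlib

        SLOT = 64                                   # fixed space reserved per header copy

        def pack(header, payload):
            body = struct.pack("<I", len(header)) + header.ljust(SLOT, b"\0")
            copy = body + struct.pack("<I", zlib.crc32(body))
            return copy + copy + payload            # header copy A, copy B, then the data

        def unpack(blob):
            copy_len = 4 + SLOT + 4
            for offset in (0, copy_len):            # try copy A, then copy B
                body = blob[offset:offset + 4 + SLOT]
                (crc,) = struct.unpack_from("<I", blob, offset + 4 + SLOT)
                if zlib.crc32(body) == crc:
                    hdr_len = struct.unpack_from("<I", body)[0]
                    return body[4:4 + hdr_len], blob[2 * copy_len:]
            raise ValueError("both header copies are corrupt")

        blob = bytearray(pack(b'{"codec": "jpeg"}', b"...image data..."))
        blob[2] ^= 0xFF                             # trash part of the first header copy
        header, payload = unpack(bytes(blob))
        print(header)                               # b'{"codec": "jpeg"}', read from copy B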

  • by bertok ( 226922 ) on Friday December 04, 2009 @10:06AM (#30323188)

    If this type of thing is implemented at the file level, every application is going to have to do its own thing. That means too many implementations, most of which won't be very good or well tested. It also means application developers will be busy slogging through error-correction data in their files rather than the data they actually wanted to persist. I think the article offers a number of good ideas, but it would be better to do most of them at the filesystem and perhaps some at the storage layer.

        Also, if we can present the same logical file to the application on read, even if every 9th byte on disk is parity, that is a plus, because it means legacy apps get the enhanced protection as well.

    Precisely. This is what things like torrents, RAR files with recovery blocks, and filesystems like ZFS are for: so every app developer doesn't have to roll their own, badly.
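
    A toy illustration of the recovery-block idea (this is the principle behind RAR recovery records and par2, not their actual formats): one XOR parity chunk can rebuild any single lost chunk.

        def make_parity(chunks):
            parity = bytearray(len(chunks[0]))
            for chunk in chunks:
                for i, b in enumerate(chunk):
                    parity[i] ^= b
            return bytes(parity)

        def rebuild(chunks, parity, missing):
            """Recompute chunk `missing` from the survivors plus the parity chunk."""
            survivors = [c for i, c in enumerate(chunks) if i != missing]
            return make_parity(survivors + [parity])

        data = b"0123456789abcdefghijklmnopqrstuv"          # 32 bytes
        chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
        parity = make_parity(chunks)

        lost = 2                                             # pretend chunk 2 vanished
        assert rebuild(chunks, parity, lost) == chunks[lost]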

  • by designlabz ( 1430383 ) on Friday December 04, 2009 @10:11AM (#30323238)
    The problem is not error correction but the linearity of the data. Using only 256 pixels you could represent an image the brain can interpret. The problem is that the brain cannot interpret an image built from the first 256 pixels of a file, because that would just be a line about half as long as the image is wide, consisting of mostly irrelevant data.
    If I wanted to make a failure-proof image, I would split it into blocks of, say, 9 (3x3) pixels and put only the centre pixel of each block into the byte stream first, then repeat the pass for the surrounding pixels of each block. That way, even if part of the data is lost, the program would still have at least one pixel from every 3x3 block and could substitute a nearby pixel for a missing one, leaving it up to the viewer to figure out the rest. You could repeat the subdivision again, achieving a pseudo-random ordering of the bytes (a rough sketch of this ordering appears below).
    And this is just a mock-up of what could be done to improve data safety in images without increasing the actual file size.
    In the old days of the Internet, designers served lower-resolution images to cut page-loading time and then gradually swapped in higher-resolution versions once they had loaded. If that made sense then, maybe we could now embed a preview image representing the average of each sector of pixels and reverse-calculate missing pixels from the ones we still have.
    This could also work for audio files, and maybe even archives. I could still read a book even if every fifth letter were replaced by an incorrect one.

    Cheers,
    DLabz
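
    A rough sketch of that ordering (the block size and function name are invented for illustration): the centre pixel of every block is emitted first, then the remaining pixels pass by pass, so a localized burst of damage costs at most one pixel per block instead of a contiguous region.

        def interleaved_order(width, height, block=3):
            """Yield (x, y) pixel coordinates, centre pixel of each block first."""
            offsets = sorted(
                ((dx, dy) for dy in range(block) for dx in range(block)),
                key=lambda o: o != (block // 2, block // 2),  # centre offset sorts first
            )
            for dx, dy in offsets:                   # one full pass over the image per offset
                for by in range(0, height, block):
                    for bx in range(0, width, block):
                        x, y = bx + dx, by + dy
                        if x < width and y < height:
                            yield x, y

        order = list(interleaved_order(6, 6))
        print(order[:4])    # the four 3x3-block centres come out first: (1,1) (4,1) (1,4) (4,4)
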
  • by ILongForDarkness ( 1134931 ) on Friday December 04, 2009 @10:13AM (#30323276)
    And how well did that work for your last corrupted text file? Or a print job the printer didn't know how to handle? My guess is you could pick out a few words and the rest was random garble. The mind is good at filtering out noise, but doing the same thing in a computer is an intrinsically hard problem. Say a random bit is dropped and the whole file ends up shifted one bit to the left: how does the computer know that the pixel values it is displaying should start one bit out of sync so that the surviving data "looks" right? Similarly with a text file: the remaining bits may all still decode to valid characters, so how is the computer to know which characters to show without the correct data?
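
    A quick demonstration of that desynchronization (plain Python; the slip position is arbitrary): drop a single bit and every later byte boundary shifts, so the rest of the text decodes as garbage.

        text = b"the quick brown fox jumps over the lazy dog"
        bits = "".join(f"{byte:08b}" for byte in text)
        slipped = bits[:50] + bits[51:]              # lose one bit mid-stream
        garbled = bytes(int(slipped[i:i + 8], 2) for i in range(0, len(slipped) - 7, 8))
        print(garbled.decode("ascii", errors="replace"))
        # prints "the qu" followed by mostly unreadable characters
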
  • Re:Incorrect... (Score:3, Insightful)

    by ledow ( 319597 ) on Friday December 04, 2009 @10:50AM (#30323644) Homepage

    Because no one has ever managed to pull anything from this theoretical "historical" layer without something like an electron microscope costing tens or hundreds of thousands, thousands of hours of skilled *manual* work, and cracking the hard drive open and destroying it (if it can be done at all). I believe there is still a standing challenge involving a drive that was simply zeroed: anyone who can recover the password in the single file that was on it before it was wiped gets a few thousand dollars, and nobody has done more than look at it yet. (It certainly can't be done by software alone - are you thinking of unzeroed filesystem residue, which has nothing to do with the hardware at all?)

    In theory you might think you were right, but digital storage has nothing to do with historical layering (and it's doubtful that such layering even exists in a practically usable sense); it comes down to the method of recording - 1 or 0, or more possible patterns? Hard drives may store each bit by majority, but they do it for a reason: a single bit is *useless* on such a fine recording medium, because it *can* change over time or through slight inaccuracies in the recording and reading methods. So the head has to sweep a whole patch of the disk to be sure of reading back a 1 or a 0, and it can never report more than the consensus - it isn't accurate enough to read each particle individually and think "oh, that's enough to be a 1"; only a sufficient number of magnetised particles "trigger" it to decide the value is a 0 or a 1. Thus it *IS* digital, because the only answer it can give is 0 or 1, never "well, almost a 1".

    And if manufacturers thought for a second that any of that was doable, even in enterprise drives, it would already be done and sold to the highest bidder. The fact is that it just isn't feasible, or even possible: you can't do it in a device small enough to fit in your car, or reliably, or without totally destroying the operation and performance of the drive, or for less than the price of a large rack full of storage.
