Data Storage Media IT

One Way To Save Digital Archives From File Corruption 257

storagedude points out this article about one of the perils of digital storage, the author of which "says massive digital archives are threatened by simple bit errors that can render whole files useless. The article notes that analog pictures and film can degrade and still be usable; why can't the same be true of digital files? The solution proposed by the author: two headers and error correction code (ECC) in every file."
This discussion has been archived. No new comments can be posted.

  • Too much reinvention (Score:5, Interesting)

    by DarkOx ( 621550 ) on Friday December 04, 2009 @07:56AM (#30322692) Journal

    If this type of thing is implemented at the file level, every application is going to have to do its own thing. That means too many implementations, most of which won't be very good or well tested. It also means application developers will be busy slogging through error correction data in their files rather than the data they actually wanted to persist for their application. I think the article offers a number of good ideas, but it would be better to do most of them at the filesystem and perhaps some at the storage layer.
        Also, if we can present the same logical file to the application when it is read, even if every 9th byte is parity on the disk, that is a plus because it means legacy apps can get the enhanced protection as well.

    • What we are talking about here is how to add more redundancy on the software level.. but honestly..

      ...why not do it at the hardware level, where there is already redundancy and it can't be fucked up by an additional error vector?
      • by MrNaz ( 730548 ) * on Friday December 04, 2009 @08:45AM (#30323018) Homepage

        Ahem. RAID anyone? ZFS? Btrfs? Hello?

        Isn't this what filesystem devs have been concentrating on for about 5 years now?

        • by An dochasac ( 591582 ) on Friday December 04, 2009 @09:50AM (#30323632)

          "says massive digital archives are threatened by simple bit errors that can render whole files useless.
          Isn't this what filesystem devs have been concentrating on for about 5 years now?

          Not just 5 years. ZFS's CRC on every datablock and RAID-Z (no RAID write hole) are innovative and obviously the next step in filesystem evolution. But attempts at redundancy aren't new. I'm surprised the article is discussing relatively low-tech, old-hat ideas such as two filesystem headers. Even DOS's FAT used this RAID 1 type of brute-force redundancy by having two FAT tables. The Commodore Amiga's Intuition filesystem did this better than Microsoft back in 1985 by having forward and backward links in every block, which made it possible to repair block pointer damage by searching for a reference to the bad block in the preceding and following block.
          And I suppose if ZFS doesn't catch on, 25 or 30 years from now Apple or Microsoft will finally come up with it and say, "Hey look what we invented!"

        • by Bengie ( 1121981 )

          Not all RAID implementations check for errors. Some cheaper hardware will duplicate data or make parity, but never check for corruption. Instead, they only use the duplicated data for recovery purposes. Nothing says fun like rebuilding a RAID drive only to find your parity data was corrupt, but you won't know that until you try to use your new drive and weird things happen..

          Will the corruption affect your FS or will it affect your data.. 8-ball says........

        • RAID is not backup or archive. If you have a RAID1 system with bit errors on one disk - you now have them on the other disk.

          Using more complex RAID configs does not necessarily solve the problem.

          With archiving, the problem becomes apparent after you pull out your media after 7 years of not using it. Is that parallel ATA hard drive you stored it on still good? Do you have a connection for it? What about those Zip and Jaz drives you used? Do you have a method of reading them? Are those DVDs and CDs still goo

        • Re: (Score:3, Interesting)

          by Rockoon ( 1252108 )
          File Systems are in the software domain. If you aren't getting good data (what was written) off the drive, the File System ideally shouldn't be able to do any better than the hardware did with the data. Of course, in reality the hardware uses a fixed redundancy model that offers less reliability than some people like. The danger of software-based solutions is that they allow hardware manufacturers to offer even less redundancy, or even NO redundancy at all, causing a need for even MORE software based redundan
    • Re: (Score:3, Insightful)

      I agree that filesystem level error correction is a good idea. Having the option to specify ECC options for a given file or folder would be great functionality to have. The idea presented in this article, however, is that certain compressed formats don't need ECC for the entire file. Instead, as long as the headers are intact, a few damaged bits here or there will result in only some distortion; not a big deal if it's just vacation photos/movies.

      By only having ECC in the headers, you would save a good deal of storage

      • "Ending is better than mending".
        Consumers should welcome file corruption; it's a chance to throw away those old files and buy some brand new ones instead :)

        Actually, I would not be surprised if the media companies were busily trying to invent a self-corrupting DRM format to replace DVDs and suchlike.
        • by dwye ( 1127395 )

          > Actually, I would not be surprised if the media companies were
          > busily trying to invent a self-corrupting DRM format to replace
          > DVDs and suchlike.

          Any physical medium is automatically self-corrupting. Leave a DVD on the back shelf of a car for a summer afternoon if you don't believe that.

    • by bertok ( 226922 ) on Friday December 04, 2009 @09:06AM (#30323188)

      If this type of thing is implemented at the file level, every application is going to have to do its own thing. That means too many implementations, most of which won't be very good or well tested. It also means application developers will be busy slogging through error correction data in their files rather than the data they actually wanted to persist for their application. I think the article offers a number of good ideas, but it would be better to do most of them at the filesystem and perhaps some at the storage layer.

          Also, if we can present the same logical file to the application when it is read, even if every 9th byte is parity on the disk, that is a plus because it means legacy apps can get the enhanced protection as well.

      Precisely. This is what things like torrents, RAR files with recovery blocks, and filesystems like ZFS are for: so every app developer doesn't have to roll their own, badly.

    • A simple solution would be to add a file full of redundancy data alongside the original on the archival media. A simple application could be used to repair the file if it becomes damaged, or test it for damage before you go to use it, but the original format of the file remains unchanged, and your recovery system is file system agnostic.
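A minimal Python sketch of the sidecar idea above, under stated assumptions: the block size, the dictionary layout, and every function name here are made up for illustration, and a single XOR parity block stands in for the Reed-Solomon coding that real tools such as par2 use. It can locate damaged blocks by hash and rebuild at most one of them, and it assumes the damage flips bits without changing the file length.

```python
import hashlib

BLOCK = 4096  # assumed block size, not taken from any real tool


def split_blocks(data, size=BLOCK):
    """Cut the file contents into fixed-size blocks (the last may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sha256_hex(block):
    return hashlib.sha256(block).hexdigest()


def xor_blocks(blocks, size):
    """XOR blocks together; short blocks behave as if zero-padded."""
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_sidecar(data, size=BLOCK):
    """Build the redundancy data to store next to the untouched original."""
    blocks = split_blocks(data, size)
    return {
        "size": size,
        "hashes": [sha256_hex(b) for b in blocks],  # used to find damage
        "parity": xor_blocks(blocks, size),         # used to repair it
    }


def repair(data, sidecar):
    """Return repaired contents, or raise if the damage is beyond one block."""
    size = sidecar["size"]
    blocks = split_blocks(data, size)
    bad = [i for i, b in enumerate(blocks) if sha256_hex(b) != sidecar["hashes"][i]]
    if not bad:
        return data
    if len(bad) > 1:
        raise ValueError("more than one damaged block; one parity block is not enough")
    i = bad[0]
    survivors = blocks[:i] + blocks[i + 1:] + [sidecar["parity"]]
    rebuilt = xor_blocks(survivors, size)[:len(blocks[i])]
    if sha256_hex(rebuilt) != sidecar["hashes"][i]:
        raise ValueError("repair failed; the sidecar itself may be damaged")
    blocks[i] = rebuilt
    return b"".join(blocks)
```

The original file format is never touched, which is the point of the suggestion: `make_sidecar` runs once at archive time, and `repair` only needs the file bytes and the sidecar, whatever filesystem they came from.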

    • How about an archive format, essentially a zip file, which contains additional headers for all of its contents? Something like a manifest would work well. It could easily be an XML file with the header information and other meta-data about the contents of the archive. This way you get a good compromise between having to store an entire filesystem and the overhead of putting this information in each file.

      Add another layer by using a striped data format with parity and you have the ability to reconstruct any

    • Also if we can present the same logical file when read to the application even if every 9th byte is parity on the disk that is a plus because it means legacy apps can get the enhanced protection as well.

      Not to nitpick, but I'm going to nitpick...

      Parity bits don't allow you to correct an error, only check that an error is present. This is useful for transfer protocols, where the system knows to ask for the file again, but the final result on an archived file is the same: file corrupt. With parity, though, you have an extra bit that can be corrupted.

      In other words, parity is an error check system, what we need is ECC (Error Check and Correct). Single parity bit is a 1-bit check, 0-bit correct. we need

      • Sorry, messed up my abbreviations.
        We want Error Detect and Correct.
        ECC is an Error Correcting Code.
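To make the detect-versus-correct distinction above concrete, here is a small Python sketch. It compares a single even-parity bit with a Hamming(7,4) code, which is a standard textbook error-correcting code chosen here only for illustration (nobody in the thread names it); the function names are made up for this example.

```python
def parity_bit(bits):
    """Even-parity bit: 1 when the number of set bits is odd."""
    return sum(bits) % 2


def hamming74_encode(d):
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # covers positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # covers positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # covers positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]

# One parity bit: the damage is noticed, but nothing says which bit to fix.
word = data + [parity_bit(data)]
word[2] ^= 1
parity_detects = parity_bit(word[:4]) != word[4]

# Hamming(7,4): three check bits pinpoint the damaged position.
code = hamming74_encode(data)
code[4] ^= 1
corrected = hamming74_decode(code)

print(parity_detects, corrected == data)  # True True
```

The price is overhead: three check bits per four data bits here, against one parity bit per eight in the "every 9th byte" scheme, and this particular code still only handles one flipped bit per 7-bit codeword.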

  • par files (Score:5, Informative)

    by ionix5891 ( 1228718 ) on Friday December 04, 2009 @07:58AM (#30322704)

    include par2 files

    • I'm glad I didn't have to scroll down too far to see this. They're practically magic.

  • Done. +1 to the poster who said there is some round transportation implement being reinvented here.

  • by commodore64_love ( 1445365 ) on Friday December 04, 2009 @08:00AM (#30322724) Journal

    >>>"...analog pictures and film can degrade and still be usable; why can't the same be true of digital files?"

    The ear-eye-brain connection has ~500 million years of development, and has learned the ability to filter-out noise. If for example I'm listening to a radio, the hiss is mentally filtered-out, or if I'm watching a VHS tape that has wrinkles, my brain can focus on the undamaged areas. In contrast when a computer encounters noise or errors, it panics and says, "I give up," and the digital radio or digital television goes blank.

    What we need is a smarter computer that says, "I don't know what this is supposed to be, but here's my best guess," and displays noise. Let the brain then take over and mentally remove the noise from the audio or image.

    • by commodore64_love ( 1445365 ) on Friday December 04, 2009 @08:09AM (#30322794) Journal

      P.S.

      When I was looking for a digital-to-analog converter for my TV, I returned all the ones that displayed blank screens when the signal became weak. The one I eventually chose (x5) was the Channel Master unit. When the signal is weak it continues displaying a noisy image, rather than go blank, or it reverts to "audio only" mode, rather than go silent. It lets me continue watching programs rather than be completely cut off.

    • Re: (Score:3, Insightful)

      And how well did that work for your last corrupted text file? Or a printer job that the printer didn't know how to handle? My guess you could pick out a few words and the rest was random garble. The mind is good at filtering out noise but it is an intrinsically hard problem to do a similar thing with a computer. Say a random bit is missed and the whole file ends up shifted one to the left, how does the computer know that the combinations of pixel values it is displaying should start one bit out of sync so t
      • >>>And how well did that work for your last corrupted text file?

        Extremely well. I read all 7 Harry Potter books using a corrupted OCR scan. From time to time the words might read "Harry Rotter pinted his wand at the eneny," but I still understood what it was supposed to be. My brain filtered-out the errors. That corrupted version is better than the computer saying, "I give up" and displaying nothing at all.
        .

        >>>Say a random bit is missed and the whole file ends up shifted one to the lef

    • Sure, but you'd need to not compress anything and introduce redundancy. Most people prefer using efficient formats and doing error checks and corrections elsewhere (in the filesystem, by external correction codes - par2 springs to mind, etc).

      Here's a simple image format that will survive bit flip style corruption:

      Every pixel is stored with 96 bits: the first 32 are the width of the image, the next 32 are the height, the next 8 bits are the R, the next 8 bits are the G, the next 8 bits are the B, and the last

    • by Phreakiture ( 547094 ) on Friday December 04, 2009 @09:26AM (#30323388) Homepage

      What we need is a smarter computer that says, "I don't know what this is supposed to be, but here's my best guess," and displays noise. Let the brain then take over and mentally remove the noise from the audio or image.

      Audio CDs have always done this. Audio CDs are also uncompressed*.

      The problem, I suspect, is that we have come to rely on a lot of data compression, particularly where video is concerned. I'm not saying this is the wrong choice, necessarily, because video can become ungodly huge without it (NTSC SD video -- 720 x 480 x 29.97 -- in the 4:2:2 colour space, 8 bits per pixel per plane, will consume 69.5 GiB an hour without compression), but maybe we didn't give enough thought to stream corruption.
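The 69.5 GiB figure above can be checked with a few lines of Python. The one assumption is that 4:2:2 at 8 bits per sample averages 2 bytes per pixel: one luma sample for every pixel, plus two chroma samples shared by each pair of pixels.

```python
width, height = 720, 480   # NTSC SD frame size quoted above
fps = 29.97                # NTSC frame rate
bytes_per_pixel = 2        # assumed average for 4:2:2 with 8-bit samples

bytes_per_second = width * height * bytes_per_pixel * fps
bytes_per_hour = bytes_per_second * 3600
gib_per_hour = bytes_per_hour / 2**30

print(f"{bytes_per_second / 1e6:.1f} MB/s, {gib_per_hour:.1f} GiB per hour")
# 20.7 MB/s, 69.5 GiB per hour
```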

      Mini DV video tape, when run in SD, uses no compression on the audio, and the video is only lightly compressed, using a DCT-based codec, with no delta coding. In practical terms, what this means is that one corrupted frame of video doesn't cascade into future frames. If my camcorder gets a wrinkle in the tape, it will affect the frames recorded on the wrinkle, and no others. It also makes a best-guess effort to reconstruct the frame. This task may not be impossible with denser codecs that do use delta coding and motion compensation (MPEG, DivX, etc.), but it is certainly made far more difficult.

      Incidentally, even digital cinemas are using compression. It is a no-delta compression, but the individual frames are compressed in a manner akin to JPEGs, and the audio is compressed either using DTS or AC3 or one of their variants in most cinemas. The difference, of course, is that the cinemas must provide a good presentation. If they fail to do so, people will stop coming. If the presentation isn't better than watching TV/DVD/BluRay at home, then why pay the $11?

      (* I refer here to data compression, not dynamic range compression. Dynamic range compression is applied way too much in most audio media)

      • Comment removed based on user account deletion
      • by stevew ( 4845 )

        You're right about CDs always having contained this type of ECC mechanism. For that matter you will see this type of ECC in radio-based communications infrastructure and in data that gets written to hard disks too! In other words - all modern data storage devices (except maybe Flash..) contain ECC mechanisms that allow burst error detection and correction.

        So now we're talking about being doubly redundant. Put ECC on the ECC? I'm not sure that helps.

        Consider - if a bit twiddles on a magnetic domain. It wi

        • I don't see the point.

          Having dealt with corrupted DVDs, I do.

          You could always put the redundancy data someplace else entirely, too. Realistically, this is an extension of the RAID concept.

        • One thing that helps is to specifically look at damage mechanisms and then come up with a strategy that offers the best workaround. As an example, in the first-generation TDMA digital cellular phone standard, it was recognized that channel noise was typically bursty. You could lose much of one frame of data, but adjacent frames would be OK. So they encoded important data with a 4:1 forward error correction scheme, and then interleaved the encoded data over two frames. If you lost a frame, the de-interleavin
      • If the presentation isn't better than watching TV/DVD/BluRay at home, then why pay the $11?

        To get away from the kids? To make out in the back row?

        Seriously, people mostly don't go to the movie theatre for the movies.

        Also VLC tends to do best guess of the frame when there is corruption, it's usually a pretty bad guess but it still tries.

    • This was extremely useful in the 1980s when certain television channels were available from Sweden and Holland but only in a scrambled form.

    • by Bengie ( 1121981 )

      but how much of our ability to "read through" noise is because we know what data to expect in the first place?

      I know when listening to a radio station that's barely coming in, many times it will sound like random noise, but at some point I will hear a certain note or something and suddenly I know what song it is. Now that I know what song it is, I have no problems actually "hearing" the song. I know what to expect from the song or have picked up on certain cues and I'm guessing that pattern recognition goes

    • by Kjella ( 173770 )

      That would be trivial to do if we were still doing BMP and WAV files, where one bit = one speck of noise. But on the file/network level we use a ton of compression, and the result is that a bit error isn't a bit error in the human sense. Bits are part of a block, and other blocks depend on that block. One change and everything that comes after until the next key frame changes. That means two vastly different results are suddenly a bit apart.

      Of course, no actual medium is that stable which is why we use erro

  • by HKcastaway ( 985110 ) on Friday December 04, 2009 @08:01AM (#30322728)

    ZFS.

    Next topic....

  • by jmitchel!jmitchel.co ( 254506 ) on Friday December 04, 2009 @08:02AM (#30322742)
    What files does a single bit error irretrievably destroy? Obviously it may cause problems, even very annoying problems when you go to use the file. But unless that one bit is in a really bad spot that information is pretty recoverable.
    • by Jane Q. Public ( 1010737 ) on Friday December 04, 2009 @08:09AM (#30322796)
      That's complete nonsense. Just for one example, if the bit is part of a numeric value, depending on where that bit is, it could make the number off anywhere from 1 to 2 BILLION or even a lot more, depending on the kind of representation being used.
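A short Python illustration of the claim above. The helper names are invented for this example; the first flips one bit of an unsigned 32-bit integer, the second flips one bit of an IEEE 754 double via the standard `struct` module.

```python
import struct


def flip_int_bit(value, bit, width=32):
    """Return value with one bit inverted, as an unsigned width-bit integer."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def flip_float_bit(x, bit):
    """Flip one bit (0 = least significant) in the 64-bit encoding of a float."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    (out,) = struct.unpack(">d", struct.pack(">Q", raw ^ (1 << bit)))
    return out


original = 1000
for bit in (0, 15, 31):
    damaged = flip_int_bit(original, bit)
    print(bit, damaged, damaged - original)
# 0 1001 1
# 15 33768 32768
# 31 2147484648 2147483648

print(flip_float_bit(1.0, 62))  # inf: one exponent bit turns 1.0 into infinity
```

The same single flip is off by 1 in the lowest bit and by about 2 billion in the highest, and in a floating-point representation it can land on infinity.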
    • by Rockoon ( 1252108 ) on Friday December 04, 2009 @08:21AM (#30322872)
      Most modern compression formats will not tolerate any errors. With LZ a single bit error could propagate over a long expanse of the uncompressed output, while with Arithmetic encoding the remainder of the file following the single bit error will be completely unrecoverable.

      Pretty much only the prefix-code style compression schemes (Huffman for one) will isolate errors to short segments, and then only if the compressor is not of the adaptive variety.
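A rough way to see this for yourself, sketched in Python with the standard `zlib` module (DEFLATE, i.e. LZ77 plus Huffman coding, with an Adler-32 check on the result). The sample text and the choice of which bit to flip are arbitrary assumptions; the sketch flips one bit in the raw data and one bit in the compressed stream and compares the outcome.

```python
import zlib


def flip_one_bit(data, index, bit=0):
    """Return a copy of data with a single bit inverted in byte `index`."""
    damaged = bytearray(data)
    damaged[index] ^= 1 << bit
    return bytes(damaged)


def try_decompress(blob):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


text = b"The quick brown fox jumps over the lazy dog. " * 200
packed = zlib.compress(text)

# Uncompressed: the damage stays where it landed.
raw_damaged = flip_one_bit(text, len(text) // 2)
differing = sum(a != b for a, b in zip(text, raw_damaged))

# Compressed: the same kind of damage in the middle of the stream.
recovered = try_decompress(flip_one_bit(packed, len(packed) // 2))

print("bytes changed in raw copy:", differing)
print("compressed copy usable:", recovered == text)
```

In the raw copy exactly one byte differs. For the compressed copy, expect either an outright decompression error or output that no longer matches; plain `zlib.decompress` makes no attempt to salvage whatever follows the damage.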
      • Doesn't rar have a capacity to have some redundancy though? I seem to recall that downloading multi-part rar files from usenet a while ago that included some extra files that could be used to rebuild the originals in the case of file corruption (which happened fairly often with usenet).
      • by Twinbee ( 767046 )

        With that in mind, what program would be best to use? I take it WinRAR (which uses RAR and zip compression), or 7-Zip would be of no use...?

    • I'd venture to say TrueCrypt containers, when that corruption occurs at the place where they store the encrypted symmetrical key. Depending on the size of said container it could be the whole harddisk. :)

      • by mlts ( 1038732 ) *

        Fortunately, newer versions of TC have two headers, so if the main one is scrozzled, you can check the "use backup header embedded in the volume if available" under Mount Options and have a second chance of mounting the volume.

    • I've got a 10 gig .tar.bz2 file that I've only been partially able to recover due to a couple of bad blocks on a hard drive. I ran bzip2recover on it, which broke it into many, many pieces, and then put them back together into a partially recoverable tar file. Now I just can't figure out how to get past the corrupt pieces.:(

    • Many of the jpgs that I took with my first digital camera were damaged or destroyed by bit corruption. I doubt I'm the only person who fell to that problem; those pictures were taken back in the day when jpg was the only option available on cameras and many of us didn't know well enough to save them under a different file format afterwards. Now I have a collection of images where half the image is missing or disrupted - and many others that just simply don't open at all anymore.
    • by mlts ( 1038732 ) *

      With modern compression and encryption algorithms, a single bit flipped can mean a *lot* of downstream corruption, especially in video that uses deltas, or encryption algorithms that are stream based, so all bits upstream have some effect as the file gets encrypted. A single bit flipped will render an encrypted file completely unusable.

  • Easy... (Score:3, Funny)

    by realsilly ( 186931 ) on Friday December 04, 2009 @08:05AM (#30322770)

    Don't save anything.

  • by MathFox ( 686808 ) on Friday December 04, 2009 @08:08AM (#30322792)
    Most of the storage media in common use (disks, tapes, CD/DVD-R) already do use ECC at sector or block level and will fix "single bit" errors at firmware level transparently. What is more of an issue at application level are "missing block" errors: when the low-level ECC fails, the storage device signals "unreadable sector" and one or more blocks of data are lost.

    Of course this can be fixed by "block redundancy" (like RAID does), "block recovery checksums" or old-fashioned backups.

    • by careysb ( 566113 )
      RAID 0 does not offer "block redundancy". If I have old-fashioned backups, how can I determine that my primary file has sustained damage? The OS doesn't tell us. In fact, the OS probably doesn't even know. Backup software often allows you to validate the backup immediately after the backup was made, but not, say, a year later. The term "checksum" is over-used. A true check-sum algorithm *may* tell you that a file has sustained damage. It will not, however, tell you how to correct it. A "CRC" (cyclic-redu
      • Have you ever wondered why the numbers in RAID levels are what they are? RAID-1 is level 1 because it's like the identity property; everything's the same (kinda like multiplying by 1). RAID-0 is level 0 because it is not redundant; i.e., there's zero redundancy (as in, multiplying by 0). It's an array of inexpensive disks, sure, but calling RAID-0 a RAID is a misnomer at best. RAID-0 was not in the original RAID paper, in fact.

        No one talking about data protection uses RAID-0, and it's therefore irrelevant

    • It means we need error correction at every level---error correction at the physical device (already in place, more or less) and error correction at the file system level (so even if a few blocks from a file are missing, the file system auto-corrects itself and still functions---up to some point of course).

  • About time (Score:3, Interesting)

    by trydk ( 930014 ) on Friday December 04, 2009 @08:10AM (#30322802)
    It is about time that somebody (hopefully some of the commercial vendors AND the open source community too) gets wise to the problems of digital storage.

    I always create files with unique headers and consistent version numbering to allow for minor as well as major file format changes. For storage/exchange purposes, I make the format expandable where each subfield/record has an individual header with a field type and a length indicator. Each field is terminated with a unique marker (two NULL bytes) to make the format resilient to errors in the headers, with possible resynchronisation through the markers. The format is in most situations backward compatible to a certain extent, as an old program can always ignore fields/subfields it does not understand in a newer format file. If that is not an option, the major version number is incremented. This means that a version 2.11 program can read a version 2.34 file with only minor problems. It will not be able to write to that format, though. The same version 2.11 program would not be able to correctly read a version 3.01 file either.

    I have not implemented ECC in the formats yet, but maybe the next time I do an overhaul ... I will have to ponder that. Maybe not, my programs seem too ephemeral for that ... Then again, that is what people thought about their 1960s COBOL programs.
  • Lossy (Score:2, Insightful)

    The participant asked why digital file formats (jpg, mpeg-3, mpeg-4, jpeg2000, and so on) can't allow the same degradation and remain viewable.

    Because all of those are compressed, and take up a tiny fraction of the space that a faithful digital recording of the information on a film reel would take up. If you want lossless-level data integrity, use lossless formats for your masters.

  • Do not compress! (Score:2, Interesting)

    by irp ( 260932 )

    ... Efficiency is the enemy of redundancy!

    Old documents, saved in 'almost like ascii', are still 'readable'. I once salvaged a document from some obscure ancient word processor by opening it in a text editor. I also found some "images" (more like icons) on the same disk (a copy of a floppy), even these I could "read" (by changing the page width of my text editor to fit the width of the uncompressed image).

    As long as the storage space keeps growing...

    • by Hatta ( 162192 )

      Better than not compressing, is compressing and using the space you save for parity data.

  • by gweihir ( 88907 ) on Friday December 04, 2009 @08:18AM (#30322856)

    It has been done like that for decades. Look at what archival tape does or DVDisaster or modern HDDs.

    Also, this does not solve the problem, it just defers it. Why is this news?

  • Also, Bittorrent (Score:5, Informative)

    by NoPantsJim ( 1149003 ) on Friday December 04, 2009 @08:20AM (#30322862) Homepage
    I remember reading a story of a guy who had to download a file from Apple that was over 4 gigabytes, and had to attempt it several times because each came back corrupted due to some problem with his internet. Eventually, he gave up and found the file on bit torrent, but realized if he saved it in the same location as the corrupted file, it would check the file and then overwrite it with the correct information. He was able to fix it in under an hour using bittorrent rather than trying to re-download the file while crossing his fingers and praying for no corruption.
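What made that trick work is that a torrent carries a hash for every fixed-size piece of the file, so a client can check an existing copy piece by piece and fetch only the pieces that fail. A minimal Python sketch of that idea follows; the piece size and function names are assumptions for illustration, and `fetch_piece` is a hypothetical callback standing in for the peer network, not part of any real client.

```python
import hashlib

PIECE = 16 * 1024  # assumed piece size; real torrents choose their own


def piece_hashes(data, size=PIECE):
    """SHA-1 of every piece, as published alongside the download."""
    return [hashlib.sha1(data[i:i + size]).digest()
            for i in range(0, len(data), size)]


def bad_pieces(local, hashes, size=PIECE):
    """Indices of pieces in the local copy whose hash does not match."""
    return [n for n, expected in enumerate(hashes)
            if hashlib.sha1(local[n * size:(n + 1) * size]).digest() != expected]


def repair_from(local, hashes, fetch_piece, size=PIECE):
    """Replace only the damaged pieces, leaving the good ones alone."""
    fixed = bytearray(local)
    for n in bad_pieces(local, hashes, size):
        fixed[n * size:(n + 1) * size] = fetch_piece(n)
    return bytes(fixed)
```

For a 4 GB file with a handful of corrupted spots, this re-downloads a few pieces rather than the whole file, which matches the under-an-hour repair in the story.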

    I know it's not a perfect example, but just one way of looking at it.
  • Quite frankly data is so duplicated today bit-rot is not really an issue if you know what tools to use, especially if you use tools like quickpar on important data that can handle bad blocks.

    Much data is easily duplicated, the data you want to save if it is important should be backed up with care.

    Even though much of the data I download is easily downloaded again, the stuff I want to keep I quickpar the archives and burn to disc, and really important data that is irreplacable I make multiple copies.

    http://ww [quickpar.co.uk]

  • by khundeck ( 265426 ) on Friday December 04, 2009 @08:32AM (#30322938)
    Parchive: Parity Archive Volume Set

    It basically allows you to create an archive that's selectively larger, but contains an amount of parity such that you can have XX% corruption and still 'unzip.'

    "The original idea behind this project was to provide a tool to apply the data-recovery capability concepts of RAID-like systems to the posting and recovery of multi-part archives on Usenet. We accomplished that goal." [http://parchive.sourceforge.net/]

    KPH
  • Solution: (Score:3, Insightful)

    by Lord Lode ( 1290856 ) on Friday December 04, 2009 @08:36AM (#30322960)
    Just don't compress anything. If a bit corrupts in a non-compressed bitmap file or in a plain .txt file, no more than 1 pixel or letter is lost.
    • Re: (Score:3, Funny)

      by igny ( 716218 )
      II ccaann ssuuggeesstt eevveenn bbeetteerr iiddeeaa.
      II ccaann ssuuggeesstt eevveenn bbeetteerr iiddeeaa.
  • I would like this. Some options I could work with: Extensions to current CD/DVD/Bluray ISO formats, new version of "ZIP" files and a new version of True Crypt files.
    If done in an open standards way I could be somewhat confident of support in many years time when I may need to read the archives. Obviously backwards compatibility with earlier iso/file formats would be a plus.
  • Film and digital (Score:3, Interesting)

    by CXI ( 46706 ) on Friday December 04, 2009 @08:38AM (#30322976) Homepage
    Ten years ago my old company used to advocate that individuals who wanted to convert paper to digital first put the documents on microfilm and then scan them. That way when their digital media got damaged or lost they could always recreate it. Film lasts for a long, long time when stored correctly. Unfortunately that still seems to be the best advice, at least if you are starting from an analog original.
  • Just use ZFS, it's already been done.

    kthxbai.

  • by davide marney ( 231845 ) * on Friday December 04, 2009 @08:59AM (#30323120) Journal

    As we're on the cusp of moving much of our data to the cloud, we've got the perfect opportunity to improve the resilience of information storage for a lot of people at the same time.

  • I believe that Forward-error correction is an even better model. Already used for error-free transmission of data over error-prone links in radio, and USENET using the PAR format, what better way to preserve data than with FEC?
    Save your really precious files as Parchive files (PAR and PAR2). You can spread them over several discs or just one disc with several of the files on it.

    It's one thing to detect errors, but it's a wholly different universe when you can also correct them.

    http://en.wikipedia.org/wiki [wikipedia.org]

  • The problem is not in error correction, but actually in the linearity of data. Using only 256 pixels you could represent an image the brain can interpret. The problem is, the brain cannot interpret an image from the first 256 pixels, as that would probably be a line half as long as the image width, consisting of mostly irrelevant data.
    If I wanted to make a fail-proof image, I would split it into squares of, say, 9 (3x3) pixels, and then put only the central pixel (every 5th px) values in the byte stream. Once that is done repeat that for s
    • by brusk ( 135896 )

      I know I could still read the book even if every fifth letter was replaced by a incorrect one.

      That would depend on the book. Your brain could probably error-correct a novel easily enough under those conditions (especially if it was every fifth character, and not random characters at a rate of 20%). But I doubt anyone could follow, say, a math textbook with that many errors.

  • zfec is much, much faster than par2: http://allmydata.org/trac/zfec

    Tahoe-LAFS uses zfec, encryption, integrity checking based on SHA-256, digital signatures based on RSA, and peer-to-peer networking to take a bunch of hard disks and make them into a single virtual hard disk which is extremely robust: http://allmydata.org/trac/tahoe

  • Just resign yourself to the fact that the Code of Hammurabi [wikipedia.org] will outlive your pr0n.

  • > The solution proposed by the author: two headers and error correction code (ECC) in every file."

    When there are two possibilities, which one do you choose? Three allows the software to have a vote among the headers, and ignore or correct the loser (assuming that there IS one, of course).
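The three-way vote can be done bit by bit. A minimal Python sketch, assuming three equal-length copies of a header are stored; the function name and the sample header bytes are invented for this example.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 vote over three equal-length copies of a header."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("copies must have the same length")
    # A result bit is 1 exactly when at least two of the three copies say 1.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


header = b"FMT1\x00\x10\x02\x80"

copy1 = bytearray(header)
copy1[0] ^= 0x04          # damage the first copy
copy2 = bytearray(header)
copy2[5] ^= 0x01          # damage the second copy somewhere else
copy3 = bytes(header)     # third copy intact

print(majority_vote(bytes(copy1), bytes(copy2), copy3) == header)  # True
```

With only two copies, a mismatch says that something is wrong but not which copy to trust. The vote does fail if two copies are damaged at the same bit position, so it complements a checksum rather than replacing one.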

    Also, keeping the headers in text, rather than using complicated encoding schemes to save space where it doesn't much matter, is probably a good idea, as well. Semantic sugar is your friend here.
