Follow Slashdot blog updates by subscribing to our blog RSS feed

 



Forgot your password?
typodupeerror
×
Security Encryption

Meaningful MD5 Collisions 312

mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."
This discussion has been archived. No new comments can be posted.

Meaningful MD5 Collisions

Comments Filter:
  • Okay, I'm impressed. (Score:5, Interesting)

    by ave19 ( 149657 ) on Friday June 10, 2005 @04:01PM (#12783348)
    At first I thought: Postscript! Well, obviously. To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide. If you tried to hide random data in a text file, it would be obvious to the person signing it. You need some format to hide the random bits from the viwer.

    I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...

    Then I downloaded the files...

    There's almost nothing to them! I can't read PS, so I'm not sure how many of that handful of bytes at the beginning might be tweakable... but it's a lot less than I expected.

    Collisions must be very easy to find! I am now offically very worried about this.
  • by malcomvetter ( 851474 ) on Friday June 10, 2005 @04:01PM (#12783351)


    in my next big project?

    In all seriousness, I believe Schneier's right. We need a competition for a new hash function [schneier.com].

    Nah, let's just wait for 24 [techweb.com] to drop the words "MD5" before we know it's really bad.
  • by G4from128k ( 686170 ) on Friday June 10, 2005 @04:02PM (#12783357)
    The core problem is that the resources of the defender are inevitably overwhelmed by the resources of the attacker.
    1. Relative expected values of gian vs. loss: The attacker thinks "I know I can gain a #BIG_NUM million dollars" and devotes their full effort to the attack. The defender thinks "I'm safe, there's a low probability, and I'm sure I'll catch the problem before it becomes real money, " and does not not devote effort to security becuase a Gartner report told him it was over-hyped. Thus, the attacker's perceived expected value is much higher than the defenders perceived expected loss and each invests accordingly.
    2. Rising Complexity: As IT systems become more complex, they become less secure. Each new device, new networking protocol, new physical layer, new OS feature, new networked application provides new opportunities for the attacker and a dilution of security resources for defenders.
    3. Time: The attacker has the advantage of time. New algorithms, new mathematical theories, new exploits, and faster processors all favor the attacker. What once was supposed to take the age of the universe to crack can be decrypted with a quickly declining number of networked (even zombied) PCs.
    4. Curse of Compatibility: Because so much crypto and security is networking related, it is subject to implementation delays caused by the need to be compatible. Defenders continue to use old, vulnerable systems to maintain compatibility with key partners. Patches don't solve the problem because the patch itself can introduce incompatibilities that make defenders leary of applying the patch with a very real chance of causing problems to avoid a hypothetical security issue.
    The bottom line is that the defender must protect all vulnerabilities while going about the day-to-day business of using the computer. In contrast, the attacker can devote full time to any weakness of their choice.
  • by erroneus ( 253617 ) on Friday June 10, 2005 @04:03PM (#12783390) Homepage
    I forget where or when exactly, so please feel free to run a search if you care to... it was here on Slashdot though.

    There was talk about someone being able to foil P2P networks by seeding bad stuff through random data formulated to fit the MD5/SHA1 code from legitimate files shared on those networks. The consensus was that it was BS and that even if it weren't BS there could be updates to make such attacks more difficult or impossible to perform.

    Am I missing something or are these two stories relevant to each other?
  • by cpeikert ( 9457 ) <cpeikert AT alum DOT mit DOT edu> on Friday June 10, 2005 @04:04PM (#12783402) Homepage
    Lenstra and others came up with a way to generate syntactically-correct X509 certificates that collide under MD5.

    Here's a link to the paper: Lenstra et al [iacr.org].
  • by Eclectic Engineer ( 830396 ) on Friday June 10, 2005 @04:09PM (#12783457)

    It is an interesting attack, and IANAL, but I'd be curious about the legal ramifications. If I slip a carbon (ah... the way-back machine) in a stack of papers and ask someone to sign the top one without thus informing them, I think my stealth probably invalidates the additional document(s).

    You could argue that there's a noticeable difference between pen and carbon -- making the copy hard to enforce -- but I'd argue the digital version is even easier: at least in the PS example, both "copies" of the document need to be present to preserve the hash.

    In normal (pen/paper) signature situations, I get a copy of what I signed. The same ought to apply to digital sigs, resulting in a simple legal challenge to the validity of the document.

  • Re:Common sense (Score:4, Interesting)

    by iabervon ( 1971 ) on Friday June 10, 2005 @04:30PM (#12783718) Homepage Journal
    Actually, the two documents are actually almost identical. The difference is only one block in the whole file, which essentially acts as a selector for which of the two sets of content is displayed. MD5 (like most hash functions) works on fixed-size blocks smaller than the average file. To hash a complete file, you hash the first block, feed that into the hash with the second block, feed that into the hash with the third block, and so forth. So they have two files, and the first blocks are the same, the second blocks are different but hash the same, and the rest of the files are the same. Of course, the second blocks are junk, but the postscript is expecting a block-sized arbitrary value at that point anyway, so it doesn't matter that there's junk there.

    So they are actually using a format that can contain an exact quantity of extraneous information that doesn't get rendered but entirely changes what does get rendered.

    The same thing could be done with PDF or doc, and executables, but not anything compressed (it won't decompress at all if a block is changed) and not HTML without javascript (there's no way to test which block of junk is included and show different results based on that).

  • by uncqual ( 836337 ) on Friday June 10, 2005 @05:48PM (#12784621)
    If I understand your point, in this case, one can reasonably assume the employer made the contract (it's unusual for employees to craft employment documents).

    However, more symmetric relationships (such as a merger of two companies or even an independent contractor providing IT services to a business) usually have both sides exchanging documents back and forth and they eventually end up with a version that requires no further revision - so it's not possible to figure out (with JUST the two "versions" of the agreement with the same hash) which side "produced" the final copy (and hence, was in a position to orchestrate the second bogus version).

    Am I missing something about the argument?

  • Digital Forensics (Score:1, Interesting)

    by Anonymous Coward on Friday June 10, 2005 @08:34PM (#12785907)
    This kind of information will wreak havoc with digital forensic departments. Right now we use various checksums, the most common being MD5, to verify the integrity and accuracy of a drive image/duplicate. Even if the chances of an accidental collision are so remote, the fact that people are researching and reporting on the fact that it can be done means proving this is a duplicate in court will be harder. Best case scenario, jury is shown reports like this, you then have reasonable doubt in the jury that the drive may NOT be an exact copy. Worst case scenario, the defence claims your lab doctored the results, and then they "prove" it with these algorithms and reports.
  • by Jasin Natael ( 14968 ) on Friday June 10, 2005 @09:19PM (#12786149)

    It bears mentioning that md5 doesn't account for the length of the file. So if someone were to try installing a backdoor into a program, and had a sophisticated enough piece of software using this method, comments, metadata, or other information could be used to 'pad out' the file to make it seem like the original -- even with source code files. Especially in the case of executables, they could just insert random crap at the end of an executable file, and make the md5 hash (and possibly the size) come out identical to the original. Some of these have already been demonstrated.

    While this collision will be a big deal for signing documents, it shouldn't have any effect on web security (Digest Authentication, for one, uses MD5 pretty extensively). The lesson: While MD5 is still reasonably difficult to collide, the time to find collisions (~5 hours) on a normal PC means that malicious uses are now practical.

    I'm not entirely sure what the implications are -- would it be suitable to sign documents using multiple cryptographic functions (such as a signature containing SHA, MD5, and CRC32 hashes, along with the original filesize)? Maybe perform a simple, arbitrary transformation on the text content and use that to generate a seperate, complimentary MD5 or SHA hash?

    Jasin Natael
  • by DickBreath ( 207180 ) on Saturday June 11, 2005 @03:17PM (#12790117) Homepage
    To do what these guys did, you need a format that can:
    • Contain "junk" data that is never displayed.
    • Contain two different "messages" only one of which will be displayed.
    • Be able to select which message to display based on the result of a calculation involving the junk.


    Microsoft Office and OpenOffice.org documents both can contain executable content which can execute when the document is opened, and which can alter the contents of the document.

    I am not very familiar with Microsoft Office, but in OpenOffice.org, the default settings are to warn you when you open a document containing any macro code -- but the user can have turned off this feature.

    I don't know about MS Office's binary format.

    OpenOffice.org documents are simply Zip files of XML. (Yes, try renaming your OOo document's extention to ".zip" and unzipping it.) I know for a fact that I can take an OOo document written on Windows, move it to Linux, unzip it, and then re-zip it (using the "zip" command line tool) to get a smaller better compressed, but otherwise identical OOo document that opens in all versions of OOo. It may be possible to construct an OOo document that is a Zip, but where one or more zip file entries are completely UN-compressed, and therefore, where this technique could be used.

It's a naive, domestic operating system without any breeding, but I think you'll be amused by its presumption.

Working...