Meaningful MD5 Collisions
mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."
Okay, I'm impressed. (Score:5, Interesting)
I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...
Then I downloaded the files...
There's almost nothing to them! I can't read PS, so I'm not sure how many of that handful of bytes at the beginning might be tweakable... but it's a lot less than I expected.
Collisions must be very easy to find! I am now officially very worried about this.
So you're saying I shouldn't implement MD5 ... (Score:3, Interesting)
in my next big project?
In all seriousness, I believe Schneier's right. We need a competition for a new hash function [schneier.com].
Nah, let's just wait for 24 [techweb.com] to drop the words "MD5" before we know it's really bad.
Relative Resources: The Attackers' Advantage (Score:5, Interesting)
From a prior discussion... (Score:4, Interesting)
There was talk about someone being able to foil P2P networks by seeding bad stuff through random data formulated to fit the MD5/SHA1 code from legitimate files shared on those networks. The consensus was that it was BS and that even if it weren't BS there could be updates to make such attacks more difficult or impossible to perform.
Am I missing something or are these two stories relevant to each other?
Works for certificates, too (Score:5, Interesting)
Here's a link to the paper: Lenstra et al [iacr.org].
Re:Explanation of the attack -- enforcement issues (Score:2, Interesting)
It is an interesting attack, and IANAL, but I'd be curious about the legal ramifications. If I slip a carbon (ah... the way-back machine) in a stack of papers and ask someone to sign the top one without thus informing them, I think my stealth probably invalidates the additional document(s).
You could argue that there's a noticeable difference between pen and carbon -- making the copy hard to enforce -- but I'd argue the digital version is even easier: at least in the PS example, both "copies" of the document need to be present to preserve the hash.
In normal (pen/paper) signature situations, I get a copy of what I signed. The same ought to apply to digital sigs, resulting in a simple legal challenge to the validity of the document.
Re:Common sense (Score:4, Interesting)
So they are actually using a format that can contain an exact quantity of extraneous information that doesn't get rendered but entirely changes what does get rendered.
The same thing could be done with PDF or doc, and executables, but not anything compressed (it won't decompress at all if a block is changed) and not HTML without javascript (there's no way to test which block of junk is included and show different results based on that).
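The trick described above can be sketched with a toy example. This is not an attack on MD5: it assumes a deliberately weak 16-bit Merkle-Damgård hash (built here from truncated MD5) so that a colliding pair of first blocks can be found by brute force in a moment. The names `compress`, `toy_hash`, `find_block_collision`, and `render` are all hypothetical. The point it illustrates is that once two equal-length leading blocks collide, any identical tail appended to both keeps the hashes equal, and a "renderer" that tests which block is present can show different text.

```python
import hashlib
import struct

BLOCK = 8            # toy block size in bytes (assumption, MD5 really uses 64)
IV = b"\x00\x00"     # toy 16-bit initial state (assumption)


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: MD5 truncated to 16 bits."""
    return hashlib.md5(state + block).digest()[:2]


def toy_hash(msg: bytes) -> bytes:
    """Toy Merkle-Damgard hash: zero-pad, append the length, iterate compress."""
    padded = msg + b"\x00" * (-len(msg) % BLOCK) + struct.pack(">Q", len(msg))
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_block_collision():
    """Brute-force two different first blocks that reach the same state."""
    seen = {}
    counter = 0
    while True:
        block = struct.pack(">Q", counter)
        out = compress(IV, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        counter += 1


BLOCK_A, BLOCK_B = find_block_collision()

# Both documents carry BOTH texts; only the unrendered leading block differs.
TEXT_A = "I agree to pay 10 dollars."
TEXT_B = "I agree to pay 10000 dollars."
BODY = (TEXT_A + "|" + TEXT_B).encode()

doc_a = BLOCK_A + BODY
doc_b = BLOCK_B + BODY


def render(doc: bytes) -> str:
    """Stand-in for the document's own logic: pick a text based on the block."""
    return TEXT_A if doc[:BLOCK] == BLOCK_A else TEXT_B


if __name__ == "__main__":
    print(toy_hash(doc_a) == toy_hash(doc_b))  # same toy hash
    print(render(doc_a))
    print(render(doc_b))
```

Note that both texts sit inside both files, which is why anyone holding either copy can see the other text by inspecting the raw bytes.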
Re:These are important attacks.. (Score:2, Interesting)
However, more symmetric relationships (such as a merger of two companies or even an independent contractor providing IT services to a business) usually have both sides exchanging documents back and forth and they eventually end up with a version that requires no further revision - so it's not possible to figure out (with JUST the two "versions" of the agreement with the same hash) which side "produced" the final copy (and hence, was in a position to orchestrate the second bogus version).
Am I missing something about the argument?
Digital Forensics (Score:1, Interesting)
Re:Wow...this is nerdy even for /. (Score:3, Interesting)
It bears mentioning that md5 doesn't account for the length of the file. So if someone were to try installing a backdoor into a program, and had a sophisticated enough piece of software using this method, comments, metadata, or other information could be used to 'pad out' the file to make it seem like the original -- even with source code files. Especially in the case of executables, they could just insert random crap at the end of an executable file, and make the md5 hash (and possibly the size) come out identical to the original. Some of these have already been demonstrated.
While this collision will be a big deal for signing documents, it shouldn't have any effect on web security (Digest Authentication, for one, uses MD5 pretty extensively). The lesson: While MD5 is still reasonably difficult to collide, the time to find collisions (~5 hours) on a normal PC means that malicious uses are now practical.
I'm not entirely sure what the implications are -- would it be suitable to sign documents using multiple cryptographic functions (such as a signature containing SHA, MD5, and CRC32 hashes, along with the original filesize)? Maybe perform a simple, arbitrary transformation on the text content and use that to generate a separate, complementary MD5 or SHA hash?
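For what it's worth, the multi-hash idea is easy to sketch. The snippet below is only an illustration, assuming "SHA" means SHA-1 and using the hypothetical helper names `composite_fingerprint` and `matches`. It records the size, CRC32, MD5, and SHA-1 of the data and compares all of them at once. Whether this actually buys any extra security is a separate question that the sketch does not answer.

```python
import hashlib
import zlib


def composite_fingerprint(data: bytes) -> dict:
    """Collect several independent checks over the same bytes."""
    return {
        "size": len(data),
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def matches(data: bytes, expected: dict) -> bool:
    """True only if every recorded value agrees with the data."""
    return composite_fingerprint(data) == expected


if __name__ == "__main__":
    original = b"the signed document"
    fingerprint = composite_fingerprint(original)
    print(matches(original, fingerprint))
    print(matches(b"a tampered document", fingerprint))
```

A forger would have to hit every field at the same time, which is the property the question is asking about.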
Jasin Natael
Re:These are important attacks.. (Score:3, Interesting)
Microsoft Office and OpenOffice.org documents both can contain executable content which can execute when the document is opened, and which can alter the contents of the document.
I am not very familiar with Microsoft Office, but in OpenOffice.org, the default setting is to warn you when you open a document containing any macro code -- although the user may have turned this feature off.
I don't know about MS Office's binary format.
OpenOffice.org documents are simply Zip files of XML. (Yes, try renaming your OOo document's extension to ".zip" and unzipping it.) I know for a fact that I can take an OOo document written on Windows, move it to Linux, unzip it, and then re-zip it (using the "zip" command line tool) to get a smaller, better-compressed, but otherwise identical OOo document that opens in all versions of OOo. It may be possible to construct an OOo document that is a Zip, but where one or more zip file entries are completely UN-compressed, and therefore, where this technique could be used.
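If anyone wants to check a document for that last case, here is a rough sketch using Python's standard `zipfile` module. It lists the entries of a Zip container that are stored without compression. The helper names `stored_entries` and `build_sample`, and the sample entry names, are made up for illustration; a real OOo file would be read from disk instead.

```python
import io
import zipfile


def stored_entries(doc_bytes: bytes) -> list:
    """Return names of Zip entries that are stored uncompressed."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.compress_type == zipfile.ZIP_STORED
        ]


def build_sample() -> bytes:
    """Build an in-memory Zip with one deflated and one stored entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", "<doc>" + "x" * 1000 + "</doc>",
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("blob.bin", b"\x00" * 128,
                    compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()


if __name__ == "__main__":
    print(stored_entries(build_sample()))
```

Any entry this reports has its bytes sitting in the file verbatim, which is the situation the comment above is speculating about.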