Forgot your password?
typodupeerror
Security Encryption Math

MD5 Collision Source Code Released 411

Posted by Zonk
from the collisiontacular dept.
SiliconEntity writes "The crypto world was shaken to its roots last year with the announcement of a new algorithm to find collisions in the still widely-used MD5 hash algorithm. Despite considerable work and commentary since then, no source code for finding such collisions has been published. Until today! Patrick Stach has announced the availability of his source code for finding MD5 collisions and MD4 collisions (Coral cache links provided to prevent slashdotting). MD4 collisions can be found in a few seconds (but nobody uses that any more), while MD5 collisions (still being used!) take 45 minutes on a 1.6 GHz P4. At last we will be able to implement various attacks which have been purely hypothetical until now. This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want."
This discussion has been archived. No new comments can be posted.

MD5 Collision Source Code Released

Comments Filter:
  • by jeblucas (560748) <jeblucas&gmail,com> on Tuesday November 15, 2005 @05:21PM (#14038175) Homepage Journal
    I'm essentially crypto ignorant. About all I've known to do was verify MD5 hashes on downloads. Now that this is by-and-large pointless, how to check the veracity of things like Linux ISO's [linuxiso.org], video drivers [nvidia.com], etc, ad inifintum?
  • by Tackhead (54550) on Tuesday November 15, 2005 @05:24PM (#14038203)
    > This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want."

    In other words, right now it's time to...
    ...LICK OUT THE KAMS, NOTHERGUCKERS!

  • by RootsLINUX (854452) <rootslinux&gmail,com> on Tuesday November 15, 2005 @05:24PM (#14038211) Homepage
    I guess someone thinks he's a little too cool to comment his code properly. Yeah, like "/* B2 */" tells me anything useful you insensitive clog!
  • Not -that- bad (Score:3, Interesting)

    by Parity (12797) on Tuesday November 15, 2005 @05:34PM (#14038309)
    The only attacks that these md5 collisions allow are denial-of-service/destruction-of-data attacks, they don't generally allow the compromise of protected data or access to systems or suchlike. The collision blocks that are generated are effectively random data. It has yet to be shown how to -craft- a collision block.

    If we could craft a collision block that contained a specified string at a specified position, that would be another issue altogether.

    The ability to find collision blocks easily does suggest that crafted collision blocks might be possible, but for now, you have as good a chance of getting a viable exploit out of /dev/random as out of a collision block.

    This doesn't mean we shouldn't look to other options for the newest releases of high-security software, but it doesn't mean that the md5 algorithm should be purged from our systems altogether either. It's still extremely valuable at detecting accidental corruption, and useful-with-caveats at detecting malicious corruption (45 minutes to discover a block of data that matches the sum is not really useful in either speed or resulting data for any kind of man in the middle attack, for example, so using md5 to validate network packets is safer than using it to validate disk files).

    Of course, the black hats may know more than we do about md5 weaknesses, but 'may know' is just as true of any other algorithm.
  • by goombah99 (560566) on Tuesday November 15, 2005 @05:40PM (#14038358)
    Maybe someone could explain why collisions are a serious problem for MD5. Or at least in what instances they are. I can see that in some cases, such as password hashing this could be a problem. But for many other uses I don't see what harm a collision has. Maybe I misunderstand but as I understand it MD5s are normally used in a checksum manner to sign or provide a fingerprint of a document. If you have an original document and compute it's MD5 then it can match some certified MD5 check sum. If someone were to generate a fake document they coul dnot design it to match the MD5 fingerprint. They could create some bit of gibberish that did match it but not a document that was useful as a forgery.

    I guess if one were trying to deliberately pedal gibberish, like say if you were the RIAA trying to destabilize a torrent net then that would be all you care about. But for more general issues I don't get it.

    Or is it now possible to generate a collision that also contains some intentional content like a binary program or source code or bank statement. That woul dbe be bad.

    It seems like even in the case of gibberish generation that some simple hacks could extend the useful life of this. For example, if you were to MD5 a whole document and then MD5 the concatenation of every other byte in the document, it woul dbe pretty hard to find two collisions that had that property simulateously. Sure I wont doubt there are better ways to hash something than adding hacks to MD5. I'm just saying that it seems like a simple hack might well be good enough to extend its useful lifespan for passwords and file shareing.

    But I invite you to explain to me why I'm wrong.

  • by Anonymous Coward on Tuesday November 15, 2005 @05:53PM (#14038457)
    a very simple perfomance check i love to run on every computer i come across:

    put windows calculator in scientific mode (to keep things comparable, please use windose calculator, not some custom written C+ program....)

    type in 100,000

    hit the n! button

    ignore the warnings that it will take a long time, don't even bother clicking on "Continue", because the calculation is still going.

    and report how long it takes to complete a factorial of 100,000

    please report what CPU you have.

    P4 3.2Ghz and Athlon 3200+ both do it in about 80 seconds....
  • by Medievalist (16032) on Tuesday November 15, 2005 @06:19PM (#14038723)
    MD4 collisions can be found in a few seconds (but nobody uses that any more)
    I thought MD4-encrypted passwords were still flying around most of the world's MS-windows networks. I coulda sworn I had to explicitly turn them off on all my nets...
  • by ChaosDiscord (4913) * on Tuesday November 15, 2005 @06:25PM (#14038782) Homepage Journal
    Maybe I misunderstand but as I understand it MD5s are normally used in a checksum manner to sign or provide a fingerprint of a document. If you have an original document and compute it's MD5 then it can match some certified MD5 check sum. If someone were to generate a fake document they coul dnot design it to match the MD5 fingerprint. They could create some bit of gibberish that did match it but not a document that was useful as a forgery.

    Most document formats have lots of "dead space", parts you can pretty much modify at will without changing what the user actually sees. Comments in HTML or PostScript. Old junk data in Word documents. Executables can have just about anything you like added if you know your stuff. The MD5 attacks currently available only 128 "dead space" bytes to generate a collision. So far from being a gibberish document, one can generate almost any document you want. This page has a simple example with PostScript files. [cits.rub.de] Both files have the same MD5 hash, but one is a relatively harmless letter of recommendation while the other is a grant of security clearance. Get your boss to sign your letter of recommendation digitally, swap in the security clearance file, and pass it on. This is a Big Deal and a Major Problem.

  • by ^BR (37824) on Tuesday November 15, 2005 @06:51PM (#14039048)

    Or you may have a brain tumor...

    The generaly accepted definition for "cracking a password" is, given the encrypted password, being able to generate a password the once entered in the authentication system will generate the same encrypted form.

    For the second time, this attack does not permit that. This attack permits build two byte streams that hashes to the same value. So in a password context, assuming the password system permits the entry permits more than 1024 bits of arbitrary data to be entered as password (I don't remind the details well, but I think this is the amount of data that you must be able to change between the two streams) one could generate to "passwords" (being that long they don't qualify for this name anymore IMHO) that would hash to the same value.

    How does that amount to an attack on MD5 passwords again? Never at any point the attack had been being able given a unknown to him MD5 hash to produce a stream that would hash to the same value.

  • Re:Q and A (Score:5, Interesting)

    by CodeRx (31888) on Tuesday November 15, 2005 @06:53PM (#14039082)
    sha1(md5($password . '¥1i9k') . 'a-thirty-five-ch4racter-l0ng-str1ng' . md5($password))

    This is a very bad password salting scheme and vulnerable to a dictionary attack. Once I have your database and salts, I can run a dictionary of common passwords through your scheme and crack any weak passwords.

    You can make things much harder by having your salt change for each password - include the username for example. Now I have to run my entire dictionary through the sha/md5 function for each user. By doing this, you make the attack O(m*n) instead of O(m) (where m = the number of words in my dictionary and n = the number of users).

    And as you mentioned in a follow up post, this code only generates documents with identical md5 sums, it does not generate a document with a given sum. So MD5 is broken for document signing and the like, but secure for password hashing for the time being.

  • Re:Why? (Score:3, Interesting)

    by einhverfr (238914) <chris.travers@gmail. c o m> on Tuesday November 15, 2005 @07:01PM (#14039162) Homepage Journal
    As an example, when DES was first known to be broken, the most intuitive solution would be to double-encrypt the plaintext. However, upon cryptographic analysis, this acutally fails to improve the complexity of an attack (and in some cases may simplify it). Thus, Triple DES.

    Your point is well taken. However, in this case, I believe you are mistaken. Indeed 3DES is just an extended form of DES which uses 2 keys and three encryptions. But here you are speaking of access so the analogy is likely to be fatally flawed anyway.

    My initial response was to the idea that a hash of a hash would provide any security and indeed would reduce it as your example illustrates. If you MD5 a SHA1 hash, you have reduced your security because a failure in either hash algorythm renders the entire chain ineffective (weakest link principle).

    Now, with my proposal, one would include independant hashs which would be checked independantly. If either one fails, one assumes that the data has been tampered with. The issue is that it would be difficult to defeat both simultaneously for this specific type of check. Being able to do so on demand while editing the file in a meaningful way might well prove impossible.

    Mixing encryption systems is indeed the bread and butter of the way our systems usually work. For example almost any key exchange algorythm uses a combination of encryption algorythms to ensure that symmetric encryption keys can be exchanged in a secure manner among previously unknown parties. But the key is that you want to make it a requirement to break every link on the chain and not have the system fail when one link is compromised. And yes, double-encrypting with the same key is stupid :-)
  • by chongo (113839) * on Tuesday November 15, 2005 @07:04PM (#14039191) Homepage Journal
    SHA1 is not a good alternative in some cases. For details on the cryptographic hash problem, see my paper:
    SHA1 Cryptographic Hash Update [systemexperts.com]
    My paper talks about the general problem at a high level. It gives a summary of common opinions expressed at the NIST Cryptographic Hash conference. Moreover, it gives developers some specific cryptographic hash recommendations.

    For the impatient, here is a summary for my recommendations for 2005-2006:

    • Avoid non-standard cryptographic hashes
    • Stop using MD5 now except for:
      • MD5 HMAC and MD5 hashed Passwords
      • Replace MD5 HMAC and MD5 hashed Passwords with SHA256 or SHA1 before end of 2007
    • Existing applications that use SHA1, where possible, should changed to use SHA256 before the end of 2008
      • For interoperability with older applications and hardware, these applications may have to also support SHA1
      • If you must support both SHA256 and SHA1, take care so that a "man in the middle" cannot inappropriately downgrade
    • Until a new Advanced Hash Standard (AHS) is adopted, new applications and hardware should be designed to use SHA256
      • For interoperability with older applications and hardware, these applications may have to also support SHA1
      • If you must support both SHA256 and SHA1, take care so that a "man in the middle" cannot inappropriately downgrade
    • All new applications and protocols must be designed to be algorithm agile
    • Existing applications and protocols should be modified to be algorithm agile by the end of 2008, if not sooner
    • SHA384 or SHA512 may be used in place of SHA256 in the above examples
      • Keep in mind that SHA384 and SHA512 are slower and larger than SHA256 or SHA1
    • Because it is possible that SHA1 will become unacceptably weak before 2008, and because SHA256 may become vulnerable to attack before Advanced Hash Standard (AHS) is adopted, a defense in depth approach must be taken

    See the paper for mode details.

  • by swillden (191260) <shawn-ds@willden.org> on Tuesday November 15, 2005 @07:40PM (#14039519) Homepage Journal

    now if you you were a software company you could put torrents out (I assume they use blocks of MD5sum), and then after the torrent becomes popular start randomly seeding people with blocks that hash correctly but are complete garbage (since you can't pick what exactly you hash)

    Yeah, they could do that. They could even use the same trick as the guys who showed the Postscript exploit and make the software do different things based on which of the collision pair is in the file. Get the right version and the software runs fine. Get the wrong version and the software formats your drive.

    In case it's not clear, the attack is to write some code like:

    byte a[] = { /* One collision plaintext block here */ };
    byte b[] = { /* Other collision plaintext block here */ };

    // ...

    if (memcmp(a, b, sizeof(a)) == 0)
    {
    // Do something useful
    }
    else
    {
    // Do something malicious
    }

    The legit version contains the same value for a and b, but the bogus version contains different values. Both programs hash to the same value, but they behave differently.

    This sort of attack can work wherever you can compare blocks and then follow different codepaths based on the result. It could probably work with the scripting language that DVD menuing systems are implemented in, too, so it could work for DVD ISOs. Audio and video streams would probably not be vulnerable.

  • Re:SHA-1??? (Score:3, Interesting)

    by jafac (1449) on Tuesday November 15, 2005 @08:55PM (#14040050) Homepage
    For my application, SHA-1 incurred a HUGE performance penalty. (factor of 1000). Given that there are few other variables I am free to change in this system, which hash, among these others you mention, tends to be more lightweight?
  • by swillden (191260) <shawn-ds@willden.org> on Tuesday November 15, 2005 @09:52PM (#14040360) Homepage Journal

    And well, here's a bit of news: someone has done just that with X.509 certificates based on MD5.

    Ouch. I'm not sure how I missed that one. The first page of their paper shows that I greatly underestimated the severity of the attack. What I hadn't realized is:

    Due to the ability of Wang's method to produce MD5 compression function collisions for any IV, and due to the iterative structure of MD5, we can append a collision to any block of data of our choice (provided that the bitlength is a multiple of the MD5 block length), while maintaining the collision property.

    My understanding was that the collision generation algorithm only created relatively small blocks, both effectively random, and that appending those blocks to different chunks of data would produce different hashes, even though the blocks themselves hash to the same value. But apparently the method can produce collisions even with arbitrary, attacker-specified IVs. That means effectively that you *can* craft a collision to any file, even a pre-existing file, as long as the file format has some "dead" space at an appropriate offset and the changes you want to make can precede this location.

    This is significantly worse than I understood it to be. It still doesn't affect password hashing, but it affects many other uses of hashing. The exploit that showed a pair of different postscript documents with different visible content but the same hash didn't require the property that Lenstra et al used to create the certs, which led me to my erroneous assumption.

    Thanks for pointing out this attack.

  • by kabloom (755503) on Wednesday November 16, 2005 @12:25AM (#14041068) Homepage
    Is it advised to switch GnuPG from SHA-1 for signatures to SHA-256 (like immediately)? How can I make this configuration change?
  • Re:SHA-1??? (Score:3, Interesting)

    by jd (1658) <imipak AT yahoo DOT com> on Wednesday November 16, 2005 @01:47AM (#14041398) Homepage Journal
    It depends a little on the implementation, but using mhash I get the following results when generating hashes of a gigabit file from DVD: (Results calculated by using gnu time, and using the user time, not the real time, result)


    • SHA-1 - 16 seconds
    • SHA-512 - 1 min, 17 seconds
    • Haval - 27 seconds
    • Whirlpool - 1 min, 14 seconds
    • Tiger - 18 seconds


    And then, when generating hashes of a 5 gigabyte file on my hard drive (there's been a lot of good stuff on Freshmeat, recently):


    • SHA-1 - 2 seconds
    • SHA-512 - 3 seconds
    • Haval - 5 seconds
    • Whirlpool - 5 seconds
    • Tiger - 3 seconds


    I'm going to guess, on the basis of these results, that you want to try Tiger as a first choice, Haval as a second choice, and that the mhash implementation will probably be OK for Tiger but if you opt for Haval you're probably better with a different hashing engine.

E = MC ** 2 +- 3db

Working...