Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!


Forgot your password?
Security Encryption

Meaningful MD5 Collisions 312

mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."
This discussion has been archived. No new comments can be posted.

Meaningful MD5 Collisions

Comments Filter:
  • Common sense (Score:3, Insightful)

    by gtrubetskoy ( 734033 ) * on Friday June 10, 2005 @03:40PM (#12783106)

    Unless I'm missing something, all these guys are doing is using a format that can contain an infinite amount of extraneous information that has no effect on how it's ultimately rendered, so same thing can be done with a .doc, or an HTML file, and this isn't really a cryptographic discovery of hash function weakness of any kind, just common sense for most of us - the secure hash algorithms should be applied to the English (or whatever language) textual contents of the document, no the source code of it, such as PostScript used in the article, PDF, HTML or whatever. I guess the most important lesson here is that this technique can be applied to binaries pretty easily as well.
    • no help (Score:2, Insightful)

      Anything you hash together will ALWAYS result in collisions.

      Extracting the formatting and code from the document will just make it EASIER to create a duplication.

      "Hello World!" might match with "Hello World!!!!!! this is extra stuff"

      at least leaving the exact formatting instructions in gives you a chance that the collision which leaves a compatible file is much more difficult than the hash of the simple raw text.
      • While it is true that any hashing has collisions, the law of big numbers applies here.

        Up to recently nobody has found a collision in 128 bit hashes like MD5. 128 allows for roughly 3x10^38 different hashes. SHA-256 allows for roughly 10^77 different hashes, which is close to number of atoms in the observable universe (estimated to be between 4×10^78 and 6×10^79).
        So, unless you present the algorithm with a lot of different input you'll never find a collision.

        It is probably safe to say that nobody
    • Re:Common sense (Score:2, Informative)

      by ergo98 ( 9391 )
      and this isn't really a cryptographic discovery of hash function weakness of any kind

      It's an example of the hash collision weaknesses recently documented, giving a practical example of how it could be used for malicious purposes.

      Traditionally we haven't had to worry about nonsensical things like applying the hash only to easily verifiable English text, because the hash is supposed to practically protect against intentional searches for collisions.
      • If you're really worried, hash twice with differing algorithms. Colliding MD5 with sensible data may be possible, colliding SHA-1 with sensible data may be possible, finding sensible data which collides with both is nay impossible.
    • They said that they have a method to make it easier to find collissions.

      While you are right that you always will have collissions they say that they can find it in a reasonable time.

      What is interesting though is that in their example they have control of both documents and they may use that to find a collision.

      This means that they may have a harder time to find a collision for an arbitrary document. If that is the case i don't know. I'm just speculating.
    • Your common sense seems a little ridiculous. Are we saying that all documents have to be reduced to text before applying our encryption? What about nontextual documents, like, say, process flowcharts, spreadsheets, powerpoint slides, multimedia?

      There are a lot of formats out there that allow additions of random undisplayed information, and so I presume that many of these formats are vulnerable to these attacks.

      I wonder how long it will take before there are exploits out there that take advantage of

    • On the contrary... (Score:5, Insightful)

      by Chris Pimlott ( 16212 ) on Friday June 10, 2005 @04:14PM (#12783520)
      This attack shows us all once again that there is that the procedures for using cryptography are as important as the mathematical theories and proofs on which cryptography is based. People like to believe that it's just the algorithm that's important, and once you have such an algorithm it's equally applicable to messages of all sorts and formats. As this shows, it's clearly not the case.

      You may believe it's common sense, but to the average user, encrypting a simple letter like the memos used in the article expressed as a Word document is no different than encrypting a simple text email. Heck, many of these users probably have no idea that much of the plain-looking email they send and recieve is actually HTML, which is capable of hiding beneath its rendered surface all sorts of additional information.

      When's the last time you saw an email program that read in a Word document, extracted just the plain text content, signed or encrypted it and then repackaged it into some new format in a cryptographically sound way that would automatically be reconstituted as a Word document on the other side? Most just have a handy "Sign" or "Encrypt" button that will happy accept .ps or .doc just as readily as a simple text file.
    • Re:Common sense (Score:4, Interesting)

      by iabervon ( 1971 ) on Friday June 10, 2005 @04:30PM (#12783718) Homepage Journal
      Actually, the two documents are actually almost identical. The difference is only one block in the whole file, which essentially acts as a selector for which of the two sets of content is displayed. MD5 (like most hash functions) works on fixed-size blocks smaller than the average file. To hash a complete file, you hash the first block, feed that into the hash with the second block, feed that into the hash with the third block, and so forth. So they have two files, and the first blocks are the same, the second blocks are different but hash the same, and the rest of the files are the same. Of course, the second blocks are junk, but the postscript is expecting a block-sized arbitrary value at that point anyway, so it doesn't matter that there's junk there.

      So they are actually using a format that can contain an exact quantity of extraneous information that doesn't get rendered but entirely changes what does get rendered.

      The same thing could be done with PDF or doc, and executables, but not anything compressed (it won't decompress at all if a block is changed) and not HTML without javascript (there's no way to test which block of junk is included and show different results based on that).

  • Anyone want to explain the real world applications of this to someone who is considering turning in his nerd credentials after being unable to get the gist of this from the write up...and please don't tell me to RTFA, this is after all /.!
    • Basically, they used a high-powered particle accelerator to create MD5 collisions between human-meaningful documents, thus forging the missing link between thermodynamic and informational entropy.

      I hope that clears things up.
    • by jjares ( 141954 ) on Friday June 10, 2005 @03:56PM (#12783296) Homepage
      Basically, when you do an md5 for a string, you transform an existing text with a variable length to a fixed length string. Now, imagine the variable text is 200bytes long, but the fixed string is 20 bytes long, you are obiously loosing information, and that there may be a combination of 200 bytes that produce the same 20 byte sequence, but the amount of combinations in 20 bytes (160 bits) make it highly unlikely that you will find a repeated sequence. What this investingators found is a way to replicate this sequences. The problem being that usually we check integrity with this md5 hashes, so teoretically, someone could alter a text and produce a new one that seems (from the md5 hashes) identical to the first one. This is specially nice for putting backdoors in source code downloaded from the net, as we often check it against an md5 hash.
      • >teoretically, someone could alter a text and produce a new one that seems (from the md5 hashes) identical to the first one

        Not exactly. Not unless the attacker could choose the first text. The new attacks allow you to create two documents that collide, but don't (yet) allow you to take an arbitrary document and make another that collides with it.

        That word "yet" is important. A lot of bright crypto people will now start working on "preimage" attacks and they've got some new tools to work with. Be afraid
      • It bears mentioning that md5 doesn't account for the length of the file. So if someone were to try installing a backdoor into a program, and had a sophisticated enough piece of software using this method, comments, metadata, or other information could be used to 'pad out' the file to make it seem like the original -- even with source code files. Especially in the case of executables, they could just insert random crap at the end of an executable file, and make the md5 hash (and possibly the size) come out

        • It is actually not entirely true, that md5 (and sha1) does not account for the size of the file. Both md5 and sha1 use the same Merkle-Damgård structure, where the same function is applied to a running "total", initially an initialization vector, and a fixed-size block of the input. For both hashes, the blocks are 512 bits long, and the last block is padded and ends with the length of the file, as a 64-bit integer. So unless one of your files is extremely long, size is taken into account. Doesn't mean
    • If you don't know what a hash function is, then turn in your nerd credentials now.

      Basically, they provided an example case where one of these recent methods to generate hash function collisions can be turned into a "real world" attack.

      It's a very simple example case, but it demonstrates the point effectively. The point is that these recently discovered methods to generate collisions quickly are a real threat to any software using them as a method for digital signatures and such.

      The real world application
      • Download something critical, say, the current Linux kernel from kernel.org.
      • Insert a trojan/backdoor/whatever.
      • Manipulate the tar archive so the hashes match. This is the subject of TFA.
      • Somehow upload the trojaned kernel back to kernel.org.
      • Since the hashes for the original kernel and the trojaned kernel are identical, they both appear valid when checked against the signature.
      • ????
      • Profit!

      Before someone starts bitching about "lack of trust" in open source, please replace kernel with security update an

  • by Ckwop ( 707653 ) * <Simon.Johnson@gmail.com> on Friday June 10, 2005 @03:42PM (#12783126) Homepage
    As an amateur cryptographer, I must say that labeling these attacks as 'practically irrelevant'
    is at the very least misguided and at worst a shocking display of incompetence.

    Stop the fixation with plain-text messages, most messages are not plain-text. Your average word document
    contains loads of invisible data that doesn't get rendered. Pdf's contain "junk" data that doesn't get rendered either. Would
    you notice a single bit difference in an MP3? Or a single pixel colour change in a jpeg? Hell, you can even do it in HTML <div style="visibility:hidden">Junk goes here</div>.
    Mark my words, people will find in the next couple of months find two meaningful computer
    documents that hash to the same value but are different byte-wise.

    People undervalue these attacks because the attacker has to generate the collision before hand to use it.
    To properly appreciate the power of these attacks consider the following senario.

    Imagine we're agreeing a contract of employement and I'm your employer.
    I give you the first word document that includes all the standard terms, however, I've also drafted
    a Word document that contains a load of draconian clauses like banning you from working in any IT position five years
    after leaving the company. By adding junk that doesn't render to both documents, I've managed to find to make the hash
    of the two documents collide. Thinking I'm a nice employer, you sign the first document, which you do by signing the hash of
    document. However, I now have your signature on BOTH documents. I now make sure the company IT system "forget" the first document
    and I've successfully screwed you.

    This is a human example, but there are other examples that apply in computer systems. The problem is that in many situations
    the attacker can choose when you encrypt. Say you encrypt your e-mail conversation with your friend using S/MIME, many people click
    "Reply" and the message body of the other persons method appears in the new message. Because of these attacks,
    It's now no certainty that an attacker couldn't use this fact to construct collisions that an attacker could use.

    As another security researcher said (paraphrased) It's like you're in building and you've just heard the fire alarm go off.
    You can't see smoke but it's time to make your way calmly to the exit. That sums up the position with SHA-1 and MD5. Swap out the primitives
    before you start seeing smoke.

    It's not like we don't have alternatives anyway. Whirlpool uses the same wide-trail design principles has AES. It's slower than MD-5 or SHA-1 but it's much better designed. And beside, people would do well to realise you have to spend CPU cycles to get security.

    • However, I now have your signature on BOTH documents. I now make sure the company IT system "forget" the first document and I've successfully screwed you.

      Er, only if you're stupid enough not to keep a copy of a document that you sign.

    • Your average word document contains loads of invisible data that doesn't get rendered. Pdf's contain "junk" data that doesn't get rendered either. Would you notice a single bit difference in an MP3? Or a single pixel colour change in a jpeg? Hell, you can even do it in HTML.

      I think that's insufficient to be able to mount the attack.

      To do what these guys did, you need a format that can:

      • Contain "junk" data that is never displayed.
      • Contain two different "messages" only one of which will be displayed
      • Well, you can have two MP3s that sound the same even though they're different and you can have two jpegs that look the same even though they're different.

        There's plenty of scope for changing the files, we only need to select roughly 128-bits in each file to message about with to drive a collision.

        I agree with your suggestion to use Javascript over HTML to "disguise" the change.

        • Well, you can have two MP3s that sound the same even though they're different and you can have two jpegs that look the same even though they're different.

          But you can't practically generate two MP3s or JPEGs that have *meaningful* differences, yet hash to the same value.

      • MP3s and JPGs do allow tags to contain extra data that will not effect how the music sounds or what the image looks like at all.

        That's pretty irrelevent though as the whole point of this hack is that you change the sound or the image anyway, then add the extra data to make it hash correctly to a specified hash
      • To do what these guys did, you need a format that can:
        • Contain "junk" data that is never displayed.
        • Contain two different "messages" only one of which will be displayed.
        • Be able to select which message to display based on the result of a calculation involving the junk.

        Microsoft Office and OpenOffice.org documents both can contain executable content which can execute when the document is opened, and which can alter the contents of the document.

        I am not very familiar with Microsoft Office, but in Op

    • If things went to court, it should be fairly easy to obtain a copy of the forged electronic document and look for added junk.

      A novice might miss it, but a trained eye could easily see the garbage.

      What worries me more is executeables where you're depending on the hash to match. If the file sizes are approximately the same, it would be very easy to trick someone into running something that is actually something else.
      • If things went to court, it should be fairly easy to obtain a copy of the forged electronic document and look for added junk.

        Lets imagine a more devious employer. What if the junk was on the legitimate document? The employer has you sign the one with junk, and then when it goes to court, the employer claims that YOU generated a hash collision to write yourself an easier/better contract.

        It works both ways. The junk could be on either document, and both parties have motives to forge a document.
        • The junk could be on either document, and both parties have motives to forge a document.

          Since this is an attack in which you create two documents, the party which created the document must be the would-be forger. I think your evil employer would have a hard time convincing the court that I wrote the document they had me sign.

    • I now make sure the company IT system "forget" the first document and I've successfully screwed you.

      Not quite, beacuse of reasonable doubt and the fact that the hypothetical employee would have copies of the document.

      However, Alice can get Bob to sign an innocuous recommendation letter that in the hidden version is a power of attorney for Bob's bank accounts. Alice can then take the fraudulent letter to Bob's bank and with apparent legality, take all of his money. (The difference being that a third part
    • As an amateur cryptographer

      I'm curious, do you get to take classes in that or is it graduate work? I'm currently in my undergrad at a school that doesn't have any cryptology courses. When I graduate in December I'll be heading off to the Navy to (hopefully) do crypto, but I wonder where civilian cryptology programs are.
    • people would do well to realise you have to spend CPU cycles to get security.

      This should be obvious to anyone. Any reasonably fast and efficient hash or encryption algorithm can be brute forced given enough time on a sufficiently large parallel machine. This means 1) Any truly effective hash or encryption must be slow, even on the latest hardware, and 2) any algorithm is reasonably secure today won't be secure against a dedicated attack by somebody with sufficient resources 10 years from now. Given the con

      • An exception to the above has occurred to me. Encryption with a one-time pad is secure for as long as the pad is secure, so it can be both fast and unbreakable. A secure scheme for pad distribution is left as an excercise for the user, and this applies only to encryption, not message digests (hashes). It should be possible to eventually brute force a collision to any message digest.
    • I think that the most important type of attack involves binaries. For example I could surpass tripwire and enable an exploit that was never visible in my binary code. Almost none of you would be able to tell the difference (a couple of bits), not even tripwire (md5), between the exploitable and non-exploitable version of the program. But the change in behavior could be different so that the exploit is possible.

      1. Write a very useful tool
      2. Compile it
      3. Tamper the file to get two almost identical vers
  • See Schneier's Blog [schneier.com] for more thoughts on the subject. I am sure it will get fleshed out more as more details emerge.
  • If MD5 is found to be insecure, what are the alternatives we can use when signing our open-source packages? Is there any other alternative that is even approaching the widespread use of md5sum?
    • by specialbrad ( 884393 ) on Friday June 10, 2005 @03:52PM (#12783243)
      The signing of open-source packages are to prevent download corruption usually. If a download is corrupted, the data will be different, and hence the hash will be different. Most of these attacks are malicious in that you have to go great lengths to find a collision to use. If your connection corrupts the download in such a way to produce a collision, your modem obviously hates you.
    • The obvious solution is to encrypt the file/document/OSSpackage with a private RSA key. The only way to view it then is to use the public key - which is published just like a MD5 sum.

      There are 2 reasons people do this. 1) decryption takes longer than computing a checksum. and 2) this would force everyone to decrypt the file in order to use it. Only paranoid people and those who realize bits get corrupted bother to verify checksums now.

      Reliable solutions are entirely available, people just need to start

    • I may be wrong but I think that for that purpose, the use of MD5 is still quite secure. What those researchers did was make 2 files with the same MD5, they didn't choose the md5 value itself. In order to crack the schemes you're mentioning the md5 value is a given value for which you want to generate another file (many times with the additional restriction that the file sizes must match).

      Read about collision attacks versus preimage attacks here [cryptography.com].

      Unless you're assuming that at least one of the people respon
    • 'gpg -ba THEFILE'

      This generates a detatched signature named "THEFILE.asc".

      The .asc file can used to verifiy the authenticity of THEFILE by the end user with the command:

      'gpg --verify THEFILE.asc'

      This method is already in pretty widespread use for open source software distribution. For example, Slackware has used for all official packages since 8.1.
    • RIPEMD160 is the only hash from that "generation" still standing, so it's probably the simplest one to use. There are newer hashes that probably still work though, SHA256, along with the new Tiger and Whirlpool hashes.
  • Text of Letters (Score:3, Informative)

    by pete-classic ( 75983 ) <hutnick@gmail.com> on Friday June 10, 2005 @03:46PM (#12783172) Homepage Journal
    For those who can't convientently view PostScript files, the text of the two letters:

    Julius. Caesar
    Via Appia 1
    Rome, The Roman Empire
    Alice Falbala fulfilled all the requirements of the Roman Empire
    intern position. She was excellent at translating roman into her gaul
    native language, learned very rapidly, and worked with considerable
    independence and confidence.
    Her basic work habits such as punctuality, interpersonal deportment,
    communication skills, and completing assigned and self-determined
    goals were all excellent.
    I recommend Alice for challenging positions in which creativity,
    reliability, and language skills are required.
    I highly recommend hiring her. If you'd like to discuss her attributes
    in more detail, please don't hesitate to contact me.
    Julius Caesar

    Julius. Caesar
    Via Appia 1
    Rome, The Roman Empire
    May, 22, 2005
    Alice Falbala is given full access to all confidential and secret
    information about GAUL.
    Julius Caesar
  • by swillden ( 191260 ) * <shawn-ds@willden.org> on Friday June 10, 2005 @03:46PM (#12783179) Homepage Journal

    What these researchers did was not to improve the known attacks on MD5, but to demonstrate a clever way of turning the known attack, generally considered to be of theoretical interest only, into an attack that could potentially really be used.

    The way they did it was to create a postscript document that actually contains two documents, one that the sender would be willing to sign and one that he presumably would not. The full text of both is contained in the file, but near the beginning of the file is a bit of code that compares two blocks of random-appearing bits, call them A and B. If A == B, the postscript interpreter will select the innocuous message and display that. If A != B, the interpreter will display the other message.

    The researchers then generated a pair of blocks with the same MD5 hash. In one copy of the postscript file, they used one of these blocks as both A and B. In the other copy, they used one block as A and the other as B. Because every bit of both documents before and after the two blocks is identical, and because those blocks hash to the same value, the documents hash to the same value.

    It's an interesting attack. It only applies to documents that are also programs, in some sense, but we use lots of document formats that fit that description.

    A simple countermeasure that makes such an attack more difficult is to compress the documents before signing.

    • Yes, that shows a drawback of the method - evidence of the attack is present on both files, you just have to hex edit them and look for their content...

      Nevertheless, it's a good way to shut the mouths of the ones who say that hash function attacks are still theoretical...
    • It is an interesting attack, and IANAL, but I'd be curious about the legal ramifications. If I slip a carbon (ah... the way-back machine) in a stack of papers and ask someone to sign the top one without thus informing them, I think my stealth probably invalidates the additional document(s).

      You could argue that there's a noticeable difference between pen and carbon -- making the copy hard to enforce -- but I'd argue the digital version is even easier: at least in the PS example, both "copies" of the do

    • Clever, but it means the attack is not a general way to forge an MD5-signed document... you couldn't use this (for example) to seed a P2P network with malicious files that look like safe ones. It only works if you generate both documents, and it can only be used maliciously if it's never examined by an expert: the signer can't retain a copy of the signed document or obtain a copy through discovery.
  • Maybe now, people will finally start migrating to a different hash algorithm for signatures. MD5 has been around long enough that people have known this sort of thing is possible for some time. It's only now that they finally sat down and did the proof-of-concept.

    Of course, MD5 passwords are probably still safe, for now (between maximum password lengths and the fact that this attack will have a hard time actually doing that), but it's only a matter of time.
  • by StreetFire.net ( 850652 ) on Friday June 10, 2005 @03:52PM (#12783248) Homepage
    Regarding being "practically irrelevant"

    "every time [some software engineer] says, 'nobody will go to the trouble of doing that,' there's some kid in Finland who will go to the trouble."
    Taken from Kevin' Mitnik's "The Art of intrusion"
    http://www.amazon.com/exec/obidos/tg/sim-explorer/ explore-items/-/0764569597/0/101/1/none/purchase/r ef%3Dpd_sxp_r0/104-8074733-7395136 [amazon.com]
  • Possible solution (Score:2, Insightful)

    by orb_fan ( 677056 )

    Is the answer then to create a hash that is in fact the sum of two distinct hashing alogrithms? For example, use MD5 and SHA1. Since the alogrithms use different methods, the set of collisions from one would not overlap the set from the second (or rather the overlap would be vanishingly small). And if the overlap was larger than you wanted - just keep adding different hash alogrithms until you are satisfied.

    I realise that the computations involved aren't cheap, but it becomes a trade-off between security

    • The combination of two weak hash functions is just another weak hash function. You might be able to buy yourself a small amount of time, but it will be broken. If there is a large set of operations that can modify the file without changing the first hash, and another large set of operations which can modify it without changing the second hash, you can construct the intersection of these two sets to find the modifications which keep both hashes (and therefore also their sum) the same.

      Strong hash functions a
    • by gotr00t ( 563828 ) on Friday June 10, 2005 @05:14PM (#12784252) Journal
      Think of it this way: You can hash _any_ file of _any_ size using either MD5 or SHA1, and these algorithms will hive you an alphanumeric hash, which has a limited number of permutations. Thare are an infinate number of unique files (assuming adequate storage space), yet there are finite unique hashes.

      Therefore, no matter how many algorithms you sum up using your described method, the number of collisions is still infinite in amount. It is not the algorithms that are flawed, rather, it is the fundamential concept of hashing that allows collisions to happen.

      I would assume that the way to reduce the number of collisions is by increasing the length of the hash itself so as to increase the number of unique hashes.

  • 7 bits difference (Score:3, Informative)

    by lavalyn ( 649886 ) on Friday June 10, 2005 @03:55PM (#12783282) Homepage Journal
    Compare the two files, and you see that there were only 7 bits that changed, all on the 5th most significant bit of the affected byte.

    I guess it's obvious from looking at the ps file in text form, but if it's that easy to mangle postscript to display two different layers (or is it changing comments or pointers? I am not a binary postscript parser.) I still don't think it's time to throw md5 away yet. Six months.

    What other hashing alternatives are there these days? SHA-1 apparently has the same kinds of weaknesses.
    • SHA-1 apparently has the same kinds of weaknesses.

      I don't that is exactly true.

      From http://www.schneier.com/blog/archives/2005/02/sha 1 _broken.html [schneier.com]:

      "collisions in the the full SHA-1 in 2**69 hash operations, much less than the brute-force attack of 2**80 operations based on the hash length."

      2**69 is still a very large number, so you probably shouldn't be worried just yet. It isn't quite the same as the bit trick for MD5. If anyone knows of a bigger weakness in SHA-1, I would be interested to know.
  • by vonstauf ( 827404 ) on Friday June 10, 2005 @03:56PM (#12783292) Homepage
    We could just couple it with another widely used industry standard [rot13.com]
  • Okay, I'm impressed. (Score:5, Interesting)

    by ave19 ( 149657 ) on Friday June 10, 2005 @04:01PM (#12783348)
    At first I thought: Postscript! Well, obviously. To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide. If you tried to hide random data in a text file, it would be obvious to the person signing it. You need some format to hide the random bits from the viwer.

    I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...

    Then I downloaded the files...

    There's almost nothing to them! I can't read PS, so I'm not sure how many of that handful of bytes at the beginning might be tweakable... but it's a lot less than I expected.

    Collisions must be very easy to find! I am now offically very worried about this.

    • I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...

      You only need as much random data as fits in the size of the hash signature. I.e. for MD5, 128 bits is enough for at least 1 collision.

    • by yeremein ( 678037 ) on Friday June 10, 2005 @04:55PM (#12784019)
      To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide.

      That's actually not what they did. They generated two essentially random blobs of data that have the same hash. We'll call these X and Y. They then created a PostScript document containing BOTH messages, the one that Alice's boss would sign and the one he presumably would not sign. They inserted two copies of block X into one of these documents, and a one X and one Y into the other. The original document contained code that compared the two blocks, and if they were the same, caused one message to be rendered, or if they were different, caused the other message to be rendered. Thus both documents hash the same (since X hashes the same as Y), but you see different text when you view the files.

      This sort of attack would only work on documents that can contain code of some sort. It would not work on text files.

  • by malcomvetter ( 851474 ) on Friday June 10, 2005 @04:01PM (#12783351)

    in my next big project?

    In all seriousness, I believe Schneier's right. We need a competition for a new hash function [schneier.com].

    Nah, let's just wait for 24 [techweb.com] to drop the words "MD5" before we know it's really bad.
    • You should provide some mechanism by which a user can select N hash methods to be used, with perhaps a warning against chosing MD5 or SHA1 alone. Then rather than users having to worry about this or that hash/encryption being broken, they can just switch, with warnings when they use the old algorithms (maintained for backwards compatibility).

      I suspect that in the near future, signing will be done using two different hashes to seriously restrict the collision space.
  • by G4from128k ( 686170 ) on Friday June 10, 2005 @04:02PM (#12783357)
    The core problem is that the resources of the defender are inevitably overwhelmed by the resources of the attacker.
    1. Relative expected values of gian vs. loss: The attacker thinks "I know I can gain a #BIG_NUM million dollars" and devotes their full effort to the attack. The defender thinks "I'm safe, there's a low probability, and I'm sure I'll catch the problem before it becomes real money, " and does not not devote effort to security becuase a Gartner report told him it was over-hyped. Thus, the attacker's perceived expected value is much higher than the defenders perceived expected loss and each invests accordingly.
    2. Rising Complexity: As IT systems become more complex, they become less secure. Each new device, new networking protocol, new physical layer, new OS feature, new networked application provides new opportunities for the attacker and a dilution of security resources for defenders.
    3. Time: The attacker has the advantage of time. New algorithms, new mathematical theories, new exploits, and faster processors all favor the attacker. What once was supposed to take the age of the universe to crack can be decrypted with a quickly declining number of networked (even zombied) PCs.
    4. Curse of Compatibility: Because so much crypto and security is networking related, it is subject to implementation delays caused by the need to be compatible. Defenders continue to use old, vulnerable systems to maintain compatibility with key partners. Patches don't solve the problem because the patch itself can introduce incompatibilities that make defenders leary of applying the patch with a very real chance of causing problems to avoid a hypothetical security issue.
    The bottom line is that the defender must protect all vulnerabilities while going about the day-to-day business of using the computer. In contrast, the attacker can devote full time to any weakness of their choice.
  • by erroneus ( 253617 ) on Friday June 10, 2005 @04:03PM (#12783390) Homepage
    I forget where or when exactly, so please feel free to run a search if you care to... it was here on Slashdot though.

    There was talk about someone being able to foil P2P networks by seeding bad stuff through random data formulated to fit the MD5/SHA1 code from legitimate files shared on those networks. The consensus was that it was BS and that even if it weren't BS there could be updates to make such attacks more difficult or impossible to perform.

    Am I missing something or are these two stories relevant to each other?
  • by cpeikert ( 9457 ) <cpeikertNO@SPAMalum.mit.edu> on Friday June 10, 2005 @04:04PM (#12783402) Homepage
    Lenstra and others came up with a way to generate syntactically-correct X509 certificates that collide under MD5.

    Here's a link to the paper: Lenstra et al [iacr.org].
  • I hope all those who said the previous MD5 attacks were nothing to worry about will take it back. A theoretical break or partial break is almost always followed by a practical break. The lesson to learn from this is not to do the same thing with SHA1 and, I would suggest, AES. The vulnerabilities might be impractical now but they won't stay that way. Move now, while you're still at least partially secure, rather than once there's a practical break.
  • by rar ( 110454 ) on Friday June 10, 2005 @04:19PM (#12783576) Homepage
    What I haven't seen mentioned yet, and people perhaps haven't realized, is that in providing these two postscript files, they have essentially provided you with an postscript signature exploit kit!

    All you need to do is download the two postscript documents and do *exactly corresponding edits* in both of them, and you get two documents saying different things and still have the same md5sums!

    I just tried exchanging Alice's name for my own, and surely it did work.

    Now, if they released a pdf-file hack, I would be genuinely worried :)...
    • Not, however very useful. Because if you want to exploit this, you'd need messages which aren't about Gaul and Caesar!

    • All you need to do is download the two postscript documents and do *exactly corresponding edits* in both of them, and you get two documents saying different things and still have the same md5sums!

      No. The whole point of hashing that it is extremely difficult to figure out what *exactly corresponding edits* are, so the provided exploit documents cannot be edited. This is why these two postscript files are noteworthy: they are different, yet they hash to the same thing. Since the two postscripts are differe
      • Since the two postscripts are different, changing the same letter in both will NOT result in the same hash.

        Actually the textual part of both is the same, they just found some PS magic to choose between two 'sub-documents' in the same PS. Indeed one can trivially show that modifying both files will result in a collision since both files contain the text of both messages.

        OTOH, in the real world, one would be hard pressed to change the contents of an existing document.
  • last time we had this discussion , maybe 3-6 months ago.. and they were meaningless collissions.. i told everyone "give it another 6 months or so, and someone'll figure out how to spoof properly"
  • This has a solution (Score:2, Informative)

    by yfarjoun ( 878821 )
    This is another example of the implementation of the birthday "paradox" to hashes as the so-called "birthday attack".

    See http://en.wikipedia.org/wiki/Birthday_paradox [wikipedia.org] for details.

    Basically if you have enough documents you will be able to find two with the same hash. Not to be mistaken with the difficulty of finding a document with the same hash as a GIVEN document. This cannot be avoided regardless of the hash.

    To foil a birthday attack, NEVER agree to sign a given document. One must ALWAYS add some

Money can't buy love, but it improves your bargaining position. -- Christopher Marlowe