Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
Security Encryption

Meaningful MD5 Collisions 312

mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."
This discussion has been archived. No new comments can be posted.

Meaningful MD5 Collisions

Comments Filter:
  • Text of Letters (Score:3, Informative)

    by pete-classic ( 75983 ) <hutnick@gmail.com> on Friday June 10, 2005 @03:46PM (#12783172) Homepage Journal
    For those who can't convientently view PostScript files, the text of the two letters:

    Julius. Caesar
    Via Appia 1
    Rome, The Roman Empire
    Alice Falbala fulfilled all the requirements of the Roman Empire
    intern position. She was excellent at translating roman into her gaul
    native language, learned very rapidly, and worked with considerable
    independence and confidence.
    Her basic work habits such as punctuality, interpersonal deportment,
    communication skills, and completing assigned and self-determined
    goals were all excellent.
    I recommend Alice for challenging positions in which creativity,
    reliability, and language skills are required.
    I highly recommend hiring her. If you'd like to discuss her attributes
    in more detail, please don't hesitate to contact me.
    Sincerely,
    Julius Caesar

    Julius. Caesar
    Via Appia 1
    Rome, The Roman Empire
    May, 22, 2005
    Order:
    Alice Falbala is given full access to all confidential and secret
    information about GAUL.
    Sincerely,
    Julius Caesar
  • Re:Common sense (Score:2, Informative)

    by ergo98 ( 9391 ) on Friday June 10, 2005 @03:46PM (#12783177) Homepage Journal
    and this isn't really a cryptographic discovery of hash function weakness of any kind

    It's an example of the hash collision weaknesses recently documented, giving a practical example of how it could be used for malicious purposes.

    Traditionally we haven't had to worry about nonsensical things like applying the hash only to easily verifiable English text, because the hash is supposed to practically protect against intentional searches for collisions.
  • by swillden ( 191260 ) * <shawn-ds@willden.org> on Friday June 10, 2005 @03:46PM (#12783179) Journal

    What these researchers did was not to improve the known attacks on MD5, but to demonstrate a clever way of turning the known attack, generally considered to be of theoretical interest only, into an attack that could potentially really be used.

    The way they did it was to create a postscript document that actually contains two documents, one that the sender would be willing to sign and one that he presumably would not. The full text of both is contained in the file, but near the beginning of the file is a bit of code that compares two blocks of random-appearing bits, call them A and B. If A == B, the postscript interpreter will select the innocuous message and display that. If A != B, the interpreter will display the other message.

    The researchers then generated a pair of blocks with the same MD5 hash. In one copy of the postscript file, they used one of these blocks as both A and B. In the other copy, they used one block as A and the other as B. Because every bit of both documents before and after the two blocks is identical, and because those blocks hash to the same value, the documents hash to the same value.

    It's an interesting attack. It only applies to documents that are also programs, in some sense, but we use lots of document formats that fit that description.

    A simple countermeasure that makes such an attack more difficult is to compress the documents before signing.

  • by William_Lee ( 834197 ) on Friday June 10, 2005 @03:46PM (#12783181)
    Thanks for clearing that up. Now it all makes perfect sense!
  • by specialbrad ( 884393 ) on Friday June 10, 2005 @03:52PM (#12783243)
    The signing of open-source packages are to prevent download corruption usually. If a download is corrupted, the data will be different, and hence the hash will be different. Most of these attacks are malicious in that you have to go great lengths to find a collision to use. If your connection corrupts the download in such a way to produce a collision, your modem obviously hates you.
  • 7 bits difference (Score:3, Informative)

    by lavalyn ( 649886 ) on Friday June 10, 2005 @03:55PM (#12783282) Homepage Journal
    Compare the two files, and you see that there were only 7 bits that changed, all on the 5th most significant bit of the affected byte.

    I guess it's obvious from looking at the ps file in text form, but if it's that easy to mangle postscript to display two different layers (or is it changing comments or pointers? I am not a binary postscript parser.) I still don't think it's time to throw md5 away yet. Six months.

    What other hashing alternatives are there these days? SHA-1 apparently has the same kinds of weaknesses.
  • by jjares ( 141954 ) on Friday June 10, 2005 @03:56PM (#12783296) Homepage
    Basically, when you do an md5 for a string, you transform an existing text with a variable length to a fixed length string. Now, imagine the variable text is 200bytes long, but the fixed string is 20 bytes long, you are obiously loosing information, and that there may be a combination of 200 bytes that produce the same 20 byte sequence, but the amount of combinations in 20 bytes (160 bits) make it highly unlikely that you will find a repeated sequence. What this investingators found is a way to replicate this sequences. The problem being that usually we check integrity with this md5 hashes, so teoretically, someone could alter a text and produce a new one that seems (from the md5 hashes) identical to the first one. This is specially nice for putting backdoors in source code downloaded from the net, as we often check it against an md5 hash.
  • by rbarreira ( 836272 ) on Friday June 10, 2005 @04:03PM (#12783372) Homepage
    I may be wrong but I think that for that purpose, the use of MD5 is still quite secure. What those researchers did was make 2 files with the same MD5, they didn't choose the md5 value itself. In order to crack the schemes you're mentioning the md5 value is a given value for which you want to generate another file (many times with the additional restriction that the file sizes must match).

    Read about collision attacks versus preimage attacks here [cryptography.com].

    Unless you're assuming that at least one of the people responsible for redistributing the software have bad intentions?
  • by Otto ( 17870 ) on Friday June 10, 2005 @04:03PM (#12783384) Homepage Journal
    If you don't know what a hash function is, then turn in your nerd credentials now.

    Basically, they provided an example case where one of these recent methods to generate hash function collisions can be turned into a "real world" attack.

    It's a very simple example case, but it demonstrates the point effectively. The point is that these recently discovered methods to generate collisions quickly are a real threat to any software using them as a method for digital signatures and such.

    The real world application here is that it is possible, probably in several good ways, to generate a couple of different files that have the same hash and also have meaningful data in them. The attacks found that generate seemingly random data with the same hashes can be used in ways that will let them apply to non-random, purposefully designed data.

    The example they use is where some secretary gets her boss to sign a document, and then uses his signature on another document which gives her access she shouldn't have. It's a way to forge a digital signature on a document by having them sign another one that you specially crafted.
  • by Dachannien ( 617929 ) on Friday June 10, 2005 @04:05PM (#12783403)
    We're talking about cryptographic hashes here, not encryption. Encryption is meant to be a reversible process, and is therefore one-to-one. In other words, there's no concern over collisions with encryption.

    With cryptographic hashes, you're throwing away nearly all of the data to obtain a hash (a number) which represents the larger data set in such a way that (hopefully) the hash will never turn up again in practical usage. The article here indicates that there are ways being devised to force two data sets to have a hash collision while keeping the practical parts of the data sets the same.

    As for accusing encryption of being "security through obscurity", you're misusing that term. If knowing the encryption algorithm allowed you instant access to all data encrypted with that algorithm, then yes, the only security present would be dependent upon the secrecy of the algorithm itself. But that's not the case here. Encryption typically works by public key exchange, meaning that a key (a number) used to encrypt messages is shared with the encrypting partner, while the key to decrypt and recover the data is kept private (is never transmitted). Recovering the private key through brute force is not a compromise of the algorithm itself - given enough time, any private key can be recovered, regardless of the algorithm, but by increasing the key size arbitrarily, the time taken to find that key can also be increased arbitrarily.

  • by MyTwoCentsWorth ( 593731 ) on Friday June 10, 2005 @04:09PM (#12783450)
    Dear me,

    Your ignorance is so appalling that I have a hard time deciding where to start.

    First - this is about digital signatures, not about encryption - sharing the documents (by sending them over electronic media or otherwise) is needed.

    Second - obfuscated algorithms ??? Open any book on encryption to learn that the strongest encryption algorhitms rely on completely documented algorhitms - security through obscurity is so 80's...

    My suggestion - try to have something worth saying before making suggestions.

    Happy Posting.
  • by pz ( 113803 ) on Friday June 10, 2005 @04:09PM (#12783458) Journal
    I now make sure the company IT system "forget" the first document and I've successfully screwed you.

    Not quite, beacuse of reasonable doubt and the fact that the hypothetical employee would have copies of the document.

    However, Alice can get Bob to sign an innocuous recommendation letter that in the hidden version is a power of attorney for Bob's bank accounts. Alice can then take the fraudulent letter to Bob's bank and with apparent legality, take all of his money. (The difference being that a third party is involved who is naive to the document Bob signed and his intents.) This is effectively the scenario suggested in TFA, and because there is no alternate version of the document, Bob has no recourse (unlike the parent poster's suggestion).

    To exploit this hole in MD5 (SHA-1, or whichever hashing function you like), you need to create two completely different documents, not two which are different versions of the same thing and can be compared by humans. That's the real hole: these documents need to be printed out (or rendered on a screen) and interpreted by humans who assume that the rendering process is trustworthy because it is complete and veracious. If humans natively read PostScript with the same ease we read English, for example, this kind of exploit wouldn't be possible.
  • by Anonymous Coward on Friday June 10, 2005 @04:10PM (#12783474)
    Nice referrer link.
  • by cpeikert ( 9457 ) <cpeikert AT alum DOT mit DOT edu> on Friday June 10, 2005 @04:22PM (#12783611) Homepage
    How exactly could they create junk documents that also match the expected filesize.

    The collisions that have been found for MD5 are for pairs of documents that are the same size. The size constraint is not a problem.
  • Re:big oops (Score:2, Informative)

    by tota ( 139982 ) on Friday June 10, 2005 @04:22PM (#12783623) Homepage
    sha-256 and later have not been found to be weak yet. It does not mean that they are unbreakable (or I should say: possible to derive collisions) but it is definitely better.

    Another solution that should work quite well is to combine hashes: (md5+sha1) is definitely much stronger as you would need to find a collision that works for both algorithms at the same time.
    Possible, but not likely.
  • by loose_cannon_gamer ( 857933 ) on Friday June 10, 2005 @04:23PM (#12783632)

    I think that's a big shortsighted... I agree that if we let history take a crack at it, that any encryption put together by smart people will eventually be breakable by smart people.

    However, most data that I deal with day-to-day is time relevant. Do I care if someone figures out my credit card number on an account I closed 5 years ago? Is it terrible if someone hacks an old email only to find out I was begging a professor for a passing grade in 1997?

    Encryption is meant to hide things, and for many things, the need to hide is temporary. If the hidden thing stays hidden as long as it needs to stay hidden, there is nothing wrong with it.

    Know the limitations on the technology you use, and know the parties with which you exchange information. Those two rules alone, if followed, will probably provide more than adequate real-world defense. Perfect? No. Good enough. Statistically, yes.

  • by deanoaz ( 843940 ) on Friday June 10, 2005 @04:23PM (#12783634)
    If two documents exist with the same hash, then they were both produced by the same source, since there is no practical, known way of finding a collision without having control of the content of both documents. Therefore, your signed copy of the original document proves that the employer created both versions.
  • by yeremein ( 678037 ) on Friday June 10, 2005 @04:55PM (#12784019)
    To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide.

    That's actually not what they did. They generated two essentially random blobs of data that have the same hash. We'll call these X and Y. They then created a PostScript document containing BOTH messages, the one that Alice's boss would sign and the one he presumably would not sign. They inserted two copies of block X into one of these documents, and a one X and one Y into the other. The original document contained code that compared the two blocks, and if they were the same, caused one message to be rendered, or if they were different, caused the other message to be rendered. Thus both documents hash the same (since X hashes the same as Y), but you see different text when you view the files.

    This sort of attack would only work on documents that can contain code of some sort. It would not work on text files.

  • by Beryllium Sphere(tm) ( 193358 ) on Friday June 10, 2005 @04:57PM (#12784054) Journal
    >Switching to larger keys will not change anything.

    Switching to longer hash sizes (not keys) will help against the current generation of SHA attacks. The effect of the Wang attacks is to knock (if memory serves) 16 bits off the work factor to find a collision. In other words, finding a collision in SHA-1 is now 64K times easier than brute force. Make brute force harder and the attack gets harder.

    Some numbers: SHA-1 produces a 160-bit hash. If you're trying to find a collision as oposed to generating one the birthday paradox works in your favor and you only need on average 2^80 trials for brute force. Wang has lowered that number to 2^64, which is uncomfortably small by today's standards and reckless by tomorrow's. SHA-256 produces a 256-bit hash: you do the math.

    The big worry is that cryptographic attacks tend to inspire research that finds better attacks. There's an unquantifiable but real risk that SHA-1 could collapse altogether.

    The crypto geeks are speulating that these weaknesses are inherent in any hash algorithm that can be calculated incrementally. Right now the output after block N of input depends only on blocks 1..N of input. If the intermediate result for block N depended on blocks N+1..EOF then the current attacks wouldn't work. Neither would hashing an input stream. Such a change would destroy the ability to hash a file in parallel with reading it.
  • Re:Possible solution (Score:2, Informative)

    by orb_fan ( 677056 ) on Friday June 10, 2005 @04:59PM (#12784087)

    You are assuming that I read hashing alogrithm threads, which I don't. If there is a reason why this wouldn't work, then please enlighten me.

  • by skelly33 ( 891182 ) on Friday June 10, 2005 @05:20PM (#12784321)
    "It is probably not possible to get a pair of files that will collide in all three of those hashes."

    Not impossible, just less probable. Combining algorithms effectively increases the size of your fingerprint. The probability of a duplicate is a function of original data size and its proportion to the fingerprint data size. The closer those two data sizes are, the lower the probability of duplicates.

    It may seem difficult to pull off now, but eventually someone will make a nice, clever script to do it and then the script-kiddies will have a field day.
  • This has a solution (Score:2, Informative)

    by yfarjoun ( 878821 ) on Friday June 10, 2005 @05:21PM (#12784324) Homepage
    This is another example of the implementation of the birthday "paradox" to hashes as the so-called "birthday attack".

    See http://en.wikipedia.org/wiki/Birthday_paradox [wikipedia.org] for details.

    Basically if you have enough documents you will be able to find two with the same hash. Not to be mistaken with the difficulty of finding a document with the same hash as a GIVEN document. This cannot be avoided regardless of the hash.

    To foil a birthday attack, NEVER agree to sign a given document. One must ALWAYS add some random junk to the document (clear text or hidden). This should be implemented in sofetware that helps in signing, but if the option isn't given, you can add a clear text "random" string.

    Yossi

  • Re:Common sense (Score:3, Informative)

    by Anonymous Coward on Friday June 10, 2005 @05:51PM (#12784663)
    Isn't this obivous if you check the file sizes?

    The files are the same size.

    The cksum comand (which uses a 32 bit CRC) spits out the checksum and a file size. Why doesn't md5sum do the same thing?

    It does - The file size is used as part of the MD5 hash. The MD5 algorithm hashes the file, then appends the file size and hashes that too. If it didn't do this then you could create an MD5 collision just by appending zeros to the file.
  • by ars ( 79600 ) <assd2@d s g m l .com> on Friday June 10, 2005 @06:50PM (#12785259) Homepage
    Every time this comes up people mention the same idea - use two hashes for security!

    What people don't realize is that MD5 _IS_ two hashes already! That's how it works!

    To make it worse the hashes you mentioned MD5 and SHA-1 are based on exactly the same algorithm, so using it twice doesn't help you much.
  • by rar ( 110454 ) on Friday June 10, 2005 @07:08PM (#12785423) Homepage
    It is the same document, just relying on differences in the document name (it appears) to generate the different pages.

    No, you have missed the point. Go back and rtfa again. The attack still works if you rename the documents to the same filename.

    The difference lies in a generated "binary cookie" in the beginning of the postscript documents. This "cookie" makes the postscript intepreter either select to show document 'A' or 'B'. The "thing" with the cookies are that they are carefully selected to be md5-colliding. Result: both documents have the same md5sum.

    You can change the rest of the documents freely if you make the same changes in both documents. The md5sum will change, but it will still be the same for both documents.

    So. No. It is indeed a md5 collission attack.
  • by Fnord666 ( 889225 ) on Friday June 10, 2005 @09:07PM (#12786071) Journal
    You, the employee, are going to argue that the company created both documents with matching hashes, then had you sign the more benign one. The company is going to argue that you agreed to the draconian version, then created the benign version at a later time that has a matching hash and are just trying to get out of the obligation. The problem is that:
    1. Both versions are possible
    2. Explaining any of this to people who haven't figured out how to get out of jury duty isn't going to happen. Ever seen the glazed looks on jury members when expert witnesses testify?

    If I understand correctly, what this boils down to is that given an document and an MD5 hash, there is now a "reasonable" time based method of generating a second document with different content but a matching hash.

    For a hash based signature, there will exist documents that have matching hashes. This is refered to as the pigeonhole principle. If you have 10 pigeonholes to stuff messages in and 11 messages, one of the pigeonholes will get a second message.

    The linch pin of this process is the idea that it takes too much time to find or create a second document that has different content but the same hash value. In practice you want it to take so long that by the time a match is found, it no longer matters. What "no longer matter" means depends on the context.

    When an adversary can create/find a match in a couple of hours rather than centuries, all bets are off unless the signature expires in seconds.

    This really is an important result and has significant implications.

  • by Anonymous Coward on Friday June 10, 2005 @11:06PM (#12786708)
    I can't believe that nobody, including Bruce Schneier, has pointed out the well understood principle that you *never* blindly sign something provided to you by someone else.

    Secure schemes involving one party signing material provided by another party should always involve a signing nonce. This is a random bitstring appended to the material to be signed. In the example from the article, if Caesar had added a nonce before signing the "good" document, then Alice would not have been able to produce a "bad" document with a matching hash.

    However, as someone pointed out, this kind of attack is possible without using hash collisions. You simply have to convince your victim to sign something that is code-like (such as the postscript files used in the example, or an HTML document with javascript, or a Word doc with Visual Basic macros, or an executable) without making an expert examination of the complete contents of the code. The code-like "document" can be written to present one set of behavior under one set of circumstances and a completely different behavior under other circumstances. If the signer is not competent to examine the program and be fully aware of what it does, then the signature can have no meaning.

    The attack demonstrated in the article is *still* of no practical consequence in my opinion. It doesn't mean that signed code can't be trusted; trusted code is always signed by it's creator, meaning that if the signature is valid *and* you trust the creator, then you can trust the code. It is still impossible for someone who didn't create the signed code to create a different piece of code that has the same hash, so you don't have to worry about an attacker replacing signed trusted code with signed malicious code. As always, you still have to worry about misplaced trust in the code's creator.

    What the attack in the article *should* be trying to make clear is that a signature is only as trustworthy as the signer's ability and desire to evaluate what is being signed. A third party auditor that signs code after reviewing it doesn't ensure that the code is trustworthy, it only ensures that the auditor *claims* the code is trustworthy. What the implications are for the actual trustworthiness of the code depends on the trust you place in the intentions and abilities of the auditor.
  • by dossen ( 306388 ) on Saturday June 11, 2005 @07:20PM (#12791474)
    It is actually not entirely true, that md5 (and sha1) does not account for the size of the file. Both md5 and sha1 use the same Merkle-Damgård structure, where the same function is applied to a running "total", initially an initialization vector, and a fixed-size block of the input. For both hashes, the blocks are 512 bits long, and the last block is padded and ends with the length of the file, as a 64-bit integer. So unless one of your files is extremely long, size is taken into account. Doesn't mean that size-altering is impossible, but the linked attack for one does not alter the size.

Stellar rays prove fibbing never pays. Embezzlement is another matter.

Working...