Meaningful MD5 Collisions - Slashdot

Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

×

Meaningful MD5 Collisions 312

Posted by Zonk on Friday June 10, 2005 @03:38PM from the bam-crash-boom dept.

mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."

This discussion has been archived. No new comments can be posted.

Meaningful MD5 Collisions

Search 312 Comments Log In/Create an Account

Comments Filter:

Text of Letters (Score:3, Informative)

by pete-classic ( 75983 ) writes: <hutnick@gmail.com> on Friday June 10, 2005 @03:46PM (#12783172) Homepage Journal

For those who can't convientently view PostScript files, the text of the two letters:

Julius. Caesar
Via Appia 1
Rome, The Roman Empire
Alice Falbala fulfilled all the requirements of the Roman Empire
intern position. She was excellent at translating roman into her gaul
native language, learned very rapidly, and worked with considerable
independence and confidence.
Her basic work habits such as punctuality, interpersonal deportment,
communication skills, and completing assigned and self-determined
goals were all excellent.
I recommend Alice for challenging positions in which creativity,
reliability, and language skills are required.
I highly recommend hiring her. If you'd like to discuss her attributes
in more detail, please don't hesitate to contact me.
Sincerely,
Julius Caesar

Julius. Caesar
Via Appia 1
Rome, The Roman Empire
May, 22, 2005
Order:
Alice Falbala is given full access to all confidential and secret
information about GAUL.
Sincerely,
Julius Caesar

Share
twitter facebook
Re:Common sense (Score:2, Informative)

by ergo98 ( 9391 ) writes: on Friday June 10, 2005 @03:46PM (#12783177) Homepage Journal

and this isn't really a cryptographic discovery of hash function weakness of any kind

It's an example of the hash collision weaknesses recently documented, giving a practical example of how it could be used for malicious purposes.

Traditionally we haven't had to worry about nonsensical things like applying the hash only to easily verifiable English text, because the hash is supposed to practically protect against intentional searches for collisions.

Parent Share
twitter facebook
Explanation of the attack (Score:5, Informative)

by swillden ( 191260 ) * writes: <shawn-ds@willden.org> on Friday June 10, 2005 @03:46PM (#12783179) Journal

What these researchers did was not to improve the known attacks on MD5, but to demonstrate a clever way of turning the known attack, generally considered to be of theoretical interest only, into an attack that could potentially really be used.

The way they did it was to create a postscript document that actually contains two documents, one that the sender would be willing to sign and one that he presumably would not. The full text of both is contained in the file, but near the beginning of the file is a bit of code that compares two blocks of random-appearing bits, call them A and B. If A == B, the postscript interpreter will select the innocuous message and display that. If A != B, the interpreter will display the other message.

The researchers then generated a pair of blocks with the same MD5 hash. In one copy of the postscript file, they used one of these blocks as both A and B. In the other copy, they used one block as A and the other as B. Because every bit of both documents before and after the two blocks is identical, and because those blocks hash to the same value, the documents hash to the same value.

It's an interesting attack. It only applies to documents that are also programs, in some sense, but we use lots of document formats that fit that description.

A simple countermeasure that makes such an attack more difficult is to compress the documents before signing.

Share
twitter facebook
Re:Wow...this is nerdy even for /. (Score:3, Informative)

by William_Lee ( 834197 ) writes: on Friday June 10, 2005 @03:46PM (#12783181)

Thanks for clearing that up. Now it all makes perfect sense!

Parent Share
twitter facebook
Re:What are the alternatives? (Score:4, Informative)

by specialbrad ( 884393 ) writes: on Friday June 10, 2005 @03:52PM (#12783243)

The signing of open-source packages are to prevent download corruption usually. If a download is corrupted, the data will be different, and hence the hash will be different. Most of these attacks are malicious in that you have to go great lengths to find a collision to use. If your connection corrupts the download in such a way to produce a collision, your modem obviously hates you.

Parent Share
twitter facebook
7 bits difference (Score:3, Informative)

by lavalyn ( 649886 ) writes: on Friday June 10, 2005 @03:55PM (#12783282) Homepage Journal

Compare the two files, and you see that there were only 7 bits that changed, all on the 5th most significant bit of the affected byte.

I guess it's obvious from looking at the ps file in text form, but if it's that easy to mangle postscript to display two different layers (or is it changing comments or pointers? I am not a binary postscript parser.) I still don't think it's time to throw md5 away yet. Six months.

What other hashing alternatives are there these days? SHA-1 apparently has the same kinds of weaknesses.

Share
twitter facebook
Re:Wow...this is nerdy even for /. (Score:4, Informative)

by jjares ( 141954 ) writes: on Friday June 10, 2005 @03:56PM (#12783296) Homepage

Basically, when you do an md5 for a string, you transform an existing text with a variable length to a fixed length string. Now, imagine the variable text is 200bytes long, but the fixed string is 20 bytes long, you are obiously loosing information, and that there may be a combination of 200 bytes that produce the same 20 byte sequence, but the amount of combinations in 20 bytes (160 bits) make it highly unlikely that you will find a repeated sequence. What this investingators found is a way to replicate this sequences. The problem being that usually we check integrity with this md5 hashes, so teoretically, someone could alter a text and produce a new one that seems (from the md5 hashes) identical to the first one. This is specially nice for putting backdoors in source code downloaded from the net, as we often check it against an md5 hash.

Parent Share
twitter facebook
Re:What are the alternatives? (Score:3, Informative)

by rbarreira ( 836272 ) writes: on Friday June 10, 2005 @04:03PM (#12783372) Homepage

I may be wrong but I think that for that purpose, the use of MD5 is still quite secure. What those researchers did was make 2 files with the same MD5, they didn't choose the md5 value itself. In order to crack the schemes you're mentioning the md5 value is a given value for which you want to generate another file (many times with the additional restriction that the file sizes must match).

Read about collision attacks versus preimage attacks here [cryptography.com].

Unless you're assuming that at least one of the people responsible for redistributing the software have bad intentions?

Parent Share
twitter facebook
Consider it to be an example... (Score:3, Informative)

by Otto ( 17870 ) writes: on Friday June 10, 2005 @04:03PM (#12783384) Homepage Journal

If you don't know what a hash function is, then turn in your nerd credentials now.

Basically, they provided an example case where one of these recent methods to generate hash function collisions can be turned into a "real world" attack.

It's a very simple example case, but it demonstrates the point effectively. The point is that these recently discovered methods to generate collisions quickly are a real threat to any software using them as a method for digital signatures and such.

The real world application here is that it is possible, probably in several good ways, to generate a couple of different files that have the same hash and also have meaningful data in them. The attacks found that generate seemingly random data with the same hashes can be used in ways that will let them apply to non-random, purposefully designed data.

The example they use is where some secretary gets her boss to sign a document, and then uses his signature on another document which gives her access she shouldn't have. It's a way to forge a digital signature on a document by having them sign another one that you specially crafted.

Parent Share
twitter facebook
Re:Security Through Obscurity (Score:4, Informative)

by Dachannien ( 617929 ) writes: on Friday June 10, 2005 @04:05PM (#12783403)

We're talking about cryptographic hashes here, not encryption. Encryption is meant to be a reversible process, and is therefore one-to-one. In other words, there's no concern over collisions with encryption.

With cryptographic hashes, you're throwing away nearly all of the data to obtain a hash (a number) which represents the larger data set in such a way that (hopefully) the hash will never turn up again in practical usage. The article here indicates that there are ways being devised to force two data sets to have a hash collision while keeping the practical parts of the data sets the same.

As for accusing encryption of being "security through obscurity", you're misusing that term. If knowing the encryption algorithm allowed you instant access to all data encrypted with that algorithm, then yes, the only security present would be dependent upon the secrecy of the algorithm itself. But that's not the case here. Encryption typically works by public key exchange, meaning that a key (a number) used to encrypt messages is shared with the encrypting partner, while the key to decrypt and recover the data is kept private (is never transmitted). Recovering the private key through brute force is not a compromise of the algorithm itself - given enough time, any private key can be recovered, regardless of the algorithm, but by increasing the key size arbitrarily, the time taken to find that key can also be increased arbitrarily.

Parent Share
twitter facebook
Re:Security Through Obscurity (Score:2, Informative)

by MyTwoCentsWorth ( 593731 ) writes: on Friday June 10, 2005 @04:09PM (#12783450)

Dear me,

Your ignorance is so appalling that I have a hard time deciding where to start.

First - this is about digital signatures, not about encryption - sharing the documents (by sending them over electronic media or otherwise) is needed.

Second - obfuscated algorithms ??? Open any book on encryption to learn that the strongest encryption algorhitms rely on completely documented algorhitms - security through obscurity is so 80's...

My suggestion - try to have something worth saying before making suggestions.

Happy Posting.

Parent Share
twitter facebook
Re:These are important attacks.. (Score:3, Informative)

by pz ( 113803 ) writes: on Friday June 10, 2005 @04:09PM (#12783458) Journal

I now make sure the company IT system "forget" the first document and I've successfully screwed you.

Not quite, beacuse of reasonable doubt and the fact that the hypothetical employee would have copies of the document.

However, Alice can get Bob to sign an innocuous recommendation letter that in the hidden version is a power of attorney for Bob's bank accounts. Alice can then take the fraudulent letter to Bob's bank and with apparent legality, take all of his money. (The difference being that a third party is involved who is naive to the document Bob signed and his intents.) This is effectively the scenario suggested in TFA, and because there is no alternate version of the document, Bob has no recourse (unlike the parent poster's suggestion).

To exploit this hole in MD5 (SHA-1, or whichever hashing function you like), you need to create two completely different documents, not two which are different versions of the same thing and can be compared by humans. That's the real hole: these documents need to be printed out (or rendered on a screen) and interpreted by humans who assume that the rendering process is trustworthy because it is complete and veracious. If humans natively read PostScript with the same ease we read English, for example, this kind of exploit wouldn't be possible.

Parent Share
twitter facebook
Re:Kids in Finland don't agree (Score:1, Informative)

by Anonymous Coward writes: on Friday June 10, 2005 @04:10PM (#12783474)

Nice referrer link.

Parent Share
twitter facebook
Re:These are important attacks.. (Score:3, Informative)

by cpeikert ( 9457 ) writes: <cpeikert AT alum DOT mit DOT edu> on Friday June 10, 2005 @04:22PM (#12783611) Homepage

How exactly could they create junk documents that also match the expected filesize.

The collisions that have been found for MD5 are for pairs of documents that are the same size. The size constraint is not a problem.

Parent Share
twitter facebook
Re:big oops (Score:2, Informative)

by tota ( 139982 ) writes: on Friday June 10, 2005 @04:22PM (#12783623) Homepage

sha-256 and later have not been found to be weak yet. It does not mean that they are unbreakable (or I should say: possible to derive collisions) but it is definitely better.

Another solution that should work quite well is to combine hashes: (md5+sha1) is definitely much stronger as you would need to find a collision that works for both algorithms at the same time.
Possible, but not likely.

Parent Share
twitter facebook
Re:Security Through Obscurity (Score:3, Informative)

by loose_cannon_gamer ( 857933 ) writes: on Friday June 10, 2005 @04:23PM (#12783632)

I think that's a big shortsighted... I agree that if we let history take a crack at it, that any encryption put together by smart people will eventually be breakable by smart people.

However, most data that I deal with day-to-day is time relevant. Do I care if someone figures out my credit card number on an account I closed 5 years ago? Is it terrible if someone hacks an old email only to find out I was begging a professor for a passing grade in 1997?

Encryption is meant to hide things, and for many things, the need to hide is temporary. If the hidden thing stays hidden as long as it needs to stay hidden, there is nothing wrong with it.

Know the limitations on the technology you use, and know the parties with which you exchange information. Those two rules alone, if followed, will probably provide more than adequate real-world defense. Perfect? No. Good enough. Statistically, yes.

Parent Share
twitter facebook
Re:These are important attacks.. (Score:4, Informative)

by deanoaz ( 843940 ) writes: on Friday June 10, 2005 @04:23PM (#12783634)

If two documents exist with the same hash, then they were both produced by the same source, since there is no practical, known way of finding a collision without having control of the content of both documents. Therefore, your signed copy of the original document proves that the employer created both versions.

Parent Share
twitter facebook
Re:Okay, I'm impressed. (Score:4, Informative)

by yeremein ( 678037 ) writes: on Friday June 10, 2005 @04:55PM (#12784019)

To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide.

That's actually not what they did. They generated two essentially random blobs of data that have the same hash. We'll call these X and Y. They then created a PostScript document containing BOTH messages, the one that Alice's boss would sign and the one he presumably would not sign. They inserted two copies of block X into one of these documents, and a one X and one Y into the other. The original document contained code that compared the two blocks, and if they were the same, caused one message to be rendered, or if they were different, caused the other message to be rendered. Thus both documents hash the same (since X hashes the same as Y), but you see different text when you view the files.

This sort of attack would only work on documents that can contain code of some sort. It would not work on text files.

Parent Share
twitter facebook
Size matters (even now) (Score:3, Informative)

by Beryllium Sphere(tm) ( 193358 ) writes: on Friday June 10, 2005 @04:57PM (#12784054) Journal

>Switching to larger keys will not change anything.

Switching to longer hash sizes (not keys) will help against the current generation of SHA attacks. The effect of the Wang attacks is to knock (if memory serves) 16 bits off the work factor to find a collision. In other words, finding a collision in SHA-1 is now 64K times easier than brute force. Make brute force harder and the attack gets harder.

Some numbers: SHA-1 produces a 160-bit hash. If you're trying to find a collision as oposed to generating one the birthday paradox works in your favor and you only need on average 2^80 trials for brute force. Wang has lowered that number to 2^64, which is uncomfortably small by today's standards and reckless by tomorrow's. SHA-256 produces a 256-bit hash: you do the math.

The big worry is that cryptographic attacks tend to inspire research that finds better attacks. There's an unquantifiable but real risk that SHA-1 could collapse altogether.

The crypto geeks are speulating that these weaknesses are inherent in any hash algorithm that can be calculated incrementally. Right now the output after block N of input depends only on blocks 1..N of input. If the intermediate result for block N depended on blocks N+1..EOF then the current attacks wouldn't work. Neither would hashing an input stream. Such a change would destroy the ability to hash a file in parallel with reading it.

Parent Share
twitter facebook
Re:Possible solution (Score:2, Informative)

by orb_fan ( 677056 ) writes: on Friday June 10, 2005 @04:59PM (#12784087)

You are assuming that I read hashing alogrithm threads, which I don't. If there is a reason why this wouldn't work, then please enlighten me.

Parent Share
twitter facebook
Re:Just use multiple hash methods (Score:2, Informative)

by skelly33 ( 891182 ) writes: on Friday June 10, 2005 @05:20PM (#12784321)

"It is probably not possible to get a pair of files that will collide in all three of those hashes."

Not impossible, just less probable. Combining algorithms effectively increases the size of your fingerprint. The probability of a duplicate is a function of original data size and its proportion to the fingerprint data size. The closer those two data sizes are, the lower the probability of duplicates.

It may seem difficult to pull off now, but eventually someone will make a nice, clever script to do it and then the script-kiddies will have a field day.

Parent Share
twitter facebook
This has a solution (Score:2, Informative)

by yfarjoun ( 878821 ) writes: on Friday June 10, 2005 @05:21PM (#12784324) Homepage

This is another example of the implementation of the birthday "paradox" to hashes as the so-called "birthday attack".
See http://en.wikipedia.org/wiki/Birthday_paradox [wikipedia.org] for details.

Basically if you have enough documents you will be able to find two with the same hash. Not to be mistaken with the difficulty of finding a document with the same hash as a GIVEN document. This cannot be avoided regardless of the hash.

To foil a birthday attack, NEVER agree to sign a given document. One must ALWAYS add some random junk to the document (clear text or hidden). This should be implemented in sofetware that helps in signing, but if the option isn't given, you can add a clear text "random" string.

Yossi

Share
twitter facebook
Re:Common sense (Score:3, Informative)

by Anonymous Coward writes: on Friday June 10, 2005 @05:51PM (#12784663)

Isn't this obivous if you check the file sizes?

The files are the same size.

The cksum comand (which uses a 32 bit CRC) spits out the checksum and a file size. Why doesn't md5sum do the same thing?

It does - The file size is used as part of the MD5 hash. The MD5 algorithm hashes the file, then appends the file size and hashes that too. If it didn't do this then you could create an MD5 collision just by appending zeros to the file.

Parent Share
twitter facebook
Re:These are important attacks.. (Score:3, Informative)

by ars ( 79600 ) writes: <assd2@d s g m l .com> on Friday June 10, 2005 @06:50PM (#12785259) Homepage

Every time this comes up people mention the same idea - use two hashes for security!

What people don't realize is that MD5 _IS_ two hashes already! That's how it works!

To make it worse the hashes you mentioned MD5 and SHA-1 are based on exactly the same algorithm, so using it twice doesn't help you much.

Parent Share
twitter facebook
Re:The provided exploit documents can be edited! (Score:5, Informative)

by rar ( 110454 ) writes: on Friday June 10, 2005 @07:08PM (#12785423) Homepage

It is the same document, just relying on differences in the document name (it appears) to generate the different pages.

No, you have missed the point. Go back and rtfa again. The attack still works if you rename the documents to the same filename.

The difference lies in a generated "binary cookie" in the beginning of the postscript documents. This "cookie" makes the postscript intepreter either select to show document 'A' or 'B'. The "thing" with the cookies are that they are carefully selected to be md5-colliding. Result: both documents have the same md5sum.

You can change the rest of the documents freely if you make the same changes in both documents. The md5sum will change, but it will still be the same for both documents.

So. No. It is indeed a md5 collission attack.

Parent Share
twitter facebook
Re:These are important attacks.. (Score:2, Informative)

by Fnord666 ( 889225 ) writes: on Friday June 10, 2005 @09:07PM (#12786071) Journal
You, the employee, are going to argue that the company created both documents with matching hashes, then had you sign the more benign one. The company is going to argue that you agreed to the draconian version, then created the benign version at a later time that has a matching hash and are just trying to get out of the obligation. The problem is that:
1. Both versions are possible
2. Explaining any of this to people who haven't figured out how to get out of jury duty isn't going to happen. Ever seen the glazed looks on jury members when expert witnesses testify?
If I understand correctly, what this boils down to is that given an document and an MD5 hash, there is now a "reasonable" time based method of generating a second document with different content but a matching hash.

For a hash based signature, there will exist documents that have matching hashes. This is refered to as the pigeonhole principle. If you have 10 pigeonholes to stuff messages in and 11 messages, one of the pigeonholes will get a second message.

The linch pin of this process is the idea that it takes too much time to find or create a second document that has different content but the same hash value. In practice you want it to take so long that by the time a match is found, it no longer matters. What "no longer matter" means depends on the context.

When an adversary can create/find a match in a couple of hours rather than centuries, all bets are off unless the signature expires in seconds.

This really is an important result and has significant implications.
Parent Share
twitter facebook
Nonces And Blind Signatures (Score:2, Informative)

by Anonymous Coward writes: on Friday June 10, 2005 @11:06PM (#12786708)

I can't believe that nobody, including Bruce Schneier, has pointed out the well understood principle that you *never* blindly sign something provided to you by someone else.

Secure schemes involving one party signing material provided by another party should always involve a signing nonce. This is a random bitstring appended to the material to be signed. In the example from the article, if Caesar had added a nonce before signing the "good" document, then Alice would not have been able to produce a "bad" document with a matching hash.

However, as someone pointed out, this kind of attack is possible without using hash collisions. You simply have to convince your victim to sign something that is code-like (such as the postscript files used in the example, or an HTML document with javascript, or a Word doc with Visual Basic macros, or an executable) without making an expert examination of the complete contents of the code. The code-like "document" can be written to present one set of behavior under one set of circumstances and a completely different behavior under other circumstances. If the signer is not competent to examine the program and be fully aware of what it does, then the signature can have no meaning.

The attack demonstrated in the article is *still* of no practical consequence in my opinion. It doesn't mean that signed code can't be trusted; trusted code is always signed by it's creator, meaning that if the signature is valid *and* you trust the creator, then you can trust the code. It is still impossible for someone who didn't create the signed code to create a different piece of code that has the same hash, so you don't have to worry about an attacker replacing signed trusted code with signed malicious code. As always, you still have to worry about misplaced trust in the code's creator.

What the attack in the article *should* be trying to make clear is that a signature is only as trustworthy as the signer's ability and desire to evaluate what is being signed. A third party auditor that signs code after reviewing it doesn't ensure that the code is trustworthy, it only ensures that the auditor *claims* the code is trustworthy. What the implications are for the actual trustworthiness of the code depends on the trust you place in the intentions and abilities of the auditor.

Share
twitter facebook
Re:Wow...this is nerdy even for /. (Score:3, Informative)

by dossen ( 306388 ) writes: on Saturday June 11, 2005 @07:20PM (#12791474)

It is actually not entirely true, that md5 (and sha1) does not account for the size of the file. Both md5 and sha1 use the same Merkle-Damgård structure, where the same function is applied to a running "total", initially an initialization vector, and a fixed-size block of the input. For both hashes, the blocks are 512 bits long, and the last block is padded and ends with the length of the file, as a 64-bit integer. So unless one of your files is extremely long, size is taken into account. Doesn't mean that size-altering is impossible, but the linked attack for one does not alter the size.

Parent Share
twitter facebook

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Related Links Top of the: day, week, month.

390 comments32-Hour Workweek for America Proposed by Senator Bernie Sanders
358 commentsWhat Should Happen to Empty Downtown Office Spaces?
340 commentsHacktivism Erupts In Response To Hamas-Israel War
324 comments'Feedback' Is Now Too Harsh. The New Word is 'Feedforward'
248 commentsWorkers are Resisting Calls to Return to Offices

Stellar rays prove fibbing never pays. Embezzlement is another matter.