Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Meaningful MD5 Collisions

Posted by Zonk on Fri Jun 10, 2005 02:38 PM
from the bam-crash-boom dept.
mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."

Related Stories

[+] SHA-1 Collisions for Meaningful Messages 128 comments
mrogers writes "Following on the heels of last year's collision search attack against SHA-1, researchers at the Crypto 2006 conference have announced a new attack that allows the attacker to choose part of the colliding messages. "Using the new method, it is possible, for example, to produce two HTML documents with a long nonsense part after the closing </html> tag, which, despite slight differences in the HTML part, thanks to the adapted appendage have the same hash value." A similar attack against MD5 was announced last year."
This discussion has been archived. No new comments can be posted.
Display Options Threshold:
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Common sense (Score:3, Insightful)

    by gtrubetskoy (734033) * on Friday June 10 2005, @02:40PM (#12783106)
    (http://www.openhosting.com/)

    Unless I'm missing something, all these guys are doing is using a format that can contain an infinite amount of extraneous information that has no effect on how it's ultimately rendered, so same thing can be done with a .doc, or an HTML file, and this isn't really a cryptographic discovery of hash function weakness of any kind, just common sense for most of us - the secure hash algorithms should be applied to the English (or whatever language) textual contents of the document, no the source code of it, such as PostScript used in the article, PDF, HTML or whatever. I guess the most important lesson here is that this technique can be applied to binaries pretty easily as well.
    • no help by LiquidCoooled (Score:2) Friday June 10 2005, @02:45PM
      • Re:no help by ergo98 (Score:1) Friday June 10 2005, @02:49PM
      • Re:no help by linuxhansl (Score:2) Friday June 10 2005, @04:24PM
        • Re:no help by Loconut1389 (Score:2) Friday June 10 2005, @04:50PM
          • Re:no help by Loconut1389 (Score:1) Friday June 10 2005, @04:53PM
          • Re:no help by DuckofDeath87 (Score:1) Friday June 10 2005, @05:15PM
    • Re:Common sense by ergo98 (Score:2) Friday June 10 2005, @02:46PM
    • Re:Common sense by 3770 (Score:2) Friday June 10 2005, @02:48PM
    • Re:Common sense by loose_cannon_gamer (Score:2) Friday June 10 2005, @03:09PM
    • On the contrary... (Score:5, Insightful)

      by Chris Pimlott (16212) on Friday June 10 2005, @03:14PM (#12783520)
      This attack shows us all once again that there is that the procedures for using cryptography are as important as the mathematical theories and proofs on which cryptography is based. People like to believe that it's just the algorithm that's important, and once you have such an algorithm it's equally applicable to messages of all sorts and formats. As this shows, it's clearly not the case.

      You may believe it's common sense, but to the average user, encrypting a simple letter like the memos used in the article expressed as a Word document is no different than encrypting a simple text email. Heck, many of these users probably have no idea that much of the plain-looking email they send and recieve is actually HTML, which is capable of hiding beneath its rendered surface all sorts of additional information.

      When's the last time you saw an email program that read in a Word document, extracted just the plain text content, signed or encrypted it and then repackaged it into some new format in a cryptographically sound way that would automatically be reconstituted as a Word document on the other side? Most just have a handy "Sign" or "Encrypt" button that will happy accept .ps or .doc just as readily as a simple text file.
      [ Parent ]
    • If you read the Slides link for the Eurocrypt 2005 by Calyth (Score:1) Friday June 10 2005, @03:27PM
    • Re:Common sense (Score:4, Interesting)

      by iabervon (1971) on Friday June 10 2005, @03:30PM (#12783718)
      (http://iabervon.org/~barkalow/ | Last Journal: Saturday May 31 2003, @02:01AM)
      Actually, the two documents are actually almost identical. The difference is only one block in the whole file, which essentially acts as a selector for which of the two sets of content is displayed. MD5 (like most hash functions) works on fixed-size blocks smaller than the average file. To hash a complete file, you hash the first block, feed that into the hash with the second block, feed that into the hash with the third block, and so forth. So they have two files, and the first blocks are the same, the second blocks are different but hash the same, and the rest of the files are the same. Of course, the second blocks are junk, but the postscript is expecting a block-sized arbitrary value at that point anyway, so it doesn't matter that there's junk there.

      So they are actually using a format that can contain an exact quantity of extraneous information that doesn't get rendered but entirely changes what does get rendered.

      The same thing could be done with PDF or doc, and executables, but not anything compressed (it won't decompress at all if a block is changed) and not HTML without javascript (there's no way to test which block of junk is included and show different results based on that).

      [ Parent ]
    • Re:Common sense by m50d (Score:2) Friday June 10 2005, @04:27PM
    • Re:Common sense by ahdeoz (Score:1) Friday June 10 2005, @05:01PM
    • Re:Common sense by Anonymous Coward (Score:3) Friday June 10 2005, @04:51PM
    • 1 reply beneath your current threshold.
  • by William_Lee (834197) on Friday June 10 2005, @02:42PM (#12783123)
    Anyone want to explain the real world applications of this to someone who is considering turning in his nerd credentials after being unable to get the gist of this from the write up...and please don't tell me to RTFA, this is after all /.!
  • These are important attacks.. (Score:5, Insightful)

    by Ckwop (707653) * <Simon.Johnson@gmail.com> on Friday June 10 2005, @02:42PM (#12783126)
    (http://www.ckwop.me.uk/)
    As an amateur cryptographer, I must say that labeling these attacks as 'practically irrelevant'
    is at the very least misguided and at worst a shocking display of incompetence.

    Stop the fixation with plain-text messages, most messages are not plain-text. Your average word document
    contains loads of invisible data that doesn't get rendered. Pdf's contain "junk" data that doesn't get rendered either. Would
    you notice a single bit difference in an MP3? Or a single pixel colour change in a jpeg? Hell, you can even do it in HTML <div style="visibility:hidden">Junk goes here</div>.
    Mark my words, people will find in the next couple of months find two meaningful computer
    documents that hash to the same value but are different byte-wise.

    People undervalue these attacks because the attacker has to generate the collision before hand to use it.
    To properly appreciate the power of these attacks consider the following senario.

    Imagine we're agreeing a contract of employement and I'm your employer.
    I give you the first word document that includes all the standard terms, however, I've also drafted
    a Word document that contains a load of draconian clauses like banning you from working in any IT position five years
    after leaving the company. By adding junk that doesn't render to both documents, I've managed to find to make the hash
    of the two documents collide. Thinking I'm a nice employer, you sign the first document, which you do by signing the hash of
    document. However, I now have your signature on BOTH documents. I now make sure the company IT system "forget" the first document
    and I've successfully screwed you.

    This is a human example, but there are other examples that apply in computer systems. The problem is that in many situations
    the attacker can choose when you encrypt. Say you encrypt your e-mail conversation with your friend using S/MIME, many people click
    "Reply" and the message body of the other persons method appears in the new message. Because of these attacks,
    It's now no certainty that an attacker couldn't use this fact to construct collisions that an attacker could use.

    As another security researcher said (paraphrased) It's like you're in building and you've just heard the fire alarm go off.
    You can't see smoke but it's time to make your way calmly to the exit. That sums up the position with SHA-1 and MD5. Swap out the primitives
    before you start seeing smoke.

    It's not like we don't have alternatives anyway. Whirlpool uses the same wide-trail design principles has AES. It's slower than MD-5 or SHA-1 but it's much better designed. And beside, people would do well to realise you have to spend CPU cycles to get security.

    Simon.
  • by huckda (398277) on Friday June 10 2005, @02:42PM (#12783135)
    (Last Journal: Sunday January 21 2007, @06:32PM)
    - but many so-called 'practitioners' turned them down as 'practically irrelevant'."
  • See Schneier's Blog [schneier.com] for more thoughts on the subject. I am sure it will get fleshed out more as more details emerge.
  • What are the alternatives? (Score:2, Insightful)

    by CyricZ (887944) on Friday June 10 2005, @02:45PM (#12783169)
    If MD5 is found to be insecure, what are the alternatives we can use when signing our open-source packages? Is there any other alternative that is even approaching the widespread use of md5sum?
  • Text of Letters (Score:3, Informative)

    by pete-classic (75983) <hutnick@gmail.com> on Friday June 10 2005, @02:46PM (#12783172)
    (http://hutnick.com/ | Last Journal: Monday March 12 2007, @09:15PM)
    For those who can't convientently view PostScript files, the text of the two letters:

    Julius. Caesar
    Via Appia 1
    Rome, The Roman Empire
    Alice Falbala fulfilled all the requirements of the Roman Empire
    intern position. She was excellent at translating roman into her gaul
    native language, learned very rapidly, and worked with considerable
    independence and confidence.
    Her basic work habits such as punctuality, interpersonal deportment,
    communication skills, and completing assigned and self-determined
    goals were all excellent.
    I recommend Alice for challenging positions in which creativity,
    reliability, and language skills are required.
    I highly recommend hiring her. If you'd like to discuss her attributes
    in more detail, please don't hesitate to contact me.
    Sincerely,
    Julius Caesar

    Julius. Caesar
    Via Appia 1
    Rome, The Roman Empire
    May, 22, 2005
    Order:
    Alice Falbala is given full access to all confidential and secret
    information about GAUL.
    Sincerely,
    Julius Caesar
  • What these researchers did was not to improve the known attacks on MD5, but to demonstrate a clever way of turning the known attack, generally considered to be of theoretical interest only, into an attack that could potentially really be used.

    The way they did it was to create a postscript document that actually contains two documents, one that the sender would be willing to sign and one that he presumably would not. The full text of both is contained in the file, but near the beginning of the file is a bit of code that compares two blocks of random-appearing bits, call them A and B. If A == B, the postscript interpreter will select the innocuous message and display that. If A != B, the interpreter will display the other message.

    The researchers then generated a pair of blocks with the same MD5 hash. In one copy of the postscript file, they used one of these blocks as both A and B. In the other copy, they used one block as A and the other as B. Because every bit of both documents before and after the two blocks is identical, and because those blocks hash to the same value, the documents hash to the same value.

    It's an interesting attack. It only applies to documents that are also programs, in some sense, but we use lots of document formats that fit that description.

    A simple countermeasure that makes such an attack more difficult is to compress the documents before signing.

  • by amalcon (472105) on Friday June 10 2005, @02:50PM (#12783219)
    Maybe now, people will finally start migrating to a different hash algorithm for signatures. MD5 has been around long enough that people have known this sort of thing is possible for some time. It's only now that they finally sat down and did the proof-of-concept.

    Of course, MD5 passwords are probably still safe, for now (between maximum password lengths and the fact that this attack will have a hard time actually doing that), but it's only a matter of time.
  • Kids in Finland don't agree (Score:5, Insightful)

    by StreetFire.net (850652) on Friday June 10 2005, @02:52PM (#12783248)
    (http://www.vidiac.com/)
    Regarding being "practically irrelevant"

    "every time [some software engineer] says, 'nobody will go to the trouble of doing that,' there's some kid in Finland who will go to the trouble."
    Taken from Kevin' Mitnik's "The Art of intrusion"
    http://www.amazon.com/exec/obidos/tg/sim-explorer/ explore-items/-/0764569597/0/101/1/none/purchase/r ef%3Dpd_sxp_r0/104-8074733-7395136 [amazon.com]
  • Possible solution (Score:2, Insightful)

    by orb_fan (677056) on Friday June 10 2005, @02:55PM (#12783280)

    Is the answer then to create a hash that is in fact the sum of two distinct hashing alogrithms? For example, use MD5 and SHA1. Since the alogrithms use different methods, the set of collisions from one would not overlap the set from the second (or rather the overlap would be vanishingly small). And if the overlap was larger than you wanted - just keep adding different hash alogrithms until you are satisfied.

    I realise that the computations involved aren't cheap, but it becomes a trade-off between security and speed - the more sure, the more time it will take.

    • Weak + weak = weak by MarkByers (Score:2) Friday June 10 2005, @03:42PM
    • Re:Possible solution by eli867 (Score:1) Friday June 10 2005, @03:43PM
    • Re:Possible solution by m50d (Score:1) Friday June 10 2005, @03:53PM
    • That won't solve it (Score:4, Insightful)

      by gotr00t (563828) on Friday June 10 2005, @04:14PM (#12784252)
      (Last Journal: Saturday December 07 2002, @12:34AM)
      Think of it this way: You can hash _any_ file of _any_ size using either MD5 or SHA1, and these algorithms will hive you an alphanumeric hash, which has a limited number of permutations. Thare are an infinate number of unique files (assuming adequate storage space), yet there are finite unique hashes.

      Therefore, no matter how many algorithms you sum up using your described method, the number of collisions is still infinite in amount. It is not the algorithms that are flawed, rather, it is the fundamential concept of hashing that allows collisions to happen.

      I would assume that the way to reduce the number of collisions is by increasing the length of the hash itself so as to increase the number of unique hashes.

      [ Parent ]
    • Re:Possible solution by dragondm (Score:1) Friday June 10 2005, @06:04PM
    • 1 reply beneath your current threshold.
  • 7 bits difference (Score:3, Informative)

    by lavalyn (649886) on Friday June 10 2005, @02:55PM (#12783282)
    (http://127.0.0.1/ | Last Journal: Wednesday March 31 2004, @01:41PM)
    Compare the two files, and you see that there were only 7 bits that changed, all on the 5th most significant bit of the affected byte.

    I guess it's obvious from looking at the ps file in text form, but if it's that easy to mangle postscript to display two different layers (or is it changing comments or pointers? I am not a binary postscript parser.) I still don't think it's time to throw md5 away yet. Six months.

    What other hashing alternatives are there these days? SHA-1 apparently has the same kinds of weaknesses.
  • We could just couple it with another widely used industry standard [rot13.com]
  • Okay, I'm impressed. (Score:5, Interesting)

    by ave19 (149657) on Friday June 10 2005, @03:01PM (#12783348)
    At first I thought: Postscript! Well, obviously. To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide. If you tried to hide random data in a text file, it would be obvious to the person signing it. You need some format to hide the random bits from the viwer.

    I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...

    Then I downloaded the files...

    There's almost nothing to them! I can't read PS, so I'm not sure how many of that handful of bytes at the beginning might be tweakable... but it's a lot less than I expected.

    Collisions must be very easy to find! I am now offically very worried about this.
    • Re:Okay, I'm impressed. by gtrubetskoy (Score:2) Friday June 10 2005, @03:13PM
    • Re:Okay, I'm impressed. (Score:4, Informative)

      by yeremein (678037) on Friday June 10 2005, @03:55PM (#12784019)
      To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide.

      That's actually not what they did. They generated two essentially random blobs of data that have the same hash. We'll call these X and Y. They then created a PostScript document containing BOTH messages, the one that Alice's boss would sign and the one he presumably would not sign. They inserted two copies of block X into one of these documents, and a one X and one Y into the other. The original document contained code that compared the two blocks, and if they were the same, caused one message to be rendered, or if they were different, caused the other message to be rendered. Thus both documents hash the same (since X hashes the same as Y), but you see different text when you view the files.

      This sort of attack would only work on documents that can contain code of some sort. It would not work on text files.

      [ Parent ]
    • 1 reply beneath your current threshold.
  • by malcomvetter (851474) on Friday June 10 2005, @03:01PM (#12783351)


    in my next big project?

    In all seriousness, I believe Schneier's right. We need a competition for a new hash function [schneier.com].

    Nah, let's just wait for 24 [techweb.com] to drop the words "MD5" before we know it's really bad.
  • by G4from128k (686170) on Friday June 10 2005, @03:02PM (#12783357)
    The core problem is that the resources of the defender are inevitably overwhelmed by the resources of the attacker.
    1. Relative expected values of gian vs. loss: The attacker thinks "I know I can gain a #BIG_NUM million dollars" and devotes their full effort to the attack. The defender thinks "I'm safe, there's a low probability, and I'm sure I'll catch the problem before it becomes real money, " and does not not devote effort to security becuase a Gartner report told him it was over-hyped. Thus, the attacker's perceived expected value is much higher than the defenders perceived expected loss and each invests accordingly.
    2. Rising Complexity: As IT systems become more complex, they become less secure. Each new device, new networking protocol, new physical layer, new OS feature, new networked application provides new opportunities for the attacker and a dilution of security resources for defenders.
    3. Time: The attacker has the advantage of time. New algorithms, new mathematical theories, new exploits, and faster processors all favor the attacker. What once was supposed to take the age of the universe to crack can be decrypted with a quickly declining number of networked (even zombied) PCs.
    4. Curse of Compatibility: Because so much crypto and security is networking related, it is subject to implementation delays caused by the need to be compatible. Defenders continue to use old, vulnerable systems to maintain compatibility with key partners. Patches don't solve the problem because the patch itself can introduce incompatibilities that make defenders leary of applying the patch with a very real chance of causing problems to avoid a hypothetical security issue.
    The bottom line is that the defender must protect all vulnerabilities while going about the day-to-day business of using the computer. In contrast, the attacker can devote full time to any weakness of their choice.
  • From a prior discussion... (Score:4, Interesting)

    by erroneus (253617) on Friday June 10 2005, @03:03PM (#12783390)
    (http://slashdot.org/)
    I forget where or when exactly, so please feel free to run a search if you care to... it was here on Slashdot though.

    There was talk about someone being able to foil P2P networks by seeding bad stuff through random data formulated to fit the MD5/SHA1 code from legitimate files shared on those networks. The consensus was that it was BS and that even if it weren't BS there could be updates to make such attacks more difficult or impossible to perform.

    Am I missing something or are these two stories relevant to each other?
  • Lenstra and others came up with a way to generate syntactically-correct X509 certificates that collide under MD5.

    Here's a link to the paper: Lenstra et al [iacr.org].
  • I hope all those who said the previous MD5 attacks were nothing to worry about will take it back. A theoretical break or partial break is almost always followed by a practical break. The lesson to learn from this is not to do the same thing with SHA1 and, I would suggest, AES. The vulnerabilities might be impractical now but they won't stay that way. Move now, while you're still at least partially secure, rather than once there's a practical break.
    • 1 reply beneath your current threshold.
  • [summary] (Score:1)

    by 823723423 (826403) on Friday June 10 2005, @03:10PM (#12783464)
    [1]
    [LdW] A. Lenstra, B. de Weger: On the possibility of constructing meaningful hash collisions for public keys
    http://www.win.tue.nl/~bdeweger/CollidingCertifica tes/ddl-full.pdf [win.tue.nl]

    [2]
    Using the length extension property present in MD5 and all other hash functions following the (almost omnipresent) Merkle-Damgaard design principle, we constructed a pair of documents to display either the letter or the order.
  • by sam5550 (841429) on Friday June 10 2005, @03:18PM (#12783564)
    Other attacks on MD5 [schneier.com]
  • by rar (110454) on Friday June 10 2005, @03:19PM (#12783576)
    (http://www.anytist.com/)
    What I haven't seen mentioned yet, and people perhaps haven't realized, is that in providing these two postscript files, they have essentially provided you with an postscript signature exploit kit!

    All you need to do is download the two postscript documents and do *exactly corresponding edits* in both of them, and you get two documents saying different things and still have the same md5sums!

    I just tried exchanging Alice's name for my own, and surely it did work.

    Now, if they released a pdf-file hack, I would be genuinely worried :)...
  • last time we had this discussion , maybe 3-6 months ago.. and they were meaningless collissions.. i told everyone "give it another 6 months or so, and someone'll figure out how to spoof properly"
  • Misuse of MD5? (Score:1)

    by skelly33 (891182) on Friday June 10 2005, @04:04PM (#12784144)
    I never thought MD5 was supposed to be used to authenticate the content of a file, only as a quick check for simple bit errors. I'm not buying any of this discussion about coming up with a new hashing algorithm. MD5 works great when it is used the way it is supposed to be. The trouble with coming up with something better is that you are trying to represent a large body of data with a smaller body of data. The ONLY situation in which your smaller body of data will absolutely be uniquely coupled with the original large body of data is in the case of lossless data compression such as LZW. Any other form of lossy data compression or tokenization will run the risk of duplicates matching the same fingerprint. Someone who comes up with a magical fingerprint that can be tied to one and only one large body of data will have discovered the Holy Grail of data compression.
  • by matthewmok (412065) on Friday June 10 2005, @04:07PM (#12784174)
    Tripwire does this. They use CRC32 and MD5. You could use MD5 and SHA and CRC32 if you wanted. It is probably not possible to get a pair of files that will collide in all three of those hashes.
  • Apply digital signatures to X not H(X). So what if sizeof(signature)=toobigtohandle, it's better than validityof(signature)=uncertain.

    Actually, for smallish documents where the equivalent of a notary signature is required, this may be the way to go.

    Sigh.

    In the meantime, expect courts to allow people to repudiate supposed signatures when the validity is in doubt, the same way banks allow individuals to swear under oath that a signature on a stolen or altered check is fake.

    In the meantime, I'll sign my employment contracts the old fashioned way and keep a copy in my files.
    • 1 reply beneath your current threshold.
  • This has a solution (Score:2, Informative)

    by yfarjoun (878821) on Friday June 10 2005, @04:21PM (#12784324)
    (http://math.berkeley.edu/~yfarjoun)
    This is another example of the implementation of the birthday "paradox" to hashes as the so-called "birthday attack".

    See http://en.wikipedia.org/wiki/Birthday_paradox [wikipedia.org] for details.

    Basically if you have enough documents you will be able to find two with the same hash. Not to be mistaken with the difficulty of finding a document with the same hash as a GIVEN document. This cannot be avoided regardless of the hash.

    To foil a birthday attack, NEVER agree to sign a given document. One must ALWAYS add some random junk to the document (clear text or hidden). This should be implemented in sofetware that helps in signing, but if the option isn't given, you can add a clear text "random" string.

    Yossi

  • by Gollum2001 (514666) on Friday June 10 2005, @04:23PM (#12784347)
    but almost. MD5 it's obsolete now and SHA-1 have been phased out. While things like this demonstrate some "probable but yet to be seen" attacks and in theory are good to see the weak spot of this kind of hash functions, the fact is that by now every serious cryto soft must be using high order sha, like SHA-256 or SHA-512 or stronger has functions like Tiger or Whirpool. There is no real threat. Some people said here in /. that Schneier wants a new competion, well, that's good, but not really necessary as SHA was designed with this kind of problems in mind (every hash funcion has collisions, you can't avoid that) that's why there are high order SHA funtions.
  • by linuxhansl (764171) on Friday June 10 2005, @04:33PM (#12784464)
    Every hashsum necessarily has collisions. There are two reasons why cryptographic hashes work anyway: 1. The value distribution (a small change in the input leads to a completely different hashsum). 2. The large set of possible hashes. MD5 is 128 bits. That allows roughly for 3x10^38 values. That is huge number. SHA-256 is 256 bits which allows for roughly 10^77 hashes. That is only within orders of magnitude of the estimated number of atoms in the observable universe (between 4×10^78 and 6×10^79). So I think it is safe to say that nobody will ever see or be able to generate a collision with SHA-256 or higher (unless there is a flaw in the algorithm).
  • Risk is small (Score:2)

    by DarkOx (621550) on Friday June 10 2005, @05:28PM (#12785081)
    The only case I can think these techniques might be usable is if you wanted to change a decimal point to a comma or add a zero or something like that on a number. Changeing an entire text word in the document would be practically impossible. This might be somewhat useful for some forgeries but not many..
  • Signing such a non-plaintext file by looking at it in a viewer only (and (a bank) accepting such a signed document) is the mistake. You don't even need MD5 collisions to exploit that.

    Suppose you write a HTML file with some java-script embedded that displays one text if the date is before X, and another if it's after. you get someone to sign it before X, and then use it afterwards.

    Similar, in general, different viewers interprete (broken) files differently. So if you know which viewer the signer and the bank use, you could probably write a postscript file that displays different text in those viewers.

    No checksum or cryptographic mechanism is going to prevent that. Nor is displaying files differently than other viewers a security problem that should be fixed in every viewer.

    So the only way to prevent this, is not signing "binary" files (not fully plain text human parsable) written by someone else.
  • by griasr (822487) <privacyking@yahoo.de> on Friday June 10 2005, @06:25PM (#12785536)
    since the first news about hash-weaknesses i turned to use "multi-hashes". meaning, hashing the whole file first then just hash some more fixed defined small parts of it with a "weak hash" like md5. then i pulled out a "hash of all hashes" which wont be too easy to reproduce or fake.
  • I must say it is a remarkable piece of work, creating two different documents that make sense after rendering. In case of these documents, some evidence of tampering might be easy to find upon closer examination of the files, but its still a very important thing.

    This also makes me wonder on how widespread the consequences may be. Using MD5 and similar digests is common practice, and was until recently considered quite safe.

    As the bittorrent protocol uses extensive hash checking, could this new discovery mean that MPAA and the likes will be able to insert corrupt data to torrent networks? I can imagine this would be a very real 'application' of this research, if it allows one to create blocks of data with hashes and sizes as desired.

    Even worse, in torrent networks, these poisoned blocks will probably be re-distributed by all peers - or am I missing something?
  • Digital Forensics (Score:1, Interesting)

    by Anonymous Coward on Friday June 10 2005, @07:34PM (#12785907)
    This kind of information will wreak havoc with digital forensic departments. Right now we use various checksums, the most common being MD5, to verify the integrity and accuracy of a drive image/duplicate. Even if the chances of an accidental collision are so remote, the fact that people are researching and reporting on the fact that it can be done means proving this is a duplicate in court will be harder. Best case scenario, jury is shown reports like this, you then have reasonable doubt in the jury that the drive may NOT be an exact copy. Worst case scenario, the defence claims your lab doctored the results, and then they "prove" it with these algorithms and reports.
  • Nonces And Blind Signatures (Score:2, Informative)

    by Anonymous Coward on Friday June 10 2005, @10:06PM (#12786708)
    I can't believe that nobody, including Bruce Schneier, has pointed out the well understood principle that you *never* blindly sign something provided to you by someone else.

    Secure schemes involving one party signing material provided by another party should always involve a signing nonce. This is a random bitstring appended to the material to be signed. In the example from the article, if Caesar had added a nonce before signing the "good" document, then Alice would not have been able to produce a "bad" document with a matching hash.

    However, as someone pointed out, this kind of attack is possible without using hash collisions. You simply have to convince your victim to sign something that is code-like (such as the postscript files used in the example, or an HTML document with javascript, or a Word doc with Visual Basic macros, or an executable) without making an expert examination of the complete contents of the code. The code-like "document" can be written to present one set of behavior under one set of circumstances and a completely different behavior under other circumstances. If the signer is not competent to examine the program and be fully aware of what it does, then the signature can have no meaning.

    The attack demonstrated in the article is *still* of no practical consequence in my opinion. It doesn't mean that signed code can't be trusted; trusted code is always signed by it's creator, meaning that if the signature is valid *and* you trust the creator, then you can trust the code. It is still impossible for someone who didn't create the signed code to create a different piece of code that has the same hash, so you don't have to worry about an attacker replacing signed trusted code with signed malicious code. As always, you still have to worry about misplaced trust in the code's creator.

    What the attack in the article *should* be trying to make clear is that a signature is only as trustworthy as the signer's ability and desire to evaluate what is being signed. A third party auditor that signs code after reviewing it doesn't ensure that the code is trustworthy, it only ensures that the auditor *claims* the code is trustworthy. What the implications are for the actual trustworthiness of the code depends on the trust you place in the intentions and abilities of the auditor.
  • Re:huh? (Score:2, Funny)

    by Captain BooBoo (614996) <(moc.liamtoh) (ta) (sretupmoclled)> on Friday June 10 2005, @02:52PM (#12783242)
    Are you kidding? ANY article with the word "hash" in it is going to grab the attention of even the weakest synapse. ôô
    [ Parent ]
  • do you know how big 2^128 is? (Score:5, Insightful)

    by frovingslosh (582462) on Friday June 10 2005, @02:52PM (#12783244)
    Collisions are NOT and accidential everyday occurance. What is being discussed here is a deliberate md5 match, created by making just the right changes to a document to intentionally get an md5 match.

    2^128 is huge. It's larger by far than the number of all the files in all of the computers in the world. It larger than the number of stars in the universe. Chance collisions will not become an everyday occurance. No accidental collision has ever been found yet. Switching to larger keys will not change anything. Sure, they might make it slightly harder to make a deliberate collision (although I don't know for a fact that they make it harder at all, there were some reports of someone in Japan being able to create a collision by hand with only pencil and paper), but just wait 2 months and the computing power will catch up with that. It's not a matter of the size of the hash function.

    [ Parent ]
  • Re:huh? (Score:2)

    by rbarreira (836272) on Friday June 10 2005, @02:57PM (#12783309)
    (http://wod.home.dyndns.org/)
    Care to explain why?
    [ Parent ]
  • Re:huh? (Score:1)

    by Compact Dick (518888) on Friday June 10 2005, @02:58PM (#12783322)
    (http://www.thundersplace.com/)
    I find it hard to believe that even Slashdot readers find this interesting.

    This latest development should not only concern us nerds, but the rest of us as well. You see, this time they demonstrated that two messages can have the same hash, and neither of them need look like gibberish.

    In other words, one might read "Bow before me and my enormous todger!" while the forgery could be "Warning: you need an electron microscope to view my penis". How can one tell which is real and which is fake?
    [ Parent ]
    • Re:huh? by egypt_jimbob (Score:1) Friday June 10 2005, @03:38PM
    • Re:huh? by pyrrhonist (Score:2) Friday June 10 2005, @03:40PM
  • Re:First post (Score:2)

    by rbarreira (836272) on Friday June 10 2005, @03:03PM (#12783373)
    (http://wod.home.dyndns.org/)
    Well, you're of course a troll, but for people who might not be very informed about this... That unlikely event that you talk about has just been purposely produced. And they could have used any different collision they wanted for this effect, since if you append the same content (ANY content will do) to two files which are an MD5 collision, you end up with... two files which are an MD5 collision.
    [ Parent ]
    • Re:First post by xintegerx (Score:1) Friday June 17 2005, @07:41AM
      • Re:First post by rbarreira (Score:2) Friday June 17 2005, @09:57AM
  • Re:Security Through Obscurity (Score:4, Informative)

    by Dachannien (617929) on Friday June 10 2005, @03:05PM (#12783403)
    (http://www.unity08.com/)
    We're talking about cryptographic hashes here, not encryption. Encryption is meant to be a reversible process, and is therefore one-to-one. In other words, there's no concern over collisions with encryption.

    With cryptographic hashes, you're throwing away nearly all of the data to obtain a hash (a number) which represents the larger data set in such a way that (hopefully) the hash will never turn up again in practical usage. The article here indicates that there are ways being devised to force two data sets to have a hash collision while keeping the practical parts of the data sets the same.

    As for accusing encryption of being "security through obscurity", you're misusing that term. If knowing the encryption algorithm allowed you instant access to all data encrypted with that algorithm, then yes, the only security present would be dependent upon the secrecy of the algorithm itself. But that's not the case here. Encryption typically works by public key exchange, meaning that a key (a number) used to encrypt messages is shared with the encrypting partner, while the key to decrypt and recover the data is kept private (is never transmitted). Recovering the private key through brute force is not a compromise of the algorithm itself - given enough time, any private key can be recovered, regardless of the algorithm, but by increasing the key size arbitrarily, the time taken to find that key can also be increased arbitrarily.

    [ Parent ]
  • Re:Security Through Obscurity (Score:2, Informative)

    by MyTwoCentsWorth (593731) on Friday June 10 2005, @03:09PM (#12783450)
    Dear me,

    Your ignorance is so appalling that I have a hard time deciding where to start.

    First - this is about digital signatures, not about encryption - sharing the documents (by sending them over electronic media or otherwise) is needed.

    Second - obfuscated algorithms ??? Open any book on encryption to learn that the strongest encryption algorhitms rely on completely documented algorhitms - security through obscurity is so 80's...

    My suggestion - try to have something worth saying before making suggestions.

    Happy Posting.
    [ Parent ]
  • The GUID (as called by Microsoft) has it's history in the UUID, which was first "invented" by OSF as part of it's big DCE specs (long before Microsoft adopted the format and gave it a spiffy new name).

    The UUID and GUID look alike, but the UUID was constructed in a much different manner than GUIDs seem to be nowdays. GUIDs are basically 128-bits of random data (usually made by passing pseudo-random seeds through a hash). UUIDs on the other hand contained structured data too in addition to randomness, to try to prevent accidental reuse. For instance the rightmost 48 bits come from the Ethernet MAC address if available. Other bits contained date/time information, process/thread ids, etc.

    There was an expired Draft RFC [ietf.org] written in February 1998 which explained a lot of the technical details of GUIDs and UUIDs (surprisingly authored by Microsoft). You may still be able to find copies of it out on the net (the filename was "draft-leach-uuids-guids-01.txt")

    [ Parent ]
  • Re:big oops (Score:2, Informative)

    by tota (139982) on Friday June 10 2005, @03:22PM (#12783623)
    (http://nagafix.co.uk/)
    sha-256 and later have not been found to be weak yet. It does not mean that they are unbreakable (or I should say: possible to derive collisions) but it is definitely better.

    Another solution that should work quite well is to combine hashes: (md5+sha1) is definitely much stronger as you would need to find a collision that works for both algorithms at the same time.
    Possible, but not likely.
    [ Parent ]
  • Re:Security Through Obscurity (Score:3, Informative)

    by loose_cannon_gamer (857933) on Friday June 10 2005, @03:23PM (#12783632)

    I think that's a big shortsighted... I agree that if we let history take a crack at it, that any encryption put together by smart people will eventually be breakable by smart people.

    However, most data that I deal with day-to-day is time relevant. Do I care if someone figures out my credit card number on an account I closed 5 years ago? Is it terrible if someone hacks an old email only to find out I was begging a professor for a passing grade in 1997?

    Encryption is meant to hide things, and for many things, the need to hide is temporary. If the hidden thing stays hidden as long as it needs to stay hidden, there is nothing wrong with it.

    Know the limitations on the technology you use, and know the parties with which you exchange information. Those two rules alone, if followed, will probably provide more than adequate real-world defense. Perfect? No. Good enough. Statistically, yes.

    [ Parent ]
  • by bturnip (761620) on Friday June 10 2005, @03:25PM (#12783658)
    And if this was a time before electronic media, would the suggestion be "don't write it down on physical media"? ;)
    [ Parent ]
  • Re:big oops (Score:1)

    by kabloom (755503) on Friday June 10 2005, @04:23PM (#12784351)
    (http://www.iit.edu/~kbloom1/)
    Be careful with MD5, don't worry about SHA1 yet.

    This isn't so much a flaw with the hashes as it is a way of exploiting certain kinds of documents to take advantage of a hash collision.

    Plain text messages, HTML documents, and the like don't appear to be exploitable with this type of attack. But executable files and postscript files can be exploited -- to a certain extent.

    You can't exploit arbitrary files -- the file has to be specially constructed to allow for the exploit. The problem isn't with using MD5sums to verify the integrity of programs on your machine. The problem here is digitally signing untrusted data -- it only raises the bar for what you need to check before you sign data. In the article, what Caesar should have done is indicated that he'd type up the letter for Alice and send her a signed version by the end of the day. (Although I'm not sure how to handle the social niceties involved)
    [ Parent ]
  • by figge (558151) on Friday June 10 2005, @04:41PM (#12784532)
    % md5sum ./md5*
    d80ae934acc0231a0fcb99716346c8e6 ./md5collision
    d80ae934acc0231a0fcb99716346c8e6 ./md5other
    % ./md5collision
    This was a dumb, forced, collision that would never happen in real live.
    % ./md5other
    This was a dumb, forced, collision that would never happen in real live.

    Same hash, same output. What's your point? ;-)

    (Hint: strcmp(argv[0], "md5collision") != 0. You might also have some trouble with argv[0] being "/home/ch0p/md5collision", "./md5collision" etc, but hey, your point got across. I'm just nitpicking :-) )

    [ Parent ]
    • 1 reply beneath your current threshold.
  • by julesh (229690) on Saturday June 11 2005, @04:06PM (#12790710)
    MD5 seemed like a good way to check for duplicate jpg images, sometimes an image with the same data had a different name.
    But equivelent MD5 hashes of different images made a mess of that scheme.


    If you have two different JPEG files that have the same MD5 hash but show a different image, I suspect security researchers would like to see them. Especially if they're the kind of JPEG images most slashdotters seem to have more of than the other kinds...
    [ Parent ]
  • 12 replies beneath your current threshold.