Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Slashdot Log In

Log In

Create Account  |  Retrieve Password

SHA-1 Collisions for Meaningful Messages

Posted by CmdrTaco on Sun Aug 27, 2006 08:44 AM
from the well-this-isn't-very-helpful dept.
mrogers writes "Following on the heels of last year's collision search attack against SHA-1, researchers at the Crypto 2006 conference have announced a new attack that allows the attacker to choose part of the colliding messages. "Using the new method, it is possible, for example, to produce two HTML documents with a long nonsense part after the closing </html> tag, which, despite slight differences in the HTML part, thanks to the adapted appendage have the same hash value." A similar attack against MD5 was announced last year."
+ -
story

Related Stories

[+] Meaningful MD5 Collisions 312 comments
mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More
Loading... please wait.
  • by macadamia_harold (947445) on Sunday August 27 2006, @08:52AM (#15989271) Homepage
    Using the new method, it is possible, for example, to produce two HTML documents with a long nonsense part

    To achieve this, the method uses material pulled from myspace.com.
    • Re: (Score:2, Funny)

      by Anonymous Coward
      To achieve this, the method uses material pulled from myspace.com.

      Falling back to sco.com, Microsoft.com, digg and slashdot, in that order.

  • It's a hash algorithm... no big deal ? Just like it was proven possible to break MD5 for binary executables and others them (both) won't be dropped for, say, storing passwords on a database.
    • by rg3 (858575) on Sunday August 27 2006, @09:20AM (#15989345) Homepage
      I'm not an expert so don't take me seriously about it, but: I think hash algorithms are very important when signing things. For example, SHA-1 is the default hash algorithm used in GPG to sign messages. When the first attacks were mentioned I changed that to use RIPEMD-160. If you download something that has a SHA-1 sum to check both correctness and autenticity, it's a problem. Someone could modify the original tarball, for example, introducing a trojan horse, and append some other not useful data to it so the sum matches. I don't know if that's technically feasible as I say, but I imagine the possibilities are not so far. And, furthermore, for me this is another important warning that we should move out of SHA-1 ASAP. BTW, if I recall correctly BitTorrent uses SHA-1 to verify the 256KB chunks. There, having a fixed-size chunk is an advantage for this case, but, as I said, I wouldn't trust SHA-1 much longer. A further step and people could build evil BitTorrent clients that, at least, could corrupt your downloads introducing noise chunks.
      • right now the same person has to create the good and bad original files then get people to sign the good one before he distributes the bad one.

        and upon close inspection the good one would look very suspicious (lots of random garbage)

        so its a break of sorts but nowhere near as bad as a preimage attack would be.

      • Re: (Score:3, Interesting)

        Someone could modify the original tarball, for example, introducing a trojan horse, and append some other not useful data to it so the sum matches.

        That's the neture of hash keys anyway. The security weakness is not into collisions possibly existing, but how fast and feasible it is to find them.

        It's a simple logical rule: if you have a 256-bit key, this is 2^256 possible hash combinations. If you put in a folder all possible 257 byte long text files, then each file will have a key that matches a the key of a
        • The SHA and MD5 attacks are bait-and-switch, so the attacker must control both messages. The current research doesn't imply any weaknesses for HMAC applications, such as password storage. So yes, SHA and MD5 are probably still quite acceptable for these purposes if they're already in place.
        • The problem is that your old keys and the messages they encrypted are available for cracking now and forever. Most people only encrypt important messages, which are easy to look for in a mailbox, and at a later time could be easy to crack. There's probably even a good change the data in that mail could still be important.

          Now, if all emails were encrypted, it would be harder to immediately see what messages in a mailbox deserve your attention. But then at a later date CPU speed may make that a negligible difference.
  • This is a big deal (Score:5, Insightful)

    by gweihir (88907) on Sunday August 27 2006, @09:12AM (#15989331)
    One thing is that cryptographic hash functions should be easier to make secure than ciphers. At leaste that is what many cryptogtaphers thought. The other is that up to now you could rely on SHA-1 to be collision resistant, no matter what. The argument that you have a large part of the message being "garbage" does not give any real security. Many, many applications can still be attacked, and they need not even be broken for that.

    While expected since last year, selecting and using crypto-hashes just got a lot more difficult and error prone.
    • by packetmon (977047) on Sunday August 27 2006, @09:26AM (#15989372) Homepage
      Even if their test hardware could be accelerated from 33 MHz to 4 GHz, the process would still take 170,000 years. And even if a giant cluster of such machines were used, no collisions would be found within a realistic timeframe of a few years.


      The second reason to keep cool is just as important, if not even more so: hackers will have to execute a pre-image attack to manipulate, for instance, a contract that has been digitally signed. In other words, hackers will have to find a second, manipulated contract with the same hash value as the real contract. In principle, the number of operations needed is thus far greater (2160). Indeed, as far as we know all attacks to date have only concerned collisions, and Wang et al. does not change that. There are no known methods to reduce significantly the number of operations needed for pre-image attacks.

      Don't you think you're flying off the meter here a bit... Just because a collision was found means truly little. So a garbage laced HTML page was created after the actual HTML closing tag... 1) No one will see what comes after that unless you like viewing the source of a webpage as opposed to an actual page. 2) You should read up on birthday paradoxes. If someone created two similar messages, it would take years for them to figure out how to compute a hash to match. Now in the field of sending out something so so so secure, what makes you think that even if a someone did re-computate a hash to match, that message would be worth anything years down the line. Someone would have to be able to accomplish a collision, re-computate the hash in their new message and send it all within minutes for it to truly be a threat.

      Let's look at this scenario... A massive kernel update is made to say Linux... The information is hashed, posted, and everyone is now going to update their Linux boxes... Unless someone is so quick fast to intercept along this path, most are safe unless they choose to verify the hash years down the line (which by then would be worthless). So unless someone can exploit this within minutes (no more than I would guesstimate 36 hours), I see little reason to get all bent out of shape over this...

      • by hankwang (413283) * on Sunday August 27 2006, @09:42AM (#15989422) Homepage
        1) No one will see what comes after that unless you like viewing the source of a webpage as opposed to an actual page.

        Common web browsers (I just tried Opera, FF, and Lynx) will happily display everything after the closing tag. You would have to put it inside <!-- --> comment delimiters, but then it doesn't matter whether it is before or after the closing tag. Unless the attack doesn't work if the --> has to come at the end, but then you can just omit the closing tag. Only an XHTML-compliant browser would complain. From cursory scanning TFA it is not clear to me what the reason is for mentioning the closing </html.

        • Not true -- XHTML-compliant browsers will only treat it as XHTML if it's sent with a "Content-Type: application/xhtml+xml" header. This is very nice for keeping your page clean -- view it in Firefox with that header, and it'll parse it as XML, and complain whenever you have a problem with your XML, saving you a trip to the even more pedantic W3C validator. Unfortunately, sending that header will have browsers that don't understand XHTML attempt to download the page, rather than displaying it as HTML. Eve
    • Re: (Score:3, Insightful)

      Whirlpool [terra.com.br] is a good choice these days. It's longer than most of the hashes out there, but I don't believe there have been any attacks yet demonstrated against it.

      For those pythoners out there I wrote a quick wrapper for it [kepty.com] that should get you started. Excuse any site errors and just hit refresh

      • by FooBarWidget (556006) on Sunday August 27 2006, @09:44AM (#15989430)
        For anyone wanting to use Whirlpool in their apps, here are libraries that you can use:
        • Whirlpool library for Ruby. [plan99.net] This is written by me, based on the sample C implementation by the inventors.
        • The above library can also be used in C apps. Just copy whirlpool-*.[ch] to your project. See whirlpool-algorithm.h for API.
        • The GNU Crypto [gnu.org] library for Java contains a Whirlpool implementation.
          • What I did is writing my own Whirlpool Perl module, which is included in the main program. The XS file is here [sourceforge.net]. It depends on whirlpool-*.[ch] as available in my Ruby Whirlpool library.
            • Interesting, thanks for the link. AFAIK, there's nothing wrong with the one on CPAN, though. It's just that you can't do an "apt-get install perl-whirlpool," or the equivalent on other distros.
      • by Ckwop (707653) * <Simon.Johnson@gmail.com> on Sunday August 27 2006, @10:02AM (#15989507) Homepage
        Whirlpool is a good choice these days. It's longer than most of the hashes out there, but I don't believe there have been any attacks yet demonstrated against it. For those pythoners out there I wrote a quick wrapper for it that should get you started. Excuse any site errors and just hit refresh

        Seconded. Whirpool uses similiar mathematics to AES so an attack that breaks Whirpool is likely (although not certain by any stretch of the imagination) to also break AES.

        I think much like it is harder to design a cipher that resists attack when you use an LFSR as your base primitive it is hard to design a hash that is secure that uses an Unbalanced Fiestel Network (UFN).

        This is why I do not advocate moving to the higher SHAs. I believe that some weakness will be discovered and it will be found the UFN made it worse.

        If you're going to use AES, you've already thrown all your eggs in the Wide-trail design basket. If you're going to do that for the cipher, you might aswell do the same for the hash too.

        In fact, in most cases you will use the hash has part of an authentication primitive anyway. In this case, there's a good argument for dumping a new hash and using an encrypt-authenticate mode of operation instead of something like HMAC. That way, you reduce the number of assumptions which have to be true for the system to be secure, which can only be a good thing.

        In short, if you need to authenticate use your favourite encrypt-authenticate mode. If you need a hash for some other purpose, use Whirlpool.

        Simon

        • In fact, in most cases you will use the hash has part of an authentication primitive anyway. In this case, there's a good argument for dumping a new hash and using an encrypt-authenticate mode of operation instead of something like HMAC. That way, you reduce the number of assumptions which have to be true for the system to be secure, which can only be a good thing.

          Alright, so there's this SSH packet coming down the line. For those of you not in the know, it loos like this:
          [encrypted block][hmac of decrypted
      • by jd (1658) <imipak@yah[ ]com ['oo.' in gap]> on Sunday August 27 2006, @01:35PM (#15990376) Homepage Journal
        Whirlpool is also in mhash, which is now many versions on from the ones supplied by Fedora Core and Gentoo. Oh well. It's also in the Linux kernel's crypto library. Whirlpool is a damn good hash and uses the same principles as the Rijndael cipher, which means that the underlying concepts have been deeply analyzed twice - once in each form - showing the basic ideas are fairly solid. Being long should reduce the risk of collisions so is actually a strength in many cases - particularly as we're talking bytes, not megabytes.


        Tiger is another good hash function - faster than Whirlpool, smaller for those embedded cases where even the bytes matter, and I believe it is not known to have any attacks against it. Tiger also appears in mhash, not certain if it's in the kernel but it should be.


        I don't see that it is really of any consequence whether anyone has actually demonstrated an attack on SHA - the point of security is to not wait until AFTER the house has been plundered to upgrade. SHA is FIPS-180, but if there is even a theory on how it might become broken, I would urge people to use something stronger. Security that is only certain to be good against skript kiddies is really not very useful security.

    • by bwcbwc (601780) on Sunday August 27 2006, @05:29PM (#15991143)
      Actually, hashes are difficult to secure for general communications purposes without putting a cap on the size of the transmission. In information-content terms, a collision proof hash is equivalent to a lossless compression algorithm.

      A hash will either contain all of the non-redundant information in the original content, or some of the information gets lost during the hash. Non-redundant information being defined in an information-theory sense that a given bit is completely random/unpredictable based on the content of preceding bits.

      In order for a hash to be completely collision proof, it has to contain all of the non-redundant information contained in the original file. Otherwise information in the orignal message is lost in the hash. And if information is lost from the original message, that creates a possibility of constructing a message that differs only in the information that is removed by the hash. Only if the original message is reconstructible from the hash (plus possible information contained in the hash algorithm itself) will it be collision-proof. You've either got the information-content, or you don't. And if you don't have the content, you can't validate it.
  • by Name Anonymous (850635) on Sunday August 27 2006, @09:46AM (#15989436)
    Provide the following 3 pieces of data:

    1) Message/file length
    2) SHA1 hash
    3) MD5 checksum or some other hash/checksum that's calculated way differently from SHA1.

    Providing the length means that the person trying to change the data needs to keep it the same length which makes it more difficult.

    Using 2 different hashing/checksumming methods means they have to be able to match both of them in order to be able to switch the data.

    The more restrictrictsion we toss on the data, the harder it is to manipulate. I do think that using more than 2 or 3 hashing/checksumming methods would be overkill however.
      • Re: (Score:2, Insightful)

        In cases of verification (rather than security) isn't more specificity better? I'd agree that double-hashing something like a secret password causes a loss of security, but if you're double-hashing a file to verify its contents, more specificity means it's harder to get a match by garbage-packing.

        I really am asking-- I'm not all that up on the guts-and-wherefores of encryption/hashing, and I've wondered about this question as well.
  • by SiliconEntity (448450) on Sunday August 27 2006, @10:51AM (#15989711)
    NO SHA-1 COLLISIONS HAVE EVER BEEN FOUND!

    Ahem.

    Sorry, my caps lock key got stuck there.

    No SHA-1 collisions have ever been found. The first person or group to find one will achieve considerable fame. I say this as an attendee of both last week's Crypto conference and the immediately following hash function workshop.

    The work factor estimated for a SHA-1 collision is something over 2^60 cycles. That would put it on par with the biggest calculations that have ever been done (publicly anyway). So far nobody has put together a sufficient effort to achieve a collision.

    At the hash function workshop, cryptographer Antoine Joux published a set of recommendations for how such a hash collision effort should be mounted, in order to minimize the damaged from a published collision. The main goal is to make it difficult to take a published collision and use it to create harmful effects in various ways. Hopefully Joux's guidelines will be followed if and when a SHA-1 collision finding project gets started.
    • easy tiger... (Score:3, Insightful)

      by Anonymous Coward
      I think the key point is this:

      No SHA1 collisions have ever been published

      whether or not they have been found is a different matter entirely.
  • by KalvinB (205500) on Sunday August 27 2006, @11:10AM (#15989790) Homepage
    If the MD5 of the two different strings that had the same SHA-1 value are different then there's no real reason to panic. For an added level of security you could also calculate the byte length of the data.

    Software will just need to be updated to calculate two hashes. Good luck finding two sets of data that are different yet have the same length, the same SHA-1 hash and the same MD5 hash.
  • Insecure (Score:5, Informative)

    by zlogic (892404) on Sunday August 27 2006, @11:43AM (#15989942) Homepage
    SHA-1 was proved to have insecurities years ago. Because of that SHA-2 ("SHA-256", "SHA-384", and "SHA-512") was released back in 2001 as a better version of SHA-1. SHA-2 was tested and no insecurities were found (yet). What's more, SHA-2 is now the official US standard.
    Complaining that SHA-1 is insecure is like complaining that Windows 98 is insecure.
    Oblig Wikipedia link: http://en.wikipedia.org/wiki/SHA_hash_functions [wikipedia.org]
    • I just want to put a plug out for my good friend whirlpool [wikipedia.org]
      Slected as the replacement for SHA-1 years ago by the European NSA, 512 bits long, based on AES.
      Who loves ya?
      • md5sum /boot/vmlinuz
        d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz


        Haha, I have cracked your worthless hash and I can tell you that the sha1sum of your very short kernel also begins with a 'd', see:

        sha1sum /boot/vmlinuz
        da39a3ee5e6b4b0d3255bfef95601890afd80709 /boot/vmlinuz

  • It looks like validation of document structure significantly complicates the problem of creating a hash collision (the technique mentioned relied on padding the document with junk at the end, which, IIRC, is not in line with modern XHTML specs). Of course it's best to use a secure hash to begin with, but this does add yet another layer of resistence.
  • When messages are signed they are already compressed and the
    file size is used as part of the final message digest.

    If your application doesn't do this then you should use another
    applications.

    Using the above mentioned processes none of these attacks are
    viable, both in the short and long terms. (supposedly)

    When receiving the following message:

    Jon D0e owe$ Jane Do3 $1000`000 d011ar5.

    or

    Jon Doe owes Jane Doe $1000000 do11ars.
    $4#^$%^&*5#$^()^%ER$%^RT#$3

    Would you not ask the sender to resend
    their original messa
  • If you can manage to replace a hashed file from a web site with content of your choosing, couldn't you just as well change the hash it's checked against?

    Even if the hash and the file came from seperate sources, odds are a user went to 'xyzzysoft.com' from which the link to both the hash and the file are provided; so if xyzzy's page were hacked, they are screwed anyway.

    To use a car analogy, to steal a purse in a car, a theif dosn't pick the lock, he smashes the window.

    Unless you have a seperate, secure chann
    • Re: (Score:2, Insightful)

      Where's the "Correct filesize" kept? If it's stored in the document, it's still possible (Though more difficult) to change it and make a collision.
      • Re: (Score:2, Interesting)

        What if it were tacked onto the end of the hash? So your hash would look like

        071883abcdef18234760abdefcd38f387bc93-78b

        And 78b would be the size of the file. Not being a cryptologist, not even remotely, I am probably missing something quite simple and obvious here, but at first glance it seems like this would work.
        • Re: (Score:2, Interesting)

          That would work for typical everyday use (like a checksum next to a link to a downloadable file). Of course, this is assuming that the birthday attack wasn't keeping a uniform file size.

          It would also take a bit more (though maybe not much) to apply it to digital signatures.
        • by edmudama (155475) on Sunday August 27 2006, @11:09AM (#15989787)

          From the article:

          Using the new method, it is possible, for example, to produce two HTML documents with a long nonsense part after the closing </html>tag, which, despite slight differences in the HTML part, thanks to the adapted appendage have the same hash value.

          So it appears that both the original and the new messages need that appendage. This isn't just about adding an appendage to a known, appendage-less document.
    • Re: (Score:3, Insightful)

      I have to say, trusting SHA-1 to do what it says on the tin, is not incompetent. Naive, sure, but not incompetent.
      • So SHA-1 is broken. In other news, sky found to be blue.

        Come on guys, SHA-1 was broken years ago. That people are now devising attacks is hardly newsworthy.
      • Re: (Score:3, Informative)

        I have to say, trusting SHA-1 to do what it says on the tin, is not incompetent. Naive, sure, but not incompetent.

        Let's not forget this attack is against a reduced 64 round version of SHA-1.
        SHA-1 still does what it says on the tin: the best attack known against the 80-round version is 2^63, which is still not practical.
    • by wfberg (24378) on Sunday August 27 2006, @10:34AM (#15989626)
      I'm not quite sure what your comment means.
      As the heise.de article points out; the twins are of equal length - the file size would be the same.
      Finding hash twins whereby the chosen one is, oh, let's say 160 bits longer is a degree less sophisticated.
    • I would have thought this not such a big issue for software developers who aren't incompetent.

      I don't know the details of this particular attack, but usually attacks on hashes like this produce two documents with the same file size. Certainly the MD5 collisions a couple years ago had the same file size.
      • Yes, but the summary states that it is a similar HTML doc with lots of nonsense put at the end so that the hash matches. I very much doubt in this case that the files are the same. Plus, if you know something about the document, lit the size, or the fact that the file should be XHTML 1.0 compliant, without any trailing garbage that doesn't make any sense, then it makes the hash a whole lot harder to crack. The thing is, if you rely 100% on the hash, then you could get screwed, however, if you verify the
    • It's quite relevant for those using it as a way to verify executables are the way developer had intended. Like the attack last year they're saying it would be possible for someone craft an exe without a virus, generated a checksum for it, get it linked to from major websites (after passing a virus scan of course), and then drop a virus in the end of the file and not have the checksum change. That's the real-world relevance.
    • Re: (Score:3, Informative)

      And actually reading the hash function descriptions is too complicated a solution for you? Oh, I forgot, you want your karma and most mods will assume you know what you're talking about because they don't read the specs either.

      MD5, SHA, and every other hash function I've ever read the spec for appends some zeros followed by the original message size (the zeros are so it comes to an integer number of blocks) as the first thing it does. For exactly this reason.

      At a guess, this attack requires that the t

      • As I understand it, the RIPEMD family pad only the final block with zeroes to end the message on a block boundary, and this is where the "extra bytes" are added. I agree with the GP on this, since it's much harder to collide a hash if that hash includes the raw message length as a component - then the final padding becomes a known quantity. (For example, such an algorithm, if comparing a message to a hash, could pre-pad the final block with a known quantity of zeroes before checking the real message lengt
        • Re: (Score:3, Informative)

          I'm pretty sure you're confused here. First, note that the padding is a 1 followed by some number of zeros, so no matter what the message actually ends with it is clear what is padding and what is message. That is, in all circumstances adding zeros to the end of the message changes what is being hashed, even when you ignore the file size part. And secondly, the message size as appended onto the message is the *original* message size, *before* any padding occurred. So appending zeros not only changes wha
    • You can never quite figure it out because you are completely clueless about crypto. Hope that helps.

      PS: The attacks on both algorithms create files of equal size.
    • Re:git (Score:4, Interesting)

      by kasperd (592156) on Sunday August 27 2006, @04:05PM (#15990929) Homepage Journal
      This is bad news for the git.
      It is not a major problem. First of all to exploit it, you'd have to generate a collision and have one of the two versions accepted in mainstream. Second you'd have to get the wrong version onto some user's machine before the correct version. Linus explained this in a posting somewhere after the original SHA-1 weakness was published. And though Linus AFAIK does not have any education in cryptography, he has demonstrated, that he clearly knows how to apply cryptographic primitives in a sound way. I have a PhD in cryptography, and I have read about the design of git, and I did not spot any weakness.

      For now though from a theoretical viewpoint this is a major weakness, it still requires way too much processing power to be realistic. And the way git is designed, I don't think it is going to be any major problem switching to a new hash once cryptographers starts to agree which one should be considered secure in the future. Once they start using a new hash, you can actually still safely use old repositories based on SHA-1. Because once there is no longer being added new data based on SHA-1, a collision is no longer enough to perform an attack, rather you need a second preimage, something which there has not yet been demonstrated an efficient way to produce.