MD5 Proven Ineffective for App Signatures 117
prostoalex writes "Marc Stevens, Arjen K. Lenstra, and Benne de Weger have released their paper 'Vulnerability of software integrity and code signing applications to chosen-prefix collisions for MD5'. It describes a reproducible attack on MD5 algorithms to fake software signatures. Researchers start off with two simplistic Windows applications — HelloWorld.exe and GoodbyeWorld.exe, and apply a known prefix attack that makes md5() signatures for both of the applications identical. Researchers point out: 'For abusing a chosen-prefix collision on a software integrity protection or a code signing scheme, the attacker should be able to manipulate the files before they are being hashed and/or signed. This may mean that the attacker needs insider access to the party operating the trusted software integrity protection or code signing process.'"
Hmm... (Score:1)
Perhaps they should consider using something else than SHA-1? SHA-2 anyone?
Well, duh! (Score:5, Insightful)
The problem has nothing to do with salt, and can be certainly temporarily "fixed" switching to SHA-1 or, even better, SHA-2. But the real root of the problem here is that, for the attack to work, someone signed as trusted a binary file that contained malicious code in the first place, even if in a disable form.
Let me explain that. First, this is very old news: we know since 2004 [wikipedia.org] that collision can be found in MD5 hashes (two different files with the same md5sum), and there now are tools that can generate collisions in seconds. All you need is a common prefix and suffix for both files and two block of 128 bytes that are generated automatically and you can insert between the prefix and the suffix to create the two files.
Applying this to pretty much any file type that can contain binary data (even XML 1.1!) is trivial. For an executable file you can simply insert code in your prefix/suffix that looks at the pseudo-random 128 bytes and does radically different things depending on it. This as already been demonstrated for HTML+JS and even for postscript files.
Bottom line: if you have an executable file from an untrusted source it may contain bad things (the attack described requires that both the original signed file and the file that you are actually executing are generated by the same hostile source).
Re: (Score:2)
Re: (Score:2, Insightful)
Its inevitable.
Re: (Score:2)
I commend NIST for having a SHA-3 crypto contest. I'm pretty sur
Re: (Score:1)
MD5 is what, 64 bytes? Your file will probably be less than a terabyte (2^40 bits), so it'll have a representation of 40 bits, or 5 bytes. So now you have a 71 byte hash. Are you going to tell me that 71 bytes is really "a lot harder" than 64? You're delusional.
The problem is that data is intrinsically incompressible. If you map the space of all finite sized binary words into the space of all binary words of n bits, you have an infinite-to-one map. If you want co
Re: (Score:2)
You can pad it out without too much trouble.
Whats tougher is making dual collisions. Say MD5 and SHA1.
ONE block, surely (Score:3, Interesting)
Re:ONE block, surely (Score:5, Informative)
Re: (Score:2)
Anyway, use one of the SHA algorithms
Re: (Score:2, Insightful)
1) Generate an MD5 hash for a file. .. more as needed ...
2) Generate an SHA-2 hash.
3)
4) Concatenate the results for a "super hash"
5) Profit?
Surely to manipulate 2 (or more) schemes to ensure the super hash is the same on a tampered file would be _many_ orders of magnitude harder?
Trying to make the SHA-2 match would destroy all the previous work done to make the MD5 match, then fixing the MD5 would change the
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
I don't understand why a gnupg digital signature is not used by default.
Re: (Score:1)
Just plain wrong. Read the article.
Re: (Score:2)
In other words, it's as easy as accidentally downloading it from the wrong web site. The files will still look right. The malicious web site would have obtained the executable from the originating download site and modified it according. Exceedingly simple and nefarious to attack the unsuspecting.
Re: (Score:2)
No, it would have to *modify* the original file on the original server to make this work.
Re: (Score:2)
It says: Now, we can publish good.bin in the Internet for people to download it, and later, we can replace it with evil.bin. Now, the users will get infected, without noticing and convinced that there is no tampering, because the MD5 signature is the same for both files, in others words we have MD5(goo
Re: (Score:2)
OK, that sounds fun. But at least for open source code, the md5checksum is on a compressed archive, not a raw executab
Nothing new (Score:5, Insightful)
This of course doesn't mean we should continue to use MD5, but the attack is really of rather theoretical nature.
Re: (Score:2, Informative)
14 Sep 2005
Re: (Score:2, Redundant)
Re:Nothing new (Score:5, Interesting)
Re: (Score:2, Informative)
Because you want an honest party to verify the "good" one, sign its MD5 with their trusted key, and actually distribute the good one.
Then you can in chosen circumstances replace it with the bad one (on, say, specific installs), and an ordinary audit will see the trusted signature on the package you thoughtfully provided on DVD.
Or think contracts: any signed-MD5 signature for a document in a format that ordinarily includes random-looking garbage is now untrustworthy, because what that person signed may hav
Re: (Score:3, Interesting)
For a pracical example:
1. Become a kernel contributor on some obscure driver.
2. Add a magic number somewhere, which is the good twin.
3. Wait for this to flow upstream to Linus, then downstream to all the distros
Re: (Score:2)
Re: (Score:2)
1) Even without this 'MD5-exploit', if the distributor is malicious from onset, then it is easy to just distribute a driver with a hidden exploit and sign it off as valid. Exploits can be written such that it functions like it is supposed to most of the time but provides a backdoor.
2) If you say that injecting this hidden exploit instantly show up in a
Re:Voting machines (Score:2)
Think voting machines. So far that has been the most requested approach, a verified hash code from open source, that is verified on each machine...
Re:Nothing new (Score:5, Informative)
When you inspect these binaries at bit level, they contain only the "good" or the "bad" version, and some random data appended to it to make the MD5 hash of the files collide. This technique thus also works for file formats which don't have control statements such as "if" or "file starts at offset". See also: http://www.win.tue.nl/hashclash/Nostradamus/ [win.tue.nl], scroll down to: "Didn't Daum and Lucks do something like this in 2005?"
Marc Stevens already constructed these "chosen-prefix" collisions for X.509 Certificates, see the HashClash [win.tue.nl] project page. What's new in these results, is that it did not require massively distributed computing efforts, only one Playstation 3 and less than two days of computation. There is no paper available yet as to how he achieved this major optimization, but his MSc thesis [win.tue.nl] gives a clue: see "future work" at the end of section 7.4.
Re:Nothing new (Score:5, Informative)
X; if (X) then GOOD else EVIL
and
Y; if (X) then GOOD else EVIL
but the evil code would be in the signed good program, it would not be run.
The new attack is different: it is a method to generate blocks GX and EX for two random files such that the files GOOD+GX and EVIL+EX hash to the same checksum.
Re: (Score:1)
Re: (Score:2, Informative)
This attack means that you get to choose the two files, and the attack generates two blocks to append to the original files which mean they hash to the same value.
So the exploits before have been:
Re: (Score:2)
Re: (Score:3, Informative)
He *does* need access to good.exe. You can't generate a file that matches a given MD5, what you however can is generate two files that have the same MD5 and different content, both good.exe and evil.exe contain appended data to make the sums match. Its still a weakness, but a much less critical one then being able to generate a file for a given MD5.
Re: (Score:2)
As the article mentions, this could be really bad for third party signin
Re: (Score:2)
echo Hello World
return
42k k22 452n4
Re: (Score:1)
Well everyone's boned then (Score:4, Interesting)
An attack that requires insider access? Well colour me frightened!
Or don't. That's more accurate anyhow.
Re: (Score:2, Interesting)
This is a REMOTE attack, and reasonably potent (Score:4, Informative)
If you'd read the article, you'd see that one of the (prominent) possible attack scenarios listed is that of software distribution: distribute a good file, with the intent of replacing it later. For example, in debian, even with MD5 checksums on all your data, and tools reporting what's changed during the software update, this would still allow downloading infected files, without noticing.
It's a danger both from malicious distributors, and from hacked distribution sites.
Re:This is a REMOTE attack, and reasonably potent (Score:5, Funny)
Reading the article? THIS IS SLASHDOT!!!!!!1!
Re: (Score:2)
Re: (Score:2)
True it is interesting that you can make MD5 do these neat tricks. But as in insider your non-malicious garbage appended exec should likely fail to pass the review process. People will ask "why do you have this funny bit of assembly
Right (Score:1)
Let's see now, the attacker already has access to the machine and is probably the one creating or comparing the MD5. Is the problem really with MD5?
Not accurate, not new (Score:5, Insightful)
MD5 collision attacks aren't really new, although this is a powerful example. An equally meaningful example of a collision attack on the algorithm, in the form of two different PostScript files with the same MD5 hash [cits.rub.de], was provided at least two years ago (IIRC).
The key to understanding the limits of this demonstration's significance is to realize that a collision attack is quite different from a prefix attack. These researchers were able to create a pair of executables having the same hash value by specially constructing them as such; crafting a new executable to match a specific hash value corresponding to some other party's executable is vastly more difficult to achieve.
So while this demonstrates MD5 to be useless for uses where the purported signatory is to be included in our threat analysis -- as has already been demonstrated to us by other researchers -- the algorithm is still relatively safe if our only goal is to ensure that a given executable almost certainly came from a specific party (rather than showing that it is a specific executable from said party). In other words, one could conceivably use MD5 to verify that the Ubuntu packages on that FTP server were in fact produced by Canonical. So no, demonstration does not mark MD5 as completely useless for code signing; the most common applications of code signing are entirely unconcerned with collisions in the hash function.
In conclusion: the title is terribly misleading, or possibly just misinformed. Boo! Hiss!
Re: (Score:2)
Err, "prefix attack"? It's too early in the morning for me to be posting to Slashdot...
pretext attack, for the record.
Re: (Score:2)
http://www.linuxworld.com/cgi-bin/mailto/x_linux.cgi?pagetosend=/export/home/httpd/linuxworld/news/2007/111207-hash.html [linuxworld.com]
Re: (Score:1)
Thanks
Re: (Score:2)
(I'm no expert on this by any means, but I hope I can partially answer your question...)
When a file is encrypted through a strong algorithm and there are private keys between the parts, it is still necessary to add something like MD5 to preserve integrity?
I'm not sure I completely understand what you mean by "private keys between the parts." But if you mean entirely symmetric encryption, then in many circumstances a cryptographic signature -- and therefore a hash function to produce the signature -- is unnecessary, as an attempt to tamper with a transmission without knowing the secret key would result in meaningless gibberish on the receiving end. The major caveat is t
Re: (Score:2)
Three examples:
First, a stream cipher. A stream cipher is a cipher that generate a pseudorandom stream of data, which you then XOR with your data stream. The most well-known stream ciphers are RC4 (used in e.g. SSL) and the one-time pad (having a bunch of true random data you XOR with your data, where the random stream is the same length as your data). The attack here is that the attacker c
Birthday Attack (Score:5, Insightful)
Re:Birthday Attack (Score:4, Informative)
Re: (Score:2)
Re: (Score:1, Informative)
That's the whole point of the parent, isn't it? Unless I'm missing something, the collision is generated by a birthday attack approach and the parent was exactly right. It's why the attacker needs to write both GOOD.EXE and EVIL.EXE; if he could generate an arbitrary md5 sum, he could replace any GOOD file and this would be much worse.
IMHO it's still pretty annoying (though not ne
use two hash functions (Score:5, Informative)
The case that matters is producing a program with the same checksum as a given program, without the ability to manipulate the correct program beforehand. That's still hard.
Nevertheless, code signing mechanisms in general should probably be prepared for flaws in hash functions. It might be best always to use two hash functions and to have some strategy of migrating. That way, if one hash function gets compromised, there is still another one in place and can be used until the original one has been replaced.
Re:use two hash functions (Score:4, Interesting)
The problem as I see is that the harmless version can be released and gain trust. That version can be tested and inspected, even checking the binary wouldn't reveal malicious code because there wouldn't be any malicious code to find - no dodgy looking system calls, for example. Just a chunk of seemingly random data, which could be disguised as a lookup table, compressed image or whatever. At some later point, after the harmless version has gained trust, its use has become more widespread and the rate of downloads has increased correspondingly, it can be replaced by the malicious version. So while you could initially release a malicious version, being able to first release a harmless version can widen the impact of an attack.
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
That version can be tested and inspected, even checking the binary wouldn't reveal malicious code because there wouldn't be any malicious code to find - no dodgy looking system calls, for example. Just a chunk of seemingly random data, which could be disguised as a lookup table, compressed image or whatever.
I'll ask the same thing many others are asking. If you can sneak garbage into a binary, why not sneak the actual payload in? And... there are people who inspect binaries and are thrown off by random garbage code? What exactly are they looking for then? If I throw a fistful of metal widgets under someone's hood, would I expect to fool the driver or his mechanic? The attack most of you are suggesting is so bizarrely over-complicated.
It's like a crazy plan to put a suspicious looking EMPTY package in th
Re: (Score:1)
Isn't using two hash functions the equivalent of using one hash function that is the mathematical equivalent of the two?
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
One hash to rule them all--wot? that's all? (Score:1)
It has occurred to a few people--just a few people--to sign objects with both MD5 and SHA-1.
It seems that it is more difficult to get both MD5 and SHA-1 collisions by quite some orders of magnitude. Someday, perhaps, it can be done but not today. Well, no one has said so, at least.
Anyone for some OR thinking?
Ah yes, this again (Score:5, Interesting)
The relevant links are:
http://www.doxpara.com/research/md5/t1.html [doxpara.com]
http://www.doxpara.com/research/md5/t2.html [doxpara.com]
http://www.doxpara.com/research/md5/md5_someday.pdf [doxpara.com]
I'm pretty sure I talked about third party attestation in that paper.
A more interesting point was made to me just the other day, which is that there's always enough ambient entropy in any real world system to deviate between trusted and untrusted behavior. In other words, for a turing complete app, you *can't* create a meaningful hash, because you aren't capturing all bits that will drive the execution flow. So, getting code signed really doesn't assert anything other than a business relationship. App signatures don't actually work, for any arbitrarily good hash.
Re: (Score:2)
Re: (Score:3, Insightful)
That is
Re: (Score:2)
Re: (Score:2)
The problem is that MD5 can only hash the bag of bits available at compile time. It misses the accumulat
Use GnuPG instead (Score:5, Insightful)
Optional Secutity ? tee hee ROF,LMFAO (Score:1)
FromMSDN Library:
(emphasis added)
If you want security it has to be in effect 100% of the time. Not just here and there WHEN we have time for it and we don't bypass it to improve performance.
the issue here is not whether MD5 is vulnerable but whether it is being used all the
Bost projects I've seen.. (Score:2)
About the only place I see MD5 sums used much is for large iso files, get the md5 sum from the distribution sit
Re: (Score:3, Insightful)
You are quite right, that md5 does not provide and connection to the signer. With a PGP/GPG signature, once I have the correct public key, I can verify all and every signature made with it. And if I do not have the correct key, the first genuine signature will result in an error. Howeber I guess most people do not bother. Even if it is easy. For a kernel download, e
Re: (Score:3, Informative)
As I understand it, the normal way to generate a digital signature is to use a hash algorithm like MD5 or SHA1 and then encrypt the hash with a private key. Then you verify by hashing the file and decrypting the signature with the public key and checking to see if they match. Therefore, distributing signatures instead of hashes is orthogonal to the discussion at hand. If the hash is broken, then the signature is broken, too.
See Wikipedia for more information on digital signatures [wikipedia.org].
Re:Best projects I've seen.. (Score:2)
Door locks are insecure if you can get a key too! (Score:3, Insightful)
Re: (Score:2)
Re: (Score:3, Informative)
Not a real life scenario... (Score:3, Insightful)
developper A produce software X(for example openssh), calculate hash of program X and sign the hash with his PGP key.
He then put all these files on mirrors servers on Internet (but not his private PGP key !)
One mirror is hijacked by B.
B wan't to replace X by X' with the same hash than X
This article doesn't provide anything as it says MD5(X+a)=MD5(Y+a), which imply you have to change A in the first place which can't be done easily (and if you can change the original program, then what's the point ?)
what it really means .. (Score:3, Interesting)
Well, what it means is that an evil software megacorporation could publish a digitally signed app that could be replaced with another presumably nefarious prog later on
Re:Not a real life scenario...
Re: (Score:1)
He then put all these files on mirrors servers on Internet (but not his private PGP key !)
/ One mirror is hijacked by B.
B wan't to replace X by X' with the same hash than X
True, but imagine that developer A is the attacker. That developer can create two different versions of his program: One benign version to go trough verification, QA testing, and any customer acceptance testing, a
Re: (Score:1)
The problem is also not the same with a binary only exe and some open source programs where you have access to the source and all the change in the repository are signed with a pgp key.
That way, it's easier to verify that a binary is made from a given source.
It's a lot harder to try hiding something like md5 garbage in source form.
Re: (Score:1)
Yes and no. In some cases, the risk of damage is low enough that it makes sense to trust the developer. In other cases, the risk is great enough that there should be safe-guards in place to ensure that a rogue developer can't insert malicious code. Safe-guards which rely on the MD5 hash should no longer be relied upon to prevent a rogue developer from inserting malicious code.
It's a lot
A workaround? (Score:1)
Take trusted_program and
Post both trusted_program2 and false2 on your web page along with their shared md5sum and invite the user to download them (presumably the user trusts you and your web server or he wouldn't download them in the first place.)
The user is now confident that you cannot replace trusted_program2 with malicious_program without changing the md5sum, because this technique only works with two prefixe
Re: (Score:2)
Nothing new (Score:2)
Re:Nothing new (Score:4, Interesting)
Security hashes (Score:1)
It's not exactly hard to understand that a 128-bit hash is going to be less unique than a multi-kilobyte executable. I believe 3rd grade math has that covered. With processor speed increasing steadily, these things become easier to break with each passing day.
Re: (Score:1)
In theory, it should require 2^64 attempts to find two identical messages with the same MD5 hash. That should be enough to routine prevent brute-force attacks for the foreseeable future, except possibly for attackers with deep enough pockets to build special MD5 cracking machines. This issue isn't simply that the 128-bit hash is less unique than the multi-kilobyte executable being signed
Re: (Score:1)
Md5 as a signature (Score:2)
You have a lengthy verification process for new software - you check it over thoroughly to make sure it can be trusted, and after you certify it as trustworthy you sign it and only need to re-certify if the signature changes next time you download it from me.
I deliver a new version of the software to you (the "good" version), y
Re: (Score:1)
That shouldn't work: MD5 requires the recipient to regenerate the HASH and then check the signature. I have no idea why they think this is a performance improvement as you are going to have to scan the entire content of the messsage ( program ) in order to regenerate the hash
Use multiple hash functions... (Score:1, Redundant)
Did this concept change over the years, or is it just me? heh
MD5+SHA1? (Score:2)
Re: (Score:2)
Depends on how you look at it, and what you are trying to protect against. In most cases combining two hashes in that way will produce something that can be attacked faster than brute force. You'd get better security with a hash function that was designed to be 288 bits in the first place. (Or even 256 bits). In fact even if the two hashes had no weaknesses at all, you could still produce a collision for the concatenation slightly sl
Re: (Score:2)
Re: (Score:1)
Don't expect a better warning just because you are using two different hash functions. With md5 you did get the warning a few years ago. Still most uses of md5 are secure. If you were to be using two, there is no guarantee that you are going to be told about the first break, if somebody want to attack the co
Re: (Score:2)
Okay, stupid example, but you can't assume just piling on additional algorithms is going to make things more secure. While it's probably reasonably safe for MD5+SHA1, you're almost certainly much better off just using, say, SHA-512 instead.
Alternative approach using AES-256 (Score:2)