SHA-1 Broken 751
Nanolith writes "From Bruce Schneier's weblog: 'SHA-1 has been broken. Not a reduced-round version. Not a simplified version. The real thing. The research team of Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu (mostly from Shandong University in China) have been quietly circulating a paper announcing their results...'" Note, though, that Schneier also writes "The paper isn't generally available yet. At this point I can't tell if the attack is real, but the paper looks good and this is a reputable research team."
May be a big deal... (Score:3, Interesting)
Now what do we use? (Score:3, Interesting)
what's left (Score:2, Interesting)
maybe more people will use GPG now!
Question (Score:1, Interesting)
Re:Info on what exactly SHA-1 is ... (Score:5, Interesting)
Re:Now what do we use? (Score:1, Interesting)
Bittorrent? (Score:5, Interesting)
How hard is it going to be for people to provide garbage data with correct SHA-1 hashes to screw up downloads?
So what's the big deal for the rest of us? (Score:5, Interesting)
I'm not a cryptographer, just a nerdy engineer, but let me explain my rationale: a hash algorithm takes an arbitrary message and generates a fixed-length signature that has a high probability (10**50 or better for most modern algorithms) of being the original.
Let's assume that your hash algorithm generates a 128-bit hash. Anyone who knows anything about probability can see that is the original message is greater than 128 bits, there MUST be more than one message that will generate the same hash. For long messages, there may be thousands or millions of messages out of a filed of 10**50 (or better) that have the same hash, although many of them will be meaningless garbage.
So SHA-1 has been broken by a group of cryptographers/mathematicians. Does this really mean that they can generate can alter any message in a way that will generate the same hash as the original, thus fooling the math that we use to validate content? No Way! I read Bruce Scheier's Cryptogram every month and he often makes the same argument.
So yes, this means that from a long-term systems security standpoint, we should all move to stronger hashes. Does it mean that SHA-1-based transactions are inherently secure right now?
I think not!
Re:Well (Score:2, Interesting)
I'm particularly worried about BT users, personally. The breaking of SHA-1 will essentially allow the RIAA and others to corrupt many bittorrent downloads.
Impact on Digital Certificates & Issuer Liabil (Score:2, Interesting)
If someone can do this, then what are the liability concerns for certificate issuers (or even their customers) ?
Re:Hmm (Score:1, Interesting)
No. At least, until very recently, SHA-1 was considered a serious cryptographic hash function, well suited for digital signatures and such.
Maybe you're mixing them with CRC-32, which is indeed well suited to combat transmission errors, but not intentional foul-plays.
Unfortunately the SHA series seems to be suspect (Score:5, Interesting)
If this definite break is confirmed, I think we will need to conclude that the entire family is suspect for any genuinely important purpose.
There are a bunch of hashing algorithms on the Hashing Function Lounge that are listed as having no known attacks. At present, the most widespread is Whirlpool. I think it likely that one of these will replace SHA as the hashing function of choice in major cryptographic areas.
Time to dust off your XBox (Score:1, Interesting)
Maybe now we can crack some XBox stuff and run homebrew code on a homeburned dvd (dual-layer, of course.) Please correct me if I don't understand something. I thought M$ used RSA or something to encrypt their games and used SHA-1 signatures as checkpoints?
Not true. (Score:3, Interesting)
Where've you been?
Re:Info on what exactly SHA-1 is ... (Score:5, Interesting)
Re:Xbox-Linux Project B Complete? (Score:2, Interesting)
Have: Public key (it's inside the Xbox kernel), in decimal:
207401193272587237602760235090630171384
Re:Broken, but not for everything... (Score:3, Interesting)
Suppose you're signing a tar.gz file. If the NSA could find a collision, the collision will still need to fit:
- filesize has to stay the same
- you don't want to get errors with gzip
- you don't want to get errors with tar
- the files in the archive needs to make sense
What's the probability of all this happening?
Re:Info on what exactly SHA-1 is ... (Score:5, Interesting)
I don't know about this, but when SHA (the Secure Hash Algorithm) was submitted as an approved algorithm for government use, the NSA reviewed it and suggested a minor change. That modified algorithm is what we now know as SHA-1. It was a few years before public-sector cryptographers caught on to what the significance of the changes was (I wish I could explain it, but it is beyond me).
Encryption vs. fingerprint hashing.... (Score:2, Interesting)
What's the best hash for file fingerprinting, for stuff like version databases, tripwire, etc?
Re:Not a problem (yet) (Score:1, Interesting)
INSIGHTFUL? (Score:0, Interesting)
Re:Is it that bad? (Score:2, Interesting)
1 Billion computers now comes out to 22 days.
But, 1 million computers is still 60 years.
The ten-year mark comes out to 6 million computers.
(BYW, these tests were done on a 256 byte file, bigger files would take longer.)
Still not that bad.
Re:Info on what exactly SHA-1 is ... (Score:3, Interesting)
The original SHA was released in 1993. 2 years later, SHA-1 was released (by the NSA, through NIST) to address unspecified weakness in the original algorithm. In 1998, a weakness in the original SHA was discovered, and in 2004 various attacks on SHA, SHA-1, and other hashes were published. It is unknown which, if any, of the known SHA weaknesses were discovered by the NSA and resulted in the 1995 release of SHA-1, though at least one known attack affects SHA but not SHA-1. It seems likely that if the NSA were capable of releasing SHA-1 as the original SHA in 1993, they would have done so. Thus, it appears that the NSA discovered a weakness in SHA in 2 years that took the academic community 5-10 years to discover. The gap is not as wide as it was when DES was released, but it's still very impressive.
End brief synthesis. Check out the wikipedia article (and links) for more information. That's where I got mine.
Re:Not a problem (yet) (Score:5, Interesting)
Re:Info on what exactly SHA-1 is ... (Score:5, Interesting)
You know, of course, that the NSA did the same thing with SHA right? The original algorithm submitted was SHA-0, then the NSA recommended an unexplained minor change.
Last August SHA-0 was broken, so their tweak appears to have added about 6 months of extra life for SHA-1.
Re:Quick Measures? (Score:3, Interesting)
I have barely any cryptography knowledge, but would the SHA series be any safer if the size of the data was part of the signature? From glancing over Bruce's post (and remembering how MD5 and others were broken), data has to be padded. You can't just change arbitrary bits and produce the same signature. So, couldn't you just add the size of the file to the signature? Does that decrease security, because the size is now known?
It would greatly add to the security in theory except that it wouldn't really work well in practice. I am not an expert on cryto algorithms but the limitations of this approach can be seen even without knowing anything about the actual has algorithm.
The trick with forging messages is that you need to add some amount of garbage to the forged message to make it match the hash of the original (there are other ways in theory that are likely to be less practical). This is why the length inforormation would appear to help. Then you have to disguise the purpose of the garbage so it doesn't draw attention to itself. Perhaps as another fake signature or an uncompressed image. Steganography would give you the requisite number of bits while still letting you include an image (of a hand written signature, a document page, or a company logo)that wasn't obviously junk but the bits would not be consequtive which makes it harder). Instead of hiding them in low order bit steganography, you could replace a number of consequtive pixels; the result would be visable but would be mistaken for a smudge. By including the length in the signature, you make it much harder to make a message that says what you want it to say and still matches the hash.However, the problem with your suggestion is protecting the new size information. If you include the size in the message before hashing, then you simply search for a hash that matches the forged message with modified size which could be about the same amount of work as forging in the first place for a brute force attack though it may well add additional constraints that might prevent taking particular shortcuts vs. a brute force attack in which case it would help. It would really depend on whether the hash crack is bothered by this constraint. If the crack algorithm knows in advance how many extra bits will be generated then running the length through the hash doesn't help at all. If you just add some message size bits to the end of the hash, then the information is unprotected and can be changed to match the message. So, you probably would want to sign the length separately. But if you are going to do that, you could use a hash with twice as many bits in the first place or hash the same message twice with two different initial values for the hash (cracking the first hash may or may not provide a shortcut to cracking the second). Both of these approaches would probably be much more secure than adding the length. Even more secure would be to hash the message with two different hash algorithms.
There are also ways of returning to the original message length even when adding garbage. Suppose I want to change a contract to say you will pay me $100,000 instead of $10,000. To make room for the extra hash fooling bits, I can delete an entire paragraph from the contract that is of no importance to me (or even makes it more favorable to me by its absence). Longer hashes or double hashes would effectively prevent this but length information would not.
Re:Not a problem (yet) (Score:5, Interesting)
not that useful yet... (Score:2, Interesting)
In my opinion, a "real world" attack would be one which given a blob which has already been hashed, would come up with another blob which results in the same hash. To my knowledge, nobody has any useful attacks in that direction yet, although some would argue based upon this research that it may just be a matter of time.
Then we of course need to get into whether that is really useful either. If I find out that and results in the same hash, how helpful is that to me? How is a lawyer is going to prove to a jury that I may have actually signed the garbage instead of the purchase agreement? So, there is even more work to be done to make it a useful real world attack, wherein you might take the original signed text (modified for your evil purposes), append a null character, and then add garbage until the hashes are equal--and hope the UI was poorly written and just displays up to the first null.
How about using MD5 and SHA-1 togeher (Score:2, Interesting)
Re:Sigh (Score:2, Interesting)
missing the overall point (Score:3, Interesting)
figure that if you could represent every possible combination in 128 bits, you would never need to have 129 bits of data.
Because this is not true all hashes will have collisions. However the chances of multiple hashes all having collisions with altered data is 'pretty damn slim'. So therefore the best solution, most likely in the future, and presently is to authenticate messages, identification (ala ssl certificates**) and binaries with multiple hashs known to be reasonably strong. One doesnt need to be a cryptologist to realize that using something like md5, sha256 and like ripemd160, the chances of collision in all 3 hashes are quite slim, and within the range of acceptable risk.
Can someone (Score:3, Interesting)
I know it's next to impossible to create the data from the hash but shouldn't it be theoretically possible?
If the hash reduces the possible files which match it by 99.999% then shouldn't it be possible to send that much less data?
What to use in new apps? (Score:3, Interesting)
Re:Info on what exactly SHA-1 is ... (Score:5, Interesting)
For quite a few applications the hash is broken even if I cannot easily find a second string with the same hash as one given. Even if I can "only" at will find two strings with the same hash, that is a pretty serious weakness.
I could, for example, create two documents with the same hash, have you sign one, and then claim you signed the other one. Since the hashes are the same your digital signature will be valid for both.
For other applications, like replacing a signed document with another without being detected you're rigth -- that would only work if one could easily find a document with a given hash.
Re:Can someone (Score:2, Interesting)
Yes, but that's not what the hash does. It only makes it much harder to find files that hash to the same value because the hash values are randomly distributed. If you have a 160-bit hash then (ideally) every 2^160th file will have the same hash.
This means basically that there are an infinite number of files that hash to the same value. For this to be used for compression there must be a 1:1 relation between inputs files and outputs hashes. In other words, the algorithm must be reversible, something which cryptographic hashes are not. When a file is hashed, data is thrown away, data which can never be recovered.
Important info on crypto hashes (Score:4, Interesting)
First: MD* SHA-* etc - they are all basically the SAME algorithm! The are just minor modifications of the same exact thing, so a break in one is a break in all.
Second: Tons and tons of people ask: can't we merge two hashes together and get a stronger one? Yes you can that's EXACTLY what MD* and HA-* DO! They are a combination of different hashes! That's how they work.
So if you really did have a good combo of hashes then just give them a name and use them as a hash - don't bother just plain merging existing ones.
Also, merging say MD5 and SHA-1 is pointless - they are both based on the same hashing code! You are gaining nothing by merging them.
Now I get the "Use 2 hash algorithms" comments (Score:3, Interesting)
770b95bb61d5b0406c135b6e42260580 for MD5
b924c2f360b572e17c971f1b1b667e0732944df7 for SHA-1
Trying to tinker around with the file and make both hashes come out the same as above would presumably be much more difficult than for any single hashing algorithm, and it might very well be nigh impossible. The little light bulb has finally come on. Now I get it. Yeah using two hash algorithms together would probably work nicely. Don't combine the results mathematically, just append the keys together into a big long string. The final MD5+SHA1 hash key for my file would be:
770b95bb61d5b0406c135b6e42260580b924c2f360b572e
I don't know whether this would be stronger than a SHA-2 of equivalent bit length or not, but now I get what some of you have been saying. From a common sense view, it would seem that something like this would be pretty darn tough to crack, because you would have to make two different algorithms compute matching keys for a given dataset.
Metadata (Score:3, Interesting)
For instance, suppose I sign the message "Hi, I'm Victor", along with the hash it would contain the size (14 bytes), type (English text), encoding (7bits ASCII) and how about the range of codes used in the messages (from U+0027 - U+0074).
A good hash would give a uniformly distributed random hash for the message, so it is safe to assume that even if we could find a collision, it would be highly unprovable that it would satisfy all the meta-data. In some cases it could be provable that this kind of hash is unbreakable, since there is a finite number of messages that satisfy the meta-data (if you could hash all possibilities and verify that there were no collisions you're 100% safe).
Re:Pigeon-hole principle (Score:2, Interesting)
To nit-pick further, the pigeon-hole principle says nothing about not reusing any slot. It states that if you place n items in to n slots, either every slot is filled, or (at least) one slot has more than one item.
Equivalently, if there are n+1 items, there must be a slot with more than one item. Your statement is a special case of the principle, but not as general.
It is possible to prove (by induction) that there are an infinite number of collisions for some hash value using this. However, proving that collisions exist for every hash value requires detailed knowledge of the algorithm, and doesn't follow directly from the pigeon-hole principle.
Doubt it (Score:3, Interesting)
If we were talking about an encryption scheme, the temptation would be there. If it were and encryption scheme adopted by The Bad Guys (tm) then NSA would of course be able to read their secret communications.
But that's not what SHA is for. It's to allow a piece of data to be authenticated. You can satisfy that that this file is indeed from me based on a simple number computed from it and a secret we both share. When thegovernment procures a piece of software that is going to do something like launch nuclear missiles, it would be nice if that software could figure out whether the order really came from PotUS. On the other hand, when the order comes from Osama to fly the plane into the WTC, authentication of that order, while useful, is not as critical.
So, the national security interests here are clearly in favor of their being a publicly available, secure hash function.
Re:Not a problem (yet) (Score:2, Interesting)
So for legit use you have the following (assuming use of MD5 and SHA1): Compared to what a brute force would have to do: ... or (depending on which is less expensive)
Not quite the end of the world (Score:3, Interesting)
As others have pointed out, I can create 2 documents, X and Y, have a target sign one, then substitute the other. His digital signature will be valid for both. Great - it takes only 2^69 attempts to get a collision - I'm sure the chances that the X and Y found will both be valid English documents, one of which I could convince a target to sign, the other allowing me to scam him out of enough money to make the whole ordeal worthwhile.
However, people keep copies of what they sign. Even if I did find a collision, and even if both documents were valid English text, the guy could say "I didn't sign Y - look, my signature is valid for X - he scammed me". Great.
The more likely scenario is someone signing their own document, then claiming it was fraudulent. They could create their own X and Y, sign X that somehow involves another party, then claim they actually signed Y and this other party was the scammer. But they still have to find X and Y in 2^69 steps such that both make logical sense in the English language - no simple task.
This is cool in a theoretical sense, but in a practical sense, its like saying you don't need a million monkeys on a million typewriters typing for a million years to generate Shakespeare; it'll only take 999,999 monkeys on 999,999 typewriters...
Or, to go back to the theoretical world: with processor speeds doubling every 1.5 years, and this team shaving 11 factors of 2 off of the break time, the lifetime of SHA-1 just shortened by about 16.5 years. Not quite the end of the world as we know it.
Step 1: Break SHA-1
Step 2: ?
Step 3: Profit!