Apache Subversion Fails SHA-1 Collision Test, Exploit Moves Into The Wild (arstechnica.com) 167
WebKit's bug-tracker now includes a comment from Friday noting "the bots all are red" on their git-svn mirror site, reporting an error message about a checksum mismatch for shattered-2.pdf. "In some cases, due to the corruption, further commits are blocked," reports the official "Shattered" web site. Slashdot reader Artem Tashkinov explains its significance:
A WebKit developer who tried to upload "bad" PDF files generated from the first successful SHA-1 attack broke WebKit's SVN repository because Subversion uses SHA-1 hash to differentiate commits. The reason to upload the files was to create a test for checking cache poisoning in WebKit.
Another news story is that based on the theoretical incomplete description of the SHA-1 collision attack published by Google just two days ago, people have managed to recreate the attack in practice and now you can download a Python script which can create a new PDF file with the same SHA-1 hashsum using your input PDF. The attack is also implemented as a website which can prepare two PDF files with different JPEG images which will result in the same hash sum.
Re: (Score:2)
Not really: http://marc.info/?l=git&m=1156... [marc.info] .
Git hashes objects (commit, trees, blobs, tags) instead of individual tags. If you managed to somehow create, say, a commit with the same SHA1 as another existing in a repository pushes to it would be simply ignored.
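To make the "would be simply ignored" point concrete, here is a minimal sketch of a content-addressed store in which an incoming object whose key already exists is dropped. This is not Git's code: `ObjectStore` and `weak_hash` are hypothetical names, and the 8-bit `weak_hash` is deliberately terrible so that a collision can be written down by hand.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: objects are looked up by the hash of their bytes."""

    def __init__(self, hash_fn=None):
        # hash_fn is injectable so a deliberately weak hash can be swapped in below.
        self._hash = hash_fn or (lambda data: hashlib.sha1(data).hexdigest())
        self._objects = {}

    def put(self, data: bytes):
        """Store data; return (key, stored). stored is False if the key already existed."""
        key = self._hash(data)
        if key in self._objects:
            return key, False  # the existing object wins, the incoming bytes are dropped
        self._objects[key] = data
        return key, True

    def get(self, key: str) -> bytes:
        return self._objects[key]


def weak_hash(data: bytes) -> str:
    """Hypothetical 8-bit hash (byte sum mod 256), used only to force a collision."""
    return format(sum(data) % 256, "02x")


store = ObjectStore(hash_fn=weak_hash)
key_a, stored_a = store.put(b"ab")  # first object with this key is kept
key_b, stored_b = store.put(b"ba")  # same byte sum, so same key: silently ignored
print(key_a == key_b, stored_a, stored_b, store.get(key_a))
```

The second `put` returns the same key but stores nothing, so a reader of that key keeps seeing the original bytes.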
Re: (Score:2)
...instead of individual files...
GIT (Score:4, Interesting)
Does the Git usage of SHA-1 *really* cause silent problems? I'm not sure how Git works internally but I was under the impression that it hashes whole objects, like individual source files at least.
The individual objects inside git aren't files.
The individual objects are commits (i.e. the content of a patchfile, plus a few pieces of information such as pointers to the past commits to which this patch applies).
To make things easier, a handy number designates this commit - this is currently generated by SHA-1.
(Git is a content-addressable platform. You don't access objects by name, you access them by their content. But instead of using the whole content to access them, you use addresses generated by SHA-1 to access the various blocks.
So to say which are the parent commits to which the patch in a commit applies, you just mention them by using the SHA-1 sum of the content of these commits.)
A theoretical attack would be:
- try to generate 2 commits.
one adds a clean piece of code. the other adds a backdoored piece of code.
but both commits hash to the same SHA-1 so they would be considered as "the same content" by git.
Then try to force your target to re-download the whole repo from scratch from your backdoored history (otherwise git will simply ignore the commits with sha-1 sum that it already has - it thinks that it has the same content already).
In practice it's currently not doable.
The only thing that google managed to generate is a pair of block series. Each series contains completely random junk. Both series end up generating the exact same shasum even if the random junk is different.
- That is exploitable in a PDF (or any other binary format that supports scripting. You could even do it in an EXE): using the embedded scripting, present 2 different contents depending on which random junk is present.
- That is not exploitable in a sourcecode commit : you would need a believable explanation for why the random junk is present in the patched source code.
AND you would need a piece of code which reacts differently (normal vs. backdoor) depending on which random junk is present - to be able to pull that unnoticed would require "Underhanded C Contest"-level of ingenuity.
That's it, you only have blocks of random garbage.
Google currently can't produce colliding hashes from arbitrary pieces of data ("Hey google: here's legit script A, and that's malicious script B. Add a small nonce at the end so they both end up having the same sha-1sum") ("Actually don't add a nonce, that would be too conspicuous, try to tweak the punctuation in the comments instead")
Also as you mention, further edits will be problematic :
if I edit script A and submit a patch, this patch will be valid, but will completely fail on top of script B.
Re: (Score:1)
Git and SVN work very, very differently under the hood. The fact that they rely on the same hash algorithm is irrelevant as they use it in very different ways.
Re: (Score:2)
Because it was good once. Better than MD5. Changing it can break a lot of compatibility. So they don't change it.
If you are keeping software running over a long time, you need to balance compatibility, security and maintainable design. Otherwise such projects will take decades to develop and be out of date on release.
In other news (Score:1)
WebKit is apparently on an SVN repository.
Re: (Score:2)
And that's the problem, that SVN has crap handling of colliding values. They use their own homebrew NoSQL store, FSFS, which doesn't handle things like duplicates in any way because it's NoSQL and web scale and stuff, like MongoDB [youtube.com]. So the message here is "don't build your app around a crap NoSQL database store", not "SHA-1 will kill you".
Anyway, I've gotta get back to my job shovelling pig shit, and administering anal suppositories to sick horses.
Re: In other news (Score:4, Insightful)
Actually, svn is just about the perfect source control system if you want something quick and dirty that you can understand. git (I presume that is what you would propose as an alternative) adds no features to many small software development teams. Fortunately the svn->git migration path is well trodden.
If you had mentioned cvs, or rcs, then i'd agree ;-)
Re: In other news (Score:5, Informative)
I love arguing about git.
SVN has several huge advantages over git. It's far simpler. It doesn't have a thing called 'rebase', which rewrites your commits and occasionally messes them up. Its revision numbers are actually in order, which means you always know which revision came first, given two of them, something that's impossible with git's hashes (YES - I know why the hashes are used... but the reality of 99% of software development is that the repository is centralised, so git's solving an almost non-existent problem here). SVN supports real cherry-picking, and actually records in the repo that you took code from somewhere, as opposed to git's cut-n-paste approach.
SVN has branches, git has pointers into a tree. Thus in git, it is impossible after the fact to determine to which branch a change was committed, just in which branches it now currently resides. Branches don't really exist in git at all, they aren't versioned (who created a branch, and when?), and if you accidentally delete them you tend to lose the commits against them. Tags in git are even worse. Added to which is the fact that both are implemented in the filesystem as regular files, which means you're at the mercy of your filesystem's ideas regarding case and permitted characters, and good luck if someone tries to check it out into a filesystem with different ideas. Nice design decision there Linus, guess you were having an off day? And the line-endings stuff?... Oh. My. God.
Re: (Score:1)
Tags in git are even worse. Added to which is the fact that both are implemented in the filesystem as regular files, which means you're at the mercy of your filesystem's ideas regarding case and permitted characters, and good luck if someone tries to check it out into a filesystem with different ideas. Nice design decision there Linus, guess you were having an off day?
You don't get that complaint.
It is well known that Linus created git to manage the linux kernel - a work which is done entirely on linux filesystems where the rules always are the same: case sensitive, and any character except '/' and '\0' works in file names.
If you use git for something else - well the fact that you are even able to is just a side effect. It is a unix scm; if you use git in another environment then surely everybody uses names that fit the filesystem used there? If you use it in a mixed env
Re: (Score:3)
SVN has several huge advantages over git. It's far simpler.
Explain in one or two sentences what a "tree conflict" is and how to resolve it.
Re: (Score:2)
SVN has several huge advantages over git. It's far simpler.
Explain in one or two sentences what a "tree conflict" is and how to resolve it.
Create new svn checkout and copy/paste over your changes.
:)
That was "easy", hehe, just kidding I would use git over svn any day
Re: (Score:2)
As I'm sure you must know perfectly well, it's a conflict caused by a change to a file conflicting at the tree level, so that one person modified the file, and another either deleted or moved it. Git gives a conflict in the first case - naturally - and may give a conflict in the second if its magic content-tracking algorithm fails (which it does, especially in non-trivial cases). You resolve it in the usual way, you inspect both sides, and figure out what to do.
I know I'm on a losing wicket hating git like
Re: (Score:2, Informative)
> I'd give a lot to go back to revision numbers...
Why? All that matters is the logical ordering of commits, not the chronological ordering. I DGAF about when someone wrote a bit of code. All that I want is for each commit to more or less work, and for the history to be easily bisectable to aid in bug hunting.
> Git gives a conflict in the first case - naturally - and may give a conflict in the second if its magic content-tracking algorithm fails...
As an SVN -> Git convert, I've had git just Do The R
Re: (Score:2)
Mercurial seems so much nicer than what I read about git. I'm glad my company decided to go that way.
Re: (Score:1)
SVN has several huge advantages over git.
Ok, let's see these advantages.
It's far simpler.
I get it, you don't like all this changing history, rebasing, amending, reflog stuff. But for most basic git operation you need to know as many commands as for SVN. And with git you get the benefit of all the magic you can do on top of that. It's almost like saying that you prefer DOS to anything else because it's simpler - less commands, no pesky multitasking and you can do with it everything YOU need to.
It doesn't have a thing called 'rebase', which rewrites your commits and occasionally messes them up.
You conveniently ignore the fact that rebase is used almost exclusively
Re: (Score:2)
See? I told you arguing about git was fun.
Fact: SVN stores more information about what's going on with your source code than git, and it never loses anything, even if you ask it really nicely. And it never magically changes files just because it decided that your line endings need to be just-so.
You silently ignore the fact that with SVN you have to do this whole tree copy on the svn server to create a branch,
I silently ignore that, because it's not true. SVN marks the point at which the copy was made, who made it, and when. It doesn't actually copy anything, because that would be silly. You can do the copy on the server,
Re: (Score:2)
when git "messes them up" it's actually you messing them up because you made a mistake during rebase
No it's not. It's git crashing, and borking my local copy, thanks very much. And yes I have the latest version.
Re: In other news (Score:1)
I've been using Git for over 5 years on a dozen different platforms and never once had Git crash. Buy a new computer.
Re: (Score:2)
In SVN a commit is final. This encourages developers to leave unfinished work in their work folder without creating a commit until they are "done". So you need a separate backup process for your work folder for any changes that take time to complete. Plus you often end up with a monolithic commit with a bunch of changes. Then how do you review those changes before pushing upstream?
git rebase gives you a solution to this problem. Whenever I think I've made progress towards solving a problem I can create a c
Re: (Score:2)
If you need rebase in Subversion, there is a Python script that can do it https://bitbucket.org/x31eq/om... [bitbucket.org]
Re: (Score:2)
And this would differ from any other open source/proprietary/free/closed source software in what way?
Re: (Score:2)
I've worked professionally with most source control systems for decades. I'm afraid to say that the only remaining features which Subversion does better than git are the ability to check out only one directory of an upstream repository, rather than needing to check out the entire repository, and the inability to delete content from the upstream repository.
The ability to delete content is from experience a vital component, because developers can and will accidentally pollute the central repository with undes
Re: (Score:1)
Completely agree - if you are just starting to use git, and you don't have a git rock star handy (guy next cubicle over that noisily snorts Cheetos all morning), and you don't have access to Stack Overflow, just quit now.
There's a reason this SO answer [stackoverflow.com] has been upvoted 13,182 times.
Re: (Score:2)
Seriously? How hard it is to type git revert?
Re: (Score:2, Interesting)
I take it you've never seen a team that for some reason merged current files on a pull to effectively revert commits pushed to the remote with their few intended changes, over and over and over, across several people, for days, before noticing that there was a problem. You will bang your head on a desk shouting "why didn't they notice there was a problem..."
I can't believe though that question was upvoted 13,182 times. According to this SO "most upvoted" query [stackexchange.com], it's currently the second most upvoted que
Re: (Score:2)
git revert is the wrong answer. git revert makes ANOTHER commit to undo the changes of the first commit. Why the hell do you want to preserve mistakes in the history? The accepted answer is more correct than git revert. This is git.
The reversal SHOULD be a new commit. Undoing commits on a distributed SCMs means you're effectively deleting history.
Re: (Score:2)
I've been using git on and off for a while and, honestly, it is the most developer-friendly SCM out there. Which kind of problems do you refer to?
My main pet peeve with git is that it really doesn't work well with big repositories, large number of users, or binary files. Other than that it is a joy to work with.
Re: (Score:2)
A quick google of "problems with Git" will quickly reveal the various challenges that git brings to the table, for example git push --force. More generally, any team using git needs to decide on a workflow and carefully adhere to it. How do we manage merge workflows? To rebase or not to rebase? etc. With traditional source control, this is significantly easier.
I'm not anti-git, far from it. I introduced Git into the company I work for and love it; and it is absolutely the best source control system for dist
Re: (Score:2)
But I don't quite get it yet. git push --force is not supposed to be a straightforward, or even common operation, as it can destroy history. And selecting/enforcing a branching scheme is a problem you'll run into with every other SCM in existence.
I'll admit that git gives you enough tools to shoot yourself in both feet if you're willing to, but it also provides very straightforward, easy to use commands for everyday operations. Anyone proficient in SVN can pick up git in 20'.
Re: (Score:2)
as it can destroy history
Which should, really, be impossible for a client to a source control server.
Re: (Score:2)
It doesn't - unless you allow it, of course. You can destroy history on your local repository all you want but the upstream one will reject it unless specifically permitted.
Re: (Score:1)
Seriously... why couldn't we have just had a revision number? We're all committing to the same server.
No, we're not. That is kind of the point of git.
Re: In other news (Score:4, Funny)
You want a revision number? Simple;
$ git rev-list HEAD | wc -l
Assuming everyone is on the same branch of course....
(Obligatory XKCD) [xkcd.com]
Re: (Score:2)
81747
Thanks complete_loony, I'll get that integrated into our workflow.
Re: (Score:2)
Apparently (I haven't read the source code), the authors agree with you:
The name "git" was given by Linus Torvalds when he wrote the very
first version. He described the tool as "the stupid content tracker"
and the name as (depending on your way):
- random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the
dictionary of slang.
- "global information tracker": you're in a good mood, and it actually
works for you. Angels sing, and a light suddenly fills the room.
- "g*dd*mn idiotic truckload of sh*t": when it breaks
Re: (Score:1)
Clone a repo, change a file, stage it for commit, and then commit it. How fucking stupid are these morons that they can't do that?
If that is all you are doing with git, then you might as well stick with svn.
Re: (Score:1)
git init
git add
git commit
git push
More like:
$ git init ...
$ git add
$ git commit
$ git push
error: no remote configured
$ git remote add origin
$ git push
error: failed to push some refs to...
To prevent you from losing history, non-fast-forward updates were rejected
$ git pull
You asked me to pull without telling me which branch you
want to merge with, and 'branch.master.merge' in
your configuration file does not tell me, either.
$ git pull master
already up to date
$ git push master
error: failed to push some refs to...
To prevent you from losing history
Re: (Score:2)
I like Git, and prefer its pleasant UI, but I can see there are definitely reasons people would use SVN. (I can even think of reasons a team would use Visual Source Safe, although that's more of a stretch).
FINALLY! (Score:1)
It's now time to retire SVN... everywhere... permanently.
Re: (Score:1)
If you don't like it, don't use it. Personally, I love it.
Re: (Score:3)
Pretty sure they do.
Re: (Score:2)
I wonder if carpenters feel the same way about hammers
Hahahah tip of the iceberg. I saw two carpenters on my house arguing about who had better screwdrivers. Yes people most definitely do.
Re: (Score:2)
I wonder if carpenters feel the same way about hammers or if developers are just way too opinionated...
Yeah, typical carpenter hammer arguments:
*) Hammer weight (usually 16-24oz for house framing)
*) Handle type (wood? Fiberglass? (fiberglass hammers suck tbh))
*) Is the face of the hammer smooth or textured?
"In the wild" - slight exaggeration (Score:3)
Someone checked in PDFs that demonstrate the first engineered SHA-1 collision and this broke SVN. PDFs in question took 6500+ cpu years + 110 GPU years to generate. "In the wild" is a bit panicky & excessive.
What does this actually mean in terms of integrity of repos and other things that rely on SHA-1? Does it merely break repos or does it facilitate injection attack vectors - how important is secure hashing in the guts of repos? What precisely is being secured? SHA-1 has been deprecated for SSL certs already so you shouldn't be using certs with SHA1 sigs anymore. Myself, I'll keep an eye on how this develops and start thinking about using SHA-2, but I won't be replacing git or existing usage of SHA1 for password hashing anytime soon.
Re: (Score:3)
"In the wild" is a bit panicky & excessive.
No, it's really not. This demonstrates that SHA-1 is not only weak, but broken. One golden rule about security is that it never improves over time. It means that collisions are now possible, and are within reach of moderate sized organisations. Google can clearly manage, governments certainly can and any criminal organisation with a large enough botnet can manage too. This isn't just finding random data either: it's a practical attack whereby two valid PDFs bot
Re: (Score:2)
This. It is safe to say SHA-1 is effectively broken at this point and existing users should start migrating to better alternatives.
But let's not panic either. The world is not crumbling down to pieces anytime soon.
Re: (Score:2)
"not only weak, but broken" seems premature. The attack here involves manipulating two obtuse file formats to yield altered files with a shared hash, different to original unaltered hashes. Definitely weakened and yeah you are probably right this is the final toll for SHA-1 and from here things are likely to get worse quickly. I'll be mindful of this when I think about the various places where I use SHA-1 and start thinking about switching in other things. But I am failing to see how this right now transl
Re: (Score:2)
"not only weak, but broken" seems premature. The attack here involves manipulating two obtuse file formats to yield altered files with a shared hash, different to original unaltered hashes.
It took less than 3 years for MD5 to go from "first collision" to "can fake certificate trust chains".
But I am failing to see how this right now translates into a practical vector for the various places where I encounter SHA-1.
But don't forget that the open literature discovered an as-yet-unknown attack against MD5 in
Re: (Score:3)
Say it with me: Hashing is not Encryption. Hashing is not Encryption. Hashing is not Encryption.
Very high level:
Hashing is the irreversible mapping of a set of bits onto a (usually smaller) set of bits in order to obfuscate the original set of bits (one-way)
Encryption is the mapping of a set of bits onto another equally sized set of bits where the mapping is reversible through some process (two-way)
Hashing can be done with salts so that using rainbow tables is harder or impossible, but there will always be
Re: (Score:2)
Was somebody confusing hashing with encryption?
Re: (Score:2)
It means that collisions are now possible, and are within reach of moderate sized organizations.
This is the key. 6500 CPU years or 110 GPU years of computational power is not that difficult to achieve. When this news broke last week a few of us at my work had a discussion about it, and while those numbers sound impressive we then realized that at work we have access to probably 2x that processing power in our building.
Re: (Score:2)
Right, it is still just like Linus said about the git sha-1, not really a big deal because it isn't even the security layer.
If developers with write access to your repo are malicious, you have much worse problems. This is not a serious threat, it is just an edge case that the future will prevent.
The real lesson IMO is, if you do roll your own security, use a library for the password hashing. And if the algorithm ends up having been the wrong one, you'll just update the library. If it is on the network, use
Re: (Score:2)
What if they aren't malicious? I mean, WebKit SVN is down not because a developer wanted to try it, but because they were submitting a test case. A test case meant to verify that WebKit's caching algorithms aren't vulnerable to a SHA-1 collision.
And in checking in this test case, he inadvertently broke the entire repository. It's complet
Re: (Score:2)
None of that has meaning or value.
This doesn't crash anything, and a test case meant to do some shit that it doesn't do well doesn't cause a problem other than for that test case. There is no bad thing happening in your story, just somebody has some shitty code.
Then you wave your hands and say, "he inadvertently broke the entire repository."
There is no worry that repositories would, or even might, or even could, become irreparable. That's just making shit up wildly. The speculation in the stories was goin
Re: (Score:3)
If someone checked in, that means they have permissions to do so. It's not like Git just blindly accepts commits with the same hash but different contents. We know collisions are possible, even with SHA256: any hash maps an infinite set of inputs onto a finite set of outputs, so regardless of the hash function you use there will always be a second input that collides with a given one.
T
Re: (Score:3)
Umm, that is an uncited claim in the summary. Nothing of the sort is stated in any of the links. The summary links to a paper that provides more details of the attack. It is very heavy and technical, though a few initial takeaways are that implementations only take a few days to run on the gear they have, so it does seem safe to assume that SHA-1 collisions are pretty much pwned.
The Python script in question doesn't find new SHA-1 collisions. It takes two input PDFs and produces two output PDFs that hash to the same value. It uses some quirks of how PDFs work, plus that original SHAttered collision generated by the Google researchers. Finding another collision is a lot of work. Using a known collision to generate PDFs with the same hash value is not.
https://github.com/nneonneo/sha1collider
Re: (Score:3)
To be fair, any pair of distinct inputs to SHA1 that hash to the same value is a new collision. In general, being given one collision for a hash function doesn't make it automatically easy to find another. It's only because SHA1 is an iterated hash function (Merkle-Damgård) that this becomes true. (Admittedly, almost all practical cryptographic hash functions are iterated constructions.)
If SHA1(x0) = SHA1(x1) then for any z SHA1(x0||z) = SHA1(x1||z). I'm guessing t
Reading the paper. What is in an exponent?? (Score:2)
I am trying to read their paper on the sha1 collisions over here: https://shattered.io/static/sh... [shattered.io] and there's some unusual equation stuff.
mi = (mi3 mi8 mi14 mi16)1
Can anyone explain that to me in english?
Re: (Score:2)
Ah dam. My unicode got munged by the slashdot anti garbage filter. Should have hit preview first!
Anyway the symbol I was referencing is a circular arrow pointing in a clockwise direction that looks like the images on this page: https://en.wikipedia.org/wiki/... [wikipedia.org] . I've never seen that in a paper. What does it mean when it's in an exponent?
Hash Functions 101 (Score:5, Informative)
A CRC is an example of a hash function and a long CRC would probably be good enough for GIT or most repositories.
First pre-image resistance - this is a test of the one-wayness of the function. Given a hash value it is difficult to find a pre-image that hashes to that value: given y, a string of bits of the hash output length, finding X such that h(X) = y is hard. MD-5 and SHA-1 are still resilient against first pre-image attacks.
Second pre-image resistance - given a message X, finding a Y such that h(X) = h(Y) is difficult. MD-5 and SHA-1 are still resilient against second pre-image attacks.
Collision resistance - it is hard to find two messages X and Y such that h(X) = h(Y). Note the attacker here is free to choose both X and Y. Both MD-5 and SHA-1 are no longer collision resistant.
So far however the two messages X and Y have to be nearly identical. They have to start and end the same way, and the blocks that are changed actually have to be changed and tested together to make sure the hash function's internal state changes only in a specific way. I can't create a document that says the rent will be $3000 per month and another that says it will be $30000. (I might create one that says it is $3149.21 and the other $53210.63 per month, like in the PDF example where they played with a colour field.) Also, because of the way the internal state of the hash function changes, we now have a way of detecting whether someone is feeding a "funny" stream of bits into our hash function, and so of detecting this attack with a very low probability of a false positive.
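The gap between collision resistance and pre-image resistance is easier to see on a hash with a tiny output. The sketch below is a generic birthday search against SHA-1 truncated to 3 bytes; it is not the SHAttered technique, and `truncated_sha1` and `find_collision` are hypothetical names chosen for illustration. Because the attacker picks both messages, a 24-bit output should fall after a few thousand tries rather than the millions a pre-image search would need.

```python
import hashlib
from itertools import count


def truncated_sha1(data: bytes, nbytes: int = 3) -> bytes:
    """First nbytes of SHA-1: a stand-in for a hash with a tiny output space."""
    return hashlib.sha1(data).digest()[:nbytes]


def find_collision(nbytes: int = 3):
    """Birthday search: remember every digest seen until one of them repeats."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        digest = truncated_sha1(msg, nbytes)
        if digest in seen:
            return seen[digest], msg, i + 1  # two distinct messages, number of tries
        seen[digest] = msg


x, y, tries = find_collision()
print(x, y, tries)
```

The two messages returned agree on the truncated digest while their full 160-bit SHA-1 digests still differ, which is why output length matters as much as the algorithm.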
Here's what it means (Score:5, Informative)
Here's what it means: One major aspect of modern cryptography is "hash functions" - a hash function is a function which essentially has the property that in general two inputs with very small differences will give radically different outputs. Also, ideally a hash function will make it hard to find "collisions", which are two inputs which have the same output. In general, hash schemes are used for a variety of different purposes, including determining if a file is what it claims to be (by checking that the file has the correct hash value).
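As a rough illustration of "very small differences give radically different outputs", the sketch below hashes two messages that differ in one character and counts how many of the 160 output bits change. The messages and the helper name `bit_difference` are made up for the example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


msg_a = b"hello world"
msg_b = b"hello worle"  # last character changed

digest_a = hashlib.sha1(msg_a).digest()
digest_b = hashlib.sha1(msg_b).digest()

print(hashlib.sha1(msg_a).hexdigest())
print(hashlib.sha1(msg_b).hexdigest())
print(bit_difference(digest_a, digest_b), "of", len(digest_a) * 8, "bits differ")
```

For a well-behaved hash the count should land somewhere near half of the output bits.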
Every few years, an existing hash system gets broken and needs to be replaced. MD5 is an example of this; it was very popular and then got replaced.
One of the major currently used hash schemes is SHA-1. However, a few days ago, a group from Google described an attack that allowed them to easily find collisions in SHA-1 (easy here is comparative - the amount of computational resources needed was still pretty high). The group released evidence that they could do so but didn't describe how they did so in detail. They gave an example of two files with a SHA-1 collision and they also described some of the theory behind their attack. What TFS is talking about is how, based on this, others have since managed to duplicate the attack and some have made even more efficient variants of it; so effectively this attack is now in the wild.
Re:Here's what it means (Score:5, Informative)
FWIW, you're correct, but "hash function" encompasses much more than that. Technically, a CRC is, by definition, a hash function. So is bit parity.
A cryptographic hash function has the properties you mention, plus the fact that it must not be easily reversible and uniformly distribute results over its entire output space.
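One way to see why a CRC is a hash function but not a cryptographic one: for equal-length inputs CRC-32 is affine over XOR, so the checksum of a combined message can be predicted from the checksums of its parts. The sketch below checks that with Python's `zlib.crc32`; the three inputs and the helper `xor_bytes` are made up for the example.

```python
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


a = b"commit A"
b = b"commit B"
c = b"commit C"

# For equal-length inputs CRC-32 is affine over XOR, so this value is
# predictable without ever running the CRC on the combined message.
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
actual = zlib.crc32(xor_bytes(a, b, c))
print(predicted == actual)
```

A cryptographic hash is designed so that no such algebraic shortcut exists, which is the "not easily reversible" part of the definition above.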
Re: (Score:1)
This would appear to be the issue at stake (or maybe steak YMMV).
SHA-1 in git and co (Score:2)
A cryptographic hash function has the properties you mention, plus the fact that it must not be easily reversible and uniformly distribute results over its entire output space.
The latter is a property which is not guaranteed by most common checksums.
Thus, when you need a hash function to give a number to use as a handy "nickname" for a collection of data (e.g.: for a hash look-up table. Or for a content-addressable like git to create said addresses for a given content - and thus to give a serial number to a commit. Or apparently also used in SVN to give a simple number to designate commits), it might be a good choice to pick-up a cryptographic hash like SHA-1 because it guarantees
Re: (Score:2)
Np. Sorry if I came across as pedantic - I thought the distinction was important because from reading other threads people don't really seem to understand what SHA1 is supposed to and not to do.
Re: (Score:2)
It seems obvious to me that a short string could be identical for two different long original texts. Even if that happened, the hash is NOT the original message, and a collision could happen. It doesn't mean that the two original texts are the same.
Am I right?
Yes. A hash is nothing more than a function mapping data of arbitrary size to an output of fixed, smaller size so by definition you can always construct two inputs which yield the same hash. What makes crypto hashes secure is that this is normally very, very hard to do - that is, given a hash generate an input from it.
Re: (Score:2)
Or, it means more generally that updates are bad, and true security will only come from removal of code thrash. We have to figure out what features we actually want, and implement them, and then stop changing those features.
As long as everything is thrashing, everything is vulnerable. Protections will be temporary and new bugs will be introduced even into the protections because those too are always experiencing code thrash.
Re: (Score:1)
It means the exact opposite of this.
An old(er) cryptographic hash is now unsafe to use, as assumptions that the developers of subversion (and other software) made about it are now invalid, so you MUST update to a newer version to avoid unforeseen issues due to the ability to generate multiple inputs that hash to the same value.
Re:Here's what it means (Score:5, Informative)
Google produced two pdf's that differ in some binary data near the beginning of the file. The SHA-1 hash routine processes data one block at a time, updating its internal state. There are two consecutive blocks that differ between the pdf's. The first pair of blocks produce an internal state where half of the bytes are the same. The second pair of blocks then produce an identical state. The remainder of the pdf files is the same.
So you can use these two pdf prefixes and append whatever data you want to them to produce your own pair of files. Pdf includes a programming language for rendering content. Within this language you can inspect the earlier bytes of the file to detect which version of the file you are rendering, and make some visual changes. So while there are only a few bytes that are different, you can make two pdfs that display different content.
Nobody has invested the time to produce a new hash collision, but someone has already automated the production of duplicate pdf's based on this work.
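The "append whatever data you want" step follows from the block-by-block structure described above, and can be sketched on a toy hash. Everything below is an assumption made for illustration: `toy_md_hash` is a made-up Merkle-Damgård style construction with a 2-byte internal state (so a collision can be brute-forced instantly), not SHA-1, and the suffix stands in for the shared remainder of a PDF.

```python
import hashlib
from itertools import count

BLOCK = 8  # toy block size in bytes
STATE = 2  # toy internal state size in bytes (tiny on purpose)


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: SHA-256 of state+block, truncated to STATE bytes."""
    return hashlib.sha256(state + block).digest()[:STATE]


def toy_md_hash(message: bytes) -> bytes:
    """Iterate compress() over fixed-size blocks, Merkle-Damgard style."""
    # Pad with a 0x80 marker, zeros, and the 4-byte message length.
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 4) % BLOCK)
    padded += len(message).to_bytes(4, "big")
    state = b"\x00" * STATE
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_colliding_blocks():
    """Brute-force two different single blocks that leave the same internal state."""
    seen = {}
    iv = b"\x00" * STATE
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        state = compress(iv, block)
        if state in seen:
            return seen[state], block
        seen[state] = block


x0, x1 = find_colliding_blocks()
suffix = b"any shared suffix, e.g. the rest of a PDF"
print(x0 != x1, toy_md_hash(x0 + suffix) == toy_md_hash(x1 + suffix))
```

Once the two prefixes leave the same internal state, every later block is processed identically, so the expensive part is done once and reused for any suffix.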
Re:Here's what it means (Score:5, Interesting)
This is why git is not vulnerable in this specific instance. In git all objects are prepended with their type, in this case "blob". Of course if you had $100k (-ish) to burn, you could repeat this attack on a file that does start with "blob" to break git.
However you don't need to do this. This attack depends on reaching an intermediate state with specific properties in order to massively reduce the search space. Any attempt to hash a file that reaches one of these states can be detected and rejected. If you swap to using https://github.com/cr-marcstevens/sha1collisiondetection [github.com] for all SHA-1 calculations, every instance of this attack can be detected and rejected.
Also I mis-spoke slightly and spotted my error after checking the paper [shattered.io] again. The first pair of blocks have half of the same bytes, but produce an internal state with only 6 bytes of differences. The second pair of blocks, again only differ in half of their bytes, and exactly cancel out those 6 bytes of differences. See Table One on page 3 for the actual byte values.
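For anyone wondering what "prepended with their type" looks like in practice, here is a small sketch that reproduces the ID git gives a blob: SHA-1 over a `blob <size>\0` header followed by the file bytes. The function name `git_blob_id` is made up; the header layout is the one `git hash-object` uses.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 over a 'blob <size>\\0' header plus the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


data = b""
print(hashlib.sha1(data).hexdigest())  # plain SHA-1 of the raw bytes
print(git_blob_id(data))               # what `git hash-object` reports for the same bytes
```

For empty content the two lines print da39a3ee5e6b4b0d3255bfef95601890afd80709 and e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 respectively. Because the header changes what is actually fed into SHA-1, two files that collide as raw bytes need not collide as git blobs.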
Re: (Score:2)
Wouldn't it be better to switch to SHA-512 or something?