New Binary Diffing Algorithm Announced By Google 192
bheer writes "Today Google's Open-Source Chromium project announced a new compression technique called Courgette geared towards distributing really small updates. Courgette achieves smaller diffs (about 9x in one example) than standard binary-diffing algorithms like bsdiff by disassembling the code and sending the assembler diffs over the wire. This, the Chromium devs say, will allow them to send smaller, more frequent updates, making users more secure. Since this will be released as open source, it should make distributing updates a lot easier for the open-source community."
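The trick described above is that a single inserted instruction shifts every address after it, so a raw diff of the bytes touches most of the file, while a diff of a label-normalized disassembly stays tiny. Here's a toy sketch in Python; the instruction strings and the normalization scheme are invented for illustration (real Courgette operates on actual x86 machine code):

```python
import difflib

# Toy "before" and "after" programs: one inserted instruction (nop)
# shifts every later jump/call target, as happens in real binaries.
old = ["jmp 0x1000", "mov eax, 1", "call 0x2000", "jmp 0x3000"]
new = ["jmp 0x1004", "nop", "mov eax, 1", "call 0x2004", "jmp 0x3004"]

def normalize(instrs):
    # Replace concrete addresses with symbolic labels (L0, L1, ...),
    # so code that has merely shifted still compares equal line-by-line.
    labels, out = {}, []
    for ins in instrs:
        op, sep, target = ins.partition(" 0x")
        if sep:
            out.append(op + " " + labels.setdefault(target, "L%d" % len(labels)))
        else:
            out.append(ins)
    return out

def diff_size(a, b):
    # Count changed lines in a unified diff, ignoring the file headers.
    return sum(1 for l in difflib.unified_diff(a, b, lineterm="")
               if l[:1] in "+-" and l[:3] not in ("+++", "---"))
```

Diffing the raw listings reports several changed lines; diffing the normalized listings reports just the single inserted instruction, which is the effect Courgette exploits before handing the result to ordinary compression.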
uses a primitive automatic disassembler (Score:4, Insightful)
Also less overhead for Google (Score:5, Insightful)
Google has to pay the cost of maintaining servers and handling bandwidth for all the OS updates they push out. The more efficient they are in this process, the more money they save.
The good news is that the same benefits could be applied to Red Hat, Ubuntu, openSUSE, etc. Lower costs help the profitability of companies trying to make a profit on Linux.
The end users also see benefits in that their packages download quicker. I'd be honestly pretty disappointed in any major distro that doesn't start implementing a binary diff solution around this.
wait a minute (Score:3, Insightful)
announced a new compression technique called Courgette geared towards distributing really small updates
I just RTFA, this has nothing to do with a compression technique.
What they developed is a technique to make small diffs from *executable binary files*, and it doesn't look like it's portable to anything other than x86, because the patch engine has to embed an architecture-specific assembler + disassembler.
Re:uses a primitive automatic disassembler (Score:3, Insightful)
Why not just ship the affected .class files for Java? Disassemble the Jar that is being updated, replace/add/remove the class(es) as per the update instructions and rebuild the Jar.
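The unpack/swap/repack steps described above are easy to sketch with Python's zipfile module. The entry names and the shape of the "update instructions" here are made up for illustration; a real updater would also need to verify signatures, manifests, etc.:

```python
import shutil
import zipfile

def patch_jar(jar_path, replacements, removals=()):
    """Rebuild a jar, swapping in updated .class files.

    `replacements` maps entry names (e.g. "com/example/Foo.class") to
    their new bytes; entries not already in the jar are added.
    `removals` lists entry names to drop.
    """
    replacements = dict(replacements)  # don't mutate the caller's dict
    tmp = jar_path + ".patched"
    with zipfile.ZipFile(jar_path) as src, \
         zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename in removals:
                continue  # class removed by the update
            data = replacements.pop(item.filename, None)
            dst.writestr(item, data if data is not None else src.read(item))
        for name, data in replacements.items():
            dst.writestr(name, data)  # newly added classes
    shutil.move(tmp, jar_path)
```

Since a jar is just a zip archive, the update payload only needs to carry the changed class files, not the whole archive.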
Chrome's problem is that a massive, bulky chrome.dll file needs to be sent out with each update, and it isn't easy to split it up into its constituent parts because it is lower-level code.
It's not a new idea, but quite impressive that they've actually gone and worked it all out so that it is reliable. And nicer than patching via jumps (noop the old code, add a jump to new code, append new code to end).
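For comparison, the jump-patching scheme mentioned above can be modeled on a flat "code image": overwrite the start of the old routine with a jump to new code appended at the end, and no-op the rest. The byte values and the one-byte "address" below are toys for illustration, not real x86 encoding:

```python
JMP, NOP = 0xE9, 0x90  # stand-in opcode bytes for this toy model

def jump_patch(image, old_start, old_len, new_code):
    # Append the new code at the end of the image, then redirect the
    # old routine to it and pad its remaining bytes with no-ops.
    image = bytearray(image)
    target = len(image)          # new code lands here
    image += new_code
    image[old_start] = JMP
    image[old_start + 1] = target  # toy one-byte "jump target"
    for i in range(old_start + 2, old_start + old_len):
        image[i] = NOP
    return bytes(image)
```

This illustrates why such patches bloat over time: the old bytes stay in place as dead weight, whereas Courgette-style diffing produces a clean new binary.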
Bad explanation (Score:5, Insightful)
That's potentially very misleading. I can compress any document down to a single bit, if my compression algorithm is sufficiently tailored to that document. For example:
def decompress(compressed_data):
    if compressed_data[0] == 0:
        return get_Magna_Carta_text()
    else:
        return unzip(compressed_data[1:])
What we need to know is the overall distribution of compression ratios, or at least the average compression ratio, over a large population of documents.
Can a layman get an explanation in English? (Score:3, Insightful)
Please?
Re:Also less overhead for Google (Score:2, Insightful)
Yeah, but it is probably more like unplugging wall warts than insulating your house.
That is, it will be a hassle, be visible, and show a tiny benefit, all the while distracting from more worthwhile activity.
Re:Solving the wrong problem (Score:4, Insightful)
If the code is so awful that the bandwidth required for security updates is a problem, the product is defective by design.
No one is saying that the bandwidth is a problem. They're just saying that the bandwidth is unnecessary. FSM-forbid that anyone try to optimize something.
Plus, as the article points out, smaller updates mean more people can receive the update per unit-bandwidth, which means faster distribution of security updates when something critical is fixed.
Re:dictionary (Score:2, Insightful)
Re:Like many brilliant ideas... (Score:4, Insightful)
Probably because the old solution was:
A) Simple
B) Good enough for most purposes.
Sure, you can shave 80% off your patch filesize... but unless you're as big as google, patch bandwidth probably isn't a major priority -- you've likely got much more important things to dedicate engineering resources to.
You know how they say "Necessity is the mother of invention"? Well, when an invention isn't really necessary for most folks, it tends to show up a little later than it might otherwise have.
Re:Solving the wrong problem (Score:1, Insightful)
Out of curiosity, could you please point us to some of your code so we can do a comparison?
Thanks.
Re:Like many brilliant ideas... (Score:3, Insightful)
Actually, it made me smack my head and say "I assumed we were already doing this".
Sure, you can shave 80% off your patch filesize... but unless you're as big as google, patch bandwidth probably isn't a major priority
So, you've never installed a service pack or another OS update? I'd be more than happy to run "sudo aptitude patch-upgrade" to download the 3KB difference between OpenOffice 3.1.foo and 3.1.foo+1.
Re:uses a primitive automatic disassembler (Score:5, Insightful)
It's an interesting approach to a problem that should never have arisen. Why does everything have to be thrown into a 10MB dll? People are browsing on computers, not wristwatches, and there's no reason to abandon modular design to tangle things together for trivial performance gains.
Just because the end result is a 10MB dll doesn't mean that the code and design isn't modular. That's like saying Linux isn't modular because the live CDs / DVDs come in a single gigantic 4.7GB .iso file.
Re:Can a layman get an explanation in English? (Score:5, Insightful)
But, if the compilers are similar enough to create the same pseudocode/bytecode/ASM, or smart enough to save the source code, and use it for future comparisons, then wouldn't one patch be just as portable as the original source code?
It's a good theory and you're a smart person, but:
Portability doesn't appear to be Google's primary concern, though. They seem to be keen on the idea of delivering binaries over the wire (real binaries, not bytecode) -- see Google Native Client. [infoworld.com]
Re:Rsync (Score:1, Insightful)
Well, not to mention that 90%+ of Windows users lack a compiler and linker. For the other 10%, it could be any of several versions of VC or a Windows port of GCC. This is doubly annoying because C++ has no standardized ABI (in the vein of early-90s UNIX vendors adding extensions that did nothing but make sysadmin work twice as hard), so you have to ensure all your .dlls were compiled by the same compiler, etc.
Asking a Windows user to compile something is like asking a painter to engineer a building.
Anyone like zucchini? (Score:1, Insightful)
In case anyone has missed the reference in the name "Courgette": it's the French word for zucchini, a type of summer squash. So, Courgette as in squash, and squash as in "make smaller."
Re:Also less overhead for Google (Score:3, Insightful)
I don't run Ubuntu, but rather openSUSE. When I download updates (including weekly snapshot builds of KDE 4, Firefox and OpenOffice) I end up downloading around 3 gigs each time. That is practically the whole OS. Small binary diffs would make a huge difference here.
I doubt I pull 3 gigs of bandwidth from all Google sites combined in a week.
The other difference is that other Google sites generate revenue. The OS likely will not.