Exhaustive Data Compressor Comparison

crazyeyes writes "This is easily the best article I've seen comparing data compression software. The author tests 11 compressors: 7-zip, ARJ32, bzip2, gzip, SBC Archiver, Squeez, StuffIt, WinAce, WinRAR, WinRK, and WinZip. All are tested using 8 filesets: audio (WAV and MP3), documents, e-books, movies (DivX and MPEG), and pictures (PSD and JPEG). He tests them at different settings and includes the aggregated results. Spoilers: WinRK gives the best compression but is the slowest; ARJ32 is fastest but compresses least."
This discussion has been archived. No new comments can be posted.

  • duh (Score:5, Funny)

    by Gen. Malaise ( 530798 ) on Sunday April 22, 2007 @10:12PM (#18836217) Journal
    Nothing to see. High compression = slow and low compression = fast. umm duh?
    • by Anonymous Coward on Sunday April 22, 2007 @10:17PM (#18836249)
      So that's why smaller computers are slower, right?
    • Re: (Score:3, Insightful)

      by timeOday ( 582209 )
      So you already knew WinRK gave the best compression? I didn't; I'd never even heard of it. My money would have been on bzip2.
      • Re: (Score:2, Funny)

        by dotgain ( 630123 )

        So you already knew WinRK gave the best compression? I didn't; I'd never even heard of it.
        Well thank heavens we have now! If there's one area of computing I've always felt I wasn't getting enough variety, it's compression algorithms and the associated apps needed to operate with them.

        If there's one thing that brightens my day, it's a client sending me a PDF compressed with "Hey-boss-I-fucked-your-wife-ZIP" right on deadline.

      • Re:duh (Score:5, Informative)

        by morcego ( 260031 ) on Monday April 23, 2007 @12:03AM (#18836901)

        So you already knew WinRK gave the best compression? I didn't; I'd never even heard of it. My money would have been on bzip2.


        I agree with you about the importance of this article, but... bzip2? C'mon.
        Yes, I know it is better than gzip, and it is also supported everywhere. But it is much worse than the "modern" compression algorithms.

        I have been using LZMA for some time now for things I need to store long-term, and I'm getting good results. It is not on the list, but it should give results a little better than RAR. Too bad it is only fast when you have a lot of memory.

        For short/medium-term storage, I use bzip2. For online compression, gzip (zlib), of course. (A rough comparison of the three is sketched below.)
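
        As a rough illustration of the trade-off discussed above, here is a minimal Python sketch using the standard library's gzip, bzip2 and LZMA bindings. It only approximates the behaviour of the command-line tools, and the input file name is hypothetical.

          # Rough sketch: compare Python's built-in gzip, bzip2 and LZMA codecs on one file.
          import bz2, gzip, lzma, time

          def benchmark(path):
              data = open(path, "rb").read()
              codecs = {
                  "gzip (zlib)": lambda d: gzip.compress(d, compresslevel=9),
                  "bzip2":       lambda d: bz2.compress(d, compresslevel=9),
                  "LZMA (xz)":   lambda d: lzma.compress(d, preset=9),
              }
              for name, compress in codecs.items():
                  start = time.perf_counter()
                  packed = compress(data)
                  elapsed = time.perf_counter() - start
                  print(f"{name:12s} {len(packed) / len(data):6.1%} of original, {elapsed:.2f}s")

          benchmark("backup.tar")  # hypothetical input file
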
    • Re:duh (Score:5, Funny)

      by setirw ( 854029 ) on Sunday April 22, 2007 @10:18PM (#18836263) Homepage
      High compression = slow and low compression = fast

      You compressed the article into that statement. How long did it take to write the comment?
      • Re: (Score:2, Insightful)

        by kabeer ( 695592 )
        Compressing the article into that statement would technically be classed as lossy compression, e.g. JPEG.
    • Not really (Score:4, Insightful)

      by Toe, The ( 545098 ) on Sunday April 22, 2007 @10:19PM (#18836265)
      Not every piece of software achieves maximum efficiency. It is perfectly imaginable that a compressor could be both slow and bad. It is nice to see that none of these compressors suffered that fate.
    • Re: (Score:3, Informative)

      by aarusso ( 1091183 )
      Well, since the dawn of time I've seen ZIP vs. ARJ and bzip2 vs. gzip comparisons.
      What's the point? It's the same programs compressing the same data on a different computer.

      I use gzip for big files (takes less time).
      I use bzip2 for small files (compresses better).
      I use zip to send data to Windows people.
      I really, really miss ARJ32. It was my favorite back in the DOS days.
      • Re: (Score:2, Insightful)

        by cbreaker ( 561297 )
        Hell yeah. Although ARJ had only slightly better compression, it allowed for *gasp* two files in the archive to be named the same!

        Nowadays it's all RAR for Usenet and torrents and such. RAR is really great, but it's piss-slow at compressing anything. It's just so easy to make multipart archives with it.

        I really wish StuffIt would go away...
    • I take it you didn't look at the "Compression Efficiency" graph at the bottom of each page.

      Of course, they don't seem to reveal their methodology for calculating that graph, but even a glance at the other tables will show that, for example, StuffIt is almost always much faster and saves very nearly as much space as 7-Zip (sometimes more). That's why comparisons like this are interesting.

    • Re:duh (Score:5, Funny)

      by h2g2bob ( 948006 ) on Sunday April 22, 2007 @11:19PM (#18836669) Homepage
      Oh, if only they'd compressed the article onto a single page!
      • Re: (Score:3, Funny)

        Server Error in '/' Application.
        Server Too Busy
        Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
        Exception Details: System.Web.HttpException: Server Too Busy
        Source Error: An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack t
    • Server Too Busy

      Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

      Exception Details: System.Web.HttpException: Server Too Busy

    • Re: (Score:3, Informative)

      by yppiz ( 574466 ) *
      Another problem is that gzip has compression levels ranging from -1 (fast, minimal) to -9 (slow, maximal), and I suspect he only tested the default, which is -6. (A quick level sweep is sketched below.)

      I wouldn't be surprised if many of the other compression tools have similar options.

      --Pat
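
      To see the effect described above, here is a small sketch sweeping zlib's levels 1-9 (zlib is the library gzip uses; level 6 is the default). The input file name is hypothetical.

        # Sketch: sweep zlib compression levels 1-9 (gzip's -1 .. -9; the default is 6).
        import time, zlib

        data = open("document.txt", "rb").read()  # hypothetical input
        for level in range(1, 10):
            start = time.perf_counter()
            packed = zlib.compress(data, level)
            elapsed = time.perf_counter() - start
            print(f"level {level}: {len(packed)} bytes in {elapsed * 1000:.1f} ms")
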
  • The article seems to be measuring the compression speed of each program with its native algorithm; it would have been better to test a set of programs with each algorithm first. Because the article compares two variables at once, how good the algorithm is and how good that program's implementation is, the results are somewhat meaningless.

      Having said that, do I really care in practice whether algorithm A is 5% faster than algorithm B? I personally do not; I care whether the person receiving them can
  • WOW! (Score:5, Funny)

    by vertigoCiel ( 1070374 ) on Sunday April 22, 2007 @10:15PM (#18836239)
    I never would have guessed that there was a tradeoff between the quality and speed of compression! No way! Next they'll be saying things like 1080p HD offers quality at the expense of computational power required!
    • I never would have guessed that there was a tradeoff between the quality and speed of compression! No way! Next they'll be saying things like 1080p HD offers quality at the expense of computational power required!

      If you really mean quality (as opposed to compression ratio) you've got it backwards. Lossless compression algorithms are generally simpler than lossy ones, especially on the encode side. Lossy algorithms have to do a lot of additional work converting signals to the frequency domain and applying c
  • by xxxJonBoyxxx ( 565205 ) on Sunday April 22, 2007 @10:17PM (#18836243)
    Screw speed and size reduction. All I want is compatibility with other OSs (i.e., the fewest things that have to be installed on a base OS to use it). For that, I'd have to say ZIP and/or gzip wins.
    • by Nogami_Saeko ( 466595 ) on Sunday April 22, 2007 @10:24PM (#18836305)
      Nice comparison, but there are really only two that matter (at least on PCs):

      ZIP for cross-platform compatibility (and for simplicity for less technically-minded users).

      RAR for everything else (at 3rd in their "efficiency" list, it's easy to see why it's so popular, not to mention ease of use for splitting archives, etc).
      • Re: (Score:3, Informative)

        by BluhDeBluh ( 805090 )
        It's closed-source and proprietary, though. Someone needs to make an open-source RAR compressor - the problem is that you can't use the official code to do that (the licence specifically forbids it), but you could use unrarlib [unrarlib.org] as a basis...
      • wtf? How is this highly compatible? gzip has a much larger install base.
      • UHA (Score:3, Insightful)

        by dj245 ( 732906 )
        There's also another, rather uncommon format that wasn't tested but is somewhat important: UHARC, file extension .UHA. It is dog slow, but it offers better compression than probably any of the others. It is still used by software pirates in their custom install scripts, and I have seen it in official software install routines as well.

        You can keep RAR and ZIP and toss out the others, but the UHA extension (or a dummy extension) will probably exist on your computer at some point in time.
      • Re: (Score:3, Insightful)

        by Jeff DeMaagd ( 2015 )
        RAR irritates me, though. It's rare enough that I usually have to dig up a decompressor for it and install it specially for just one file, and then I never use it again. I just don't like having to deal with files that require me to install new software just so I can use that one file. In that vein, I really don't think the article is relevant. I certainly won't use novelty file formats unless it looks like they have "legs". It's not like I want to make a file that becomes useless when the maintainer of the d
      • Why is "ease of splitting archives" considered to be important? You can do it with zip automatically, or any other archive format you care to choose by using, for instance, split -d -b 2048m filename, to split the output stream of any compressor into files no larger than 2 gig, with names starting with filename001.

        How many systems don't have any form of cat?
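
        The split/cat trick above is format-agnostic; here is a minimal Python equivalent for systems without those tools. The chunk size and file names are arbitrary, and each piece is held in memory, which is fine for a sketch.

          # Sketch: split any file (e.g. a big .gz) into fixed-size pieces and rejoin them,
          # mimicking `split -d -b 2048m` and `cat part* > whole`.
          CHUNK = 2048 * 1024 * 1024  # 2 GiB per piece

          def split_file(path, prefix):
              parts = []
              with open(path, "rb") as src:
                  index = 0
                  while True:
                      piece = src.read(CHUNK)
                      if not piece:
                          break
                      name = f"{prefix}{index:03d}"
                      with open(name, "wb") as part:
                          part.write(piece)
                      parts.append(name)
                      index += 1
              return parts

          def join_files(parts, out_path):
              with open(out_path, "wb") as dst:
                  for part in parts:
                      with open(part, "rb") as src:
                          dst.write(src.read())

          pieces = split_file("archive.gz", "archive.gz.")  # archive.gz.000, archive.gz.001, ...
          join_files(pieces, "rejoined.gz")                 # byte-identical to the original
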
    • Some people are sending huge graphics files and paying for bandwidth and/or sending to people with slow connections, so they actually have a use for maximal compression.

      I have to agree that for most people (myself included), compatibility is all that matters. I'm so glad Macs now can natively zip. But there are valid reasons to want compression over compatibility.
      All I want is compatibility with other OSs (i.e., fewest things that have to be installed on a base OS to use it). For that, I'd have to say Zip and/or gzip wins.

      Sure, but there's also the issue of finding the files you really want to share, and there KDE has very nice front ends. There's a nice find in Konqueror, with switches for everything, including click-and-drool regular expressions. Krename copies or links files with excellent renaming. Finally, Konqueror has an archive button. The slick interf

    • Screw speed and size reduction. All I want is compatibility with other OSs (i.e., fewest things that have to be installed on a base OS to use it). For that, I'd have to say Zip and/or gzip wins.

      I have to admit I switched over/back to ZIP about a year ago for everything, for exactly this reason. Yeah, it meant a lot of my old archives increased in size (sometimes by quite a bit), but knowing that anything anywhere can read the archive makes up for it. ZIP creation and decoding is supported natively by Mac a

    • With quantum computing perhaps we'll start to see really elegant compression, like 2D checksums with bit-shifting. If you can make all the data relate to each other, then each bit of the compressed file cuts the possibilities in half; get it down to maybe 1,000,000,000 possibilities, then tell it that it needs to be able to play in Winamp and... well, use a lot of processing power.
    • Agreed completely. (Score:5, Interesting)

      by Kadin2048 ( 468275 ) <{ten.yxox} {ta} {nidak.todhsals}> on Monday April 23, 2007 @12:28AM (#18837027) Homepage Journal
      Back in the early/mid 90s I was pretty obsessed with data compression because I was always short on hard drive space (and short on money to buy new hard drives with); as a result I tended to compress things using whatever the format du jour was if it could get me an extra percentage point or two. Man, was that a mistake.

      Getting stuff out of some of those formats now is a real irritation. I haven't run into a case yet that's been totally impossible, but sometimes it's taken a while, or turned out to be a total waste of time once I've gotten the archive open.

      Now, I try to always put a copy of the decompressor for whatever format I use (generally just tar + gzip) onto the archive media, in source form. The entire source for gzip is under 1MB, trivial by today's standards, and if you really wanted to cut size and only put the source for deflate on there, it's only 32KB.

      It may sound tinfoil-hat, but you can't guarantee what the computer field is going to look like in a few decades. I had self-expanding archives, made using Compact Pro on a 68k Mac, thinking they'd make the files easy to recover later; that hasn't helped me at all now -- a modern (Intel) Mac won't touch them (although to be fair a PPC Mac will run OS 9, which will, and allegedly there's a Linux utility that will unpack Compact Pro archives, although maybe not self-expanding ones).

      Given the rate at which bandwidth and storage space are expanding, I think the market for closed-source, proprietary data compression schemes should be very limited; there's really no good reason to use them for anything that you're storing for an unknown amount of time. You don't have to be a believer in the "infocalypse" to realize that operating systems and entire computing-machine architectures change over time, and what's ubiquitous today may be unheard of in a decade or more.
  • by Anonymous Coward on Sunday April 22, 2007 @10:17PM (#18836251)
    I fill an old station wagon with backup tapes, and then put it in the crusher.
    • Re:L-Zip (Score:2, Funny)

      by Anonymous Coward
      The L-Zip project at http://lzip.sourceforge.net/ [sourceforge.net] seems to be down right now, but it should be included in any file compression comparison. It could reduce files to 0% of their original size, and it was quick too.

      It was so good at what it did that I bet Microsoft bought them out and is going to incorporate the technology into Windows.
  • Skip the blogspam (Score:5, Informative)

    by Anonymous Coward on Sunday April 22, 2007 @10:19PM (#18836271)

    as it's slashdotted

    this site
    http://www.maximumcompression.com/ [maximumcompression.com]
    has been up for years and performs tests on all the compressors with various input sources; it's much more comprehensive
    • CoralCDN, the poor man's slashdot effect countermeasure.
      http://www.techarp.com.nyud.net:8090/showarticle.aspx?artno=4&pgno=0 [nyud.net]
    • by xigxag ( 167441 )
      maximumcompression.com is an excellent site but it just compares compression ratio, not speed. Hence for some people, it's of limited use.

      And of course, there are other factors that these types of comparisons rarely mention or that are harder to quantify: memory footprint, compression speed while multitasking (both foreground and background), single and dual core, OS/GUI integration, cross-platform availability, availability of source code, cost (particularly for enterprise users), backup options (how quie
      • Re: (Score:3, Insightful)

        by Spikeles ( 972972 )

        maximumcompression.com is an excellent site but it just compares compression ratio, not speed. Hence for some people, it's of limited use.

        See this page? http://www.maximumcompression.com/data/summary_mf.php [maximumcompression.com]
        What are the headers along the top? Let's see...

        Pos, Program, Switches used, TAR, Compressed, Compression, Comp time, Decomp time, Efficiency

        OMG! Is that a "time", as in speed, column I see there?

    • Re: (Score:3, Interesting)

      by _|()|\| ( 159991 )
      After scanning MaximumCompression's results [maximumcompression.com] (sorted by compression time) the last time one of these data compression articles hit Slashdot, I gained a newfound appreciation for ZIP and gzip:
      • they compress significantly better than any of the faster (and relatively obscure) programs
      • the programs that compress significantly better take more than twice as long
      • they're at the front of the pack for decompression time

      If you have a hard limit, like a single CD or DVD, then the extra time is worth it. Otherwis

  • Bit hard to have a spoiler when the article isn't available.
  • by Anonymous Coward
    These two formats are still widely used out there, and why are we compressing MP3s?
  • by MBCook ( 132727 ) <foobarsoft@foobarsoft.com> on Sunday April 22, 2007 @10:23PM (#18836297) Homepage

    I read this earlier today through the Firehose. It was interesting, but the graphs are what struck me. It seems to me all the graphs should have been XY plots instead of pairs of histograms. That way you could easily see the relationship between compression ratio and time taken. Their "metric" for showing this, basically multiplying the two numbers, is pretty bogus and isn't nearly as easy to compare. With an XY plot the four corners are all very meaningful: one is slow with poor compression, one each is good compression or good time, and the last is the sweet spot of good compression and good time. It's easy to tell the two opposing corners apart (good compression vs. good time), whereas with the article's metric they could look very similar. (A sketch of such a plot follows below.)

    Still, it's interesting to see. The popular formats are VERY well established at this point (ZIP on Windows and Mac (StuffIt seems to be fading fast), and gzip and bzip2 on Linux). They are so common (especially with ZIP support built into Windows since XP and also built into OS X) that I don't think we'll see them replaced any time soon. Of course, with CPU power getting cheaper and cheaper we are seeing formats that are more and more compressed (MP3, H.264, DivX, JPEG, etc.), so these utilities are becoming less and less necessary. I no longer need to stuff files onto floppies (I've got the net, DVD-Rs, and flash drives). Heck, if you look at some of the formats they "compressed" (at like 4% max), you almost might as well use TAR.
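
    For what it's worth, here is a sketch of the kind of XY plot suggested above, using matplotlib. The data points are made up purely for illustration; they are not the article's results.

      # Sketch of an XY plot: compression ratio vs. time, one point per compressor.
      # The numbers below are invented for illustration only.
      import matplotlib.pyplot as plt

      results = {            # hypothetical (seconds, % of original size)
          "gzip":  (12, 36.0),
          "bzip2": (45, 31.0),
          "7-zip": (140, 27.0),
          "WinRK": (600, 25.0),
      }
      for name, (seconds, ratio) in results.items():
          plt.scatter(seconds, ratio)
          plt.annotate(name, (seconds, ratio))
      plt.xlabel("compression time (s)")
      plt.ylabel("archive size (% of original)")
      plt.title("Bottom-left corner is the sweet spot")
      plt.show()
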

    • Heck, if you look at some of the formats they "compressed" (at like 4% max) you almost might as well use TAR.
      For high bandwidth websites, saving 4% means saving multiple GBs of traffic

      And I still zip up multiple files for sending over the internets.
    • by karnal ( 22275 )

      Of course, with CPU power getting cheaper and cheaper we are seeing formats that are more and more compressed (MP3, H264, DivX, JPEG, etc.), so these utilities are becoming less and less necessary.

      You do realize that you're talking about two different kinds of data when you compare something like .zip with something like .mp3? The more and more compressed formats you speak of only work well because they're built for specific applications - and they're lossy to boot; the typical compression tools are lossless and work on any data set.

      I don't think common compression libraries/utilities will ever fade; where there's a data set, there's always a need to get it just a little smaller....

    • Re: (Score:3, Informative)

      by timeOday ( 582209 )

      It was interesting, but the graphs are what struck me. It seems to me all the graphs should have been XY plots instead of pairs of histograms.
      Yup. [theknack.net].
  • You've gotta be kidding me: the article is posted and there are no best-compression test results! Lame!
  • Poor article. (Score:5, Insightful)

    by FellowConspirator ( 882908 ) on Sunday April 22, 2007 @10:24PM (#18836307)
    This is a poor article on several points. First, the entropy of the data in the files isn't quantified. Second, the strategy used for compression isn't described at all. If WinRK compresses so well on very high entropy data, there must be some filetype specific strategies used.

    Versions of the programs aren't given, nor the compile-time options (for the open source ones).

    Finally, Windows Vista isn't a suitable platform for conducting the tests. Most of these tools target WinXP in their current versions and changes to Vista introduced systematic differences in very basic things like memory usage, file I/O properties, etc.

    The idea of the article is fine, it's just that the analysis is half-baked.
    • Re:Poor article. (Score:5, Insightful)

      by RedWizzard ( 192002 ) on Sunday April 22, 2007 @11:10PM (#18836615)
      I've got some more issues with the article. They didn't test filesystem compression. This would have been interesting to me because often the choice I make is not between different archivers, but between using an archiver or just compressing the directory with NTFS' native compression.

      They also focused on compression rate when I believe they should have focused on decompression rate. I'll probably only archive something once, but I may read from the archive dozens of times. What matters to me is the trade-off between space saved and extra time taken to read the data, not the one-off cost of compressing it.

      • They didn't test filesystem compression.

        No, they didn't. They really should have tested that. Personally, I like 7-zip's compressed filesystem better than WinZip's, but I haven't really tried any of the others.

        Hold on... I've just been handed a note. Apparently you don't get to make any real choices in that area - it's zip or nothing. Further, the details of compressing and decompressing are handled whenever the filesystem feels like it, so it can't really be judged against traditional programs. So I gu
  • by mochan_s ( 536939 ) on Sunday April 22, 2007 @10:26PM (#18836315)

    What's the point of compressing JPEG, MP3, DivX, etc., since they already do the compression? The streams are close to random (with maximal information content), and all you could compress would be the headers between blocks in movies or the ID3 tag in an MP3.

    • because then they can use those graphs to pump their sponsor (WinRK)
    • by trytoguess ( 875793 ) on Sunday April 22, 2007 @11:25PM (#18836701)
      Er... did ya check out the comparisons? As you can see here [techarp.com], JPEG at least can be compressed considerably with StuffIt. According to this [maximumcompression.com], the program can "(partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm than default Huffman coding." I've no idea what that means, but it does seem to be more thorough and complex than what you wrote.
      • Mod parent up! I noticed that too, very interesting - I wonder whether a jpg compressed as efficiently as the JPEG standard allows could still be improved upon by StuffIt, or whether it just takes advantage of the inefficiency of most jpg compression code..
      • Re: (Score:3, Insightful)

        by slim ( 1652 )

        "the program can "(partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm then default Huffman coding."
        Whew, that makes me feel a bit dirty: detecting a file format an applying special rules. It's a bit like firewalls stepping out of their network-layer remit to mess about with application-layer protocols (e.g. to make FTP work over NAT).

        Still, in both cases, it works; who can argue with that.
      • by kyz ( 225372 ) on Monday April 23, 2007 @06:36AM (#18838493) Homepage
        While the main thrust of JPEG is to do "lossy" compression, the final stage of creating a JPEG is to do lossless compression on the data. There are two different official methods you can use: Huffman Coding and Arithmetic Coding.

        Both methods do the same thing: they statistically analyse all the data, then re-encode it so the most common values are encoded in a smaller way than the least common values.

        Huffman's main limitation is that each value compressed needs to consume at least one bit. Arithmetic coding can fit several values into a single bit. Thus, arithmetic coding is always better than Huffman, as it goes beyond Huffman's self-imposed barrier.

        However, Huffman is NOT patented, while most forms of arithmetic coding, including the one used in the JPEG standard, ARE patented. The authors of Stuffit did nothing special - they just paid the patent fee. Now they just unpack the Huffman-encoded JPEG data and re-encode it with arithmetic coding. If you take some JPEGs that are already compressed with arithmetic coding, Stuffit can do nothing to make them better. But 99.9% of JPEGs are Huffman coded, because it would be extortionately expensive for, say, a digital camera manufacturer, to get a JPEG arithmetic coding patent license.

        So Stuffit doesn't have remarkable code, they just paid money to get better compression that 99.9% of people specifically avoid because they don't think it's worth the money.
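
        The "at least one bit per symbol" limit mentioned above is easy to see with a toy Huffman coder. This is just the textbook algorithm in Python, not the JPEG or StuffIt implementation.

          # Toy Huffman coder: frequent symbols get shorter codes, but every code
          # is a whole number of bits, which is the limit arithmetic coding escapes.
          import heapq
          from collections import Counter

          def huffman_code(data):
              # Start with one leaf per symbol, then repeatedly merge the two rarest nodes.
              heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(data).items())]
              heapq.heapify(heap)
              tiebreak = len(heap)
              while len(heap) > 1:
                  lo = heapq.heappop(heap)
                  hi = heapq.heappop(heap)
                  codes = {s: "0" + c for s, c in lo[2].items()}
                  codes.update({s: "1" + c for s, c in hi[2].items()})
                  heapq.heappush(heap, [lo[0] + hi[0], tiebreak, codes])
                  tiebreak += 1
              return heap[0][2]

          for sym, bits in sorted(huffman_code(b"abracadabra").items(), key=lambda kv: len(kv[1])):
              print(chr(sym), bits)
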
    • by athakur999 ( 44340 ) on Sunday April 22, 2007 @11:53PM (#18836831) Journal
      Even if the amount of additional compression is insignificant, ZIP, RAR, etc. are still very useful as container formats for MP3, JPG, etc. files, since it's easier to distribute one or two .ZIP files than 1000 individual .JPG files. And if you're going to package up a bunch of files into a single file for distribution anyway, why not use the opportunity to save a few kilobytes here and there if it doesn't take much more time? (A small container-format sketch follows below.)
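
      As a small illustration of the container point, Python's zipfile module can bundle many files and simply store (rather than recompress) the ones that won't shrink. The directory and archive names here are hypothetical.

        # Sketch: bundle a directory into one ZIP, storing already-compressed files as-is.
        import os
        import zipfile

        already_compressed = {".jpg", ".jpeg", ".png", ".mp3", ".avi"}

        with zipfile.ZipFile("photos.zip", "w") as archive:        # hypothetical output
            for name in os.listdir("vacation_photos"):             # hypothetical directory
                path = os.path.join("vacation_photos", name)
                if not os.path.isfile(path):
                    continue
                ext = os.path.splitext(name)[1].lower()
                method = zipfile.ZIP_STORED if ext in already_compressed else zipfile.ZIP_DEFLATED
                archive.write(path, arcname=name, compress_type=method)
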

    • By default, StuffIt won't even bother to compress MP3 files. That's why it shows an increase in file size (for the archive headers) and why it has the fastest throughput (it's not trying to compress). If you change that option, the results will be different.

      I imagine some other codecs also have similar options for specific file types.
  • They didn't think their cunning plan to create more ad revenue by creating a shitload of pages all the way through...
  • by Blue Shifted ( 1078715 ) on Sunday April 22, 2007 @10:27PM (#18836327) Journal
    The most interesting thing about text compression is that there is only about 20% information in the English language (or less). Yes, that means that four-fifths of it is meaningless filler, filled up with repetitive patterns. As you can see, I really didn't need four sentences to tell you that, either.

    I wonder how other languages compare, and if there is a way to communicate much more efficiently. (A crude way to put a number on this is sketched below.)
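
    A crude way to put a number on that redundancy claim is to compress a chunk of plain English text and see how many bits per character survive; Shannon's classic estimate was on the order of one bit of information per character versus the eight bits we store. The sample file name is hypothetical.

      # Crude sketch: estimate the redundancy of English text by compressing it.
      import bz2

      text = open("sample_english.txt", "rb").read()   # hypothetical plain-text sample
      packed = bz2.compress(text, compresslevel=9)
      bits_per_char = len(packed) * 8 / len(text)
      print(f"{bits_per_char:.2f} compressed bits per character "
            f"({len(packed) / len(text):.0%} of the original size)")
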

    • Yes, you can communicate much more efficiently. Much of the length of English words is related to defining part of speech - we use "-ing" for adjectival forms of verbs, "-ly" for adverbs, etc. It's just an attribute that is expressed by a clearly recognizable pattern. As such, it's easily comprehended by the reader who can identify parts of words rather than letter-by-letter reading. This is the essence of true speed-reading.

      An anecdotal observation: my wife is better than I am at linguistic tasks, b

    • I have been thinking about creating a new language with about 60 or so words. The idea is that you don't need a lot of words when you can figure out the meaning by context. Strong points are that the language would be very easy to pick up, and you would get that invigorating feeling of talking like a primitive cave man.

      As an example of the concept, we have the words walk and run. They are a bit too similar to be worth wasting one of our precious few 60 words. Effectively, one could be dropped with have

  • 7zip (Score:5, Insightful)

    by Lehk228 ( 705449 ) on Sunday April 22, 2007 @10:33PM (#18836357) Journal
    7-zip crib sheet:

    Weak on things it makes little sense to zip, like WAV files (use FLAC), MP3s, JPEGs, and DivX movies.

    7-zip does quite well on documents (2nd) and e-books (2nd), is 3rd on MPEG video, and 2nd on PSD.

    Also, I expect 7-zip will improve at higher-end compression settings; when possible I give it hundreds of megs, and unlike the commercial apps, 7-zip can be configured well into the "insane" range.
  • by 644bd346996 ( 1012333 ) on Sunday April 22, 2007 @10:35PM (#18836393)
    These days, file compression is pretty much only used for large downloads. In those instances, you really have to use either gzip, pkzip, or bzip2 format, so that your users can extract the file.

    Yes, having a good compression algorithm is nice, but unless you can get it to partially supplant zip, you'll never make much money off it. Also, most things these days don't need to be compressed. Video and audio are already encoded with lossy compression, web pages are so full of crap that compressing them is pointless, and hard drives are big enough. That said, I haven't seen any research lately about whether compression is useful for entire filesystems to reduce the bottleneck from hard drives. Still, I suspect that it is not worth the effort.
    • File compression is also very important for backups, both for capacity and for backup/restore speed. But you know what? In backups, you want to ensure that the archives are going to be recognisable and readable by as wide a variety of software as possible, so your disaster-recovery options stay open. Sure, you probably encrypt them, but there, too, portable and fairly standard tools are a better idea than some compression/archival app's built-in half-baked password protection.

      As for compressing whole fil
  • Looks like the server was /.'ed. Mirrors: MirrorDot [mirrordot.org] and Network Mirror [networkmirror.com].
  • by Repton ( 60818 ) on Sunday April 22, 2007 @11:09PM (#18836609) Homepage

    See also: the Archive Comparison Test [compression.ca]. Covers 162 different archivers over a bunch of different file types.

    It hasn't been updated in a while (5 years), but have the algorithms in popular use changed much? I remember caring about compression algorithms when I was downloading stuff from BBSs at 2400 baud, or trading software with friends on 3.5" floppies. But in these days of broadband, cheap writable CDs, and USB storage, does anyone care about squeezing the last few bytes out of an archive? zip/gzip/bzip2 are good enough for most people for most uses.

  • Exhaustive?! (Score:5, Informative)

    by jagilbertvt ( 447707 ) on Sunday April 22, 2007 @11:12PM (#18836625)
    It seems odd that they didn't include executables/DLLs in the comparison (where maximumcompression.com does). I also find it odd that they are compressing items that normally don't compress very well with most data compression programs (DivX/MPEGs/JPEGs/etc.). I'm guessing this is why 7-zip ranked a bit lower than usual.

    I did some comparisons last year and found 7-zip to do the best job for what I needed (great compression ratio without requiring days to complete). The article also doesn't take into account the network speed at which the file is going to be transmitted. I use 7-zip for pushing application updates and such to remote offices (most over 384k/768k WAN links). Compressing with 7-zip has saved users quite a bit of time compared to WinRAR or WinZip.

    I would definitely recommend checking out maximumcompression.com (as others have, as well) over this article. It goes into a lot more detail.
  • It's a waste of time using a general-purpose compressor on data that's already been compressed by domain-specific audio or video compressors.
  • Exhaustive Data Compressor Comparison

    It makes a lot of sense, considering how my eyelids feel after reading what the article is about.

  • by agristin ( 750854 ) on Monday April 23, 2007 @01:06AM (#18837191) Journal
    Andrew Tridgell's rzip wasn't on there either.

    http://samba.org/junkcode/ [samba.org]

    Tridge is one of the smart guys behind Samba, and rzip is pretty clever for certain things. Just ask Google.
  • by sofar ( 317980 ) on Monday April 23, 2007 @02:18AM (#18837555) Homepage

    The article conveniently forgets to mention whether the compression tools are cross-platform (OS X, Linux, BSD) and/or open source or not.

    That makes a lot of them utterly useless for lots of people. Yet another Windows-focused review, bah.
  • by MCTFB ( 863774 ) on Monday April 23, 2007 @03:36AM (#18837879)
    for general purpose lossless compression. Most modern compression utilities out there mix and match the same algorithms which do the same thing.

    With the exception of compressors that use arithmetic coding (which has patents out the wazoo covering just about every form of it), virtually all compressors use some form of Huffman compression. In addition, many use some form of LZW compression before executing the Huffman compression. That is pretty much it for general purpose compression.

    Of course, if you know the nature of the data you are compressing you can come up with a much better compression scheme.

    For instance, with XML, if you have a schema handy, you can do some really heavy optimization since the receiving side of the data probably already has the schema handy which means you don't need to bother sending some sort of compression table for the tags, attributes, element names, etc.

    Likewise, with fax machines, run-length encoding is used heavily because of all the sequential white space that is typical of most fax documents. Run-length encoding of white space can also be useful in XML documents that are pretty-printed. (A tiny RLE sketch follows below.)

    Most compression algorithms that are very expensive on the compression side are usually pretty cheap to decompress. If you are providing a file for millions of people to download, it doesn't matter if it takes 5 days to compress the file as long as it still takes only 30 seconds for a user to decompress it. However, when doing peer-to-peer communication with rapidly generated data, you need the compression to be fast if you use any at all.

    Nevertheless, most general-purpose lossless compression formats are more or less clones of each other once you get down to analyzing what algorithms they use and how they are used.
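
    A run-length encoder is about the simplest concrete example of the whitespace idea mentioned above; this minimal sketch is generic RLE, not the scheme any fax standard actually uses (see the reply below).

      # Minimal run-length encoding sketch: collapse runs of repeated bytes
      # (e.g. long stretches of spaces in pretty-printed XML) into (count, byte) pairs.
      from itertools import groupby

      def rle_encode(data):
          return [(len(list(run)), value) for value, run in groupby(data)]

      def rle_decode(pairs):
          return b"".join(bytes([value]) * count for count, value in pairs)

      sample = b"<tag>\n        <child/>\n        <child/>\n</tag>\n"
      pairs = rle_encode(sample)
      assert rle_decode(pairs) == sample
      print(pairs)   # each 8-space run shows up as a single (8, 32) pair
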
    • by kyz ( 225372 ) on Monday April 23, 2007 @06:55AM (#18838591) Homepage
      Wow, are you speaking beyond your ken. When you say "some form of LZW compression", you should have said "some form of LZ compression" - either Lempel and Ziv's 1977 (sliding window) or 1978 (dictionary slots) papers on data compression by encoding matched literal strings. LZW is "some form" of LZ78 compression which, apart from GIFs, almost nobody uses. It's too fast and not compressy enough. Most things use LZH (LZ77 + Huffman), specifically DEFLATE, the kind used in PKZIP, firstly because the ZIP file format is still very popular, and because zlib is a very popular free library that can be embedded into anything.

      Fax machines use a static Huffman encoding. They've never used run-length encoding. Run-length encoding is nothing compared to how efficiently LZ77 or LZ78 would handle pretty-printed XML.

      Compression algorithms vary on both their compression and decompression speed. LZ77 is slow to compress and fast to decompress. Arithmetic coding and PPM are slow both compressing and decompressing.
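
      For reference, the LZ77-plus-Huffman combination described above (DEFLATE, as used by PKZIP and gzip) is exposed by zlib; a quick sketch on repetitive pretty-printed XML shows how well the back-references handle repeats.

        # Sketch: DEFLATE (LZ77 + Huffman) via zlib on repetitive pretty-printed XML.
        import zlib

        xml = ("<items>\n"
               + "    <item>\n        <name>widget</name>\n    </item>\n" * 200
               + "</items>\n").encode()
        packed = zlib.compress(xml, 9)
        print(f"{len(xml)} bytes -> {len(packed)} bytes "
              f"({len(packed) / len(xml):.1%} of original)")
        print(zlib.decompress(packed) == xml)   # round-trips losslessly
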
  • SMP hardware? (Score:3, Insightful)

    by MrNemesis ( 587188 ) on Monday April 23, 2007 @06:16AM (#18838415) Homepage Journal
    I only skimmed the article but what with all the hullabaloo about dual/quad core chips, why didn't they use "exhaustive" as an excuse to check out the parallelisability (if that's a word) of each compression algorithm? IIRC they didn't list the hardware they used or any of the switches they used, which is a glaring omission in my book.

    Of all the main compression utils I use, 7-zip, RAR and bzip2 (in the form of pbzip2) all have modes that will utilise multiple chips, often giving a pretty huge speedup in compression times. I'm not aware of any SMP branches for gzip/zlib but seeing as it appears to be the most efficient compressor by miles it might not even need it ;)

    It's mainly academic for me now anyway, since almost all of the compression I use is inline, either through rsync or SSH (or both). Not sure if any inline compressors are using LZMA yet, but the only time I find myself making an archive is for emailing someone with file-size limits on their mail server. All of the stuff I have at home is stored uncompressed, because a) 90% of it is already highly compressed and b) I'd rather buy slightly bigger hard drives than attempt to recover a corrupted archive a year or so down the line. Mostly I'm just concerned about decompression time these days. (A sketch of the parallel-chunk idea follows below.)
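
    To make the parallel idea above concrete, here is a sketch in the spirit of pbzip2: compress independent blocks on multiple cores and concatenate the streams (standard bunzip2 handles concatenated streams). It is not pbzip2's actual implementation, and the block size and file names are arbitrary.

      # Sketch of the pbzip2 idea: compress fixed-size blocks in parallel, then
      # concatenate the resulting bzip2 streams into one .bz2 file.
      # Note: Executor.map collects all blocks up front; fine for a sketch.
      import bz2
      from concurrent.futures import ProcessPoolExecutor

      BLOCK = 8 * 1024 * 1024  # 8 MiB per block

      def parallel_bzip2(src_path, dst_path):
          with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
              blocks = iter(lambda: src.read(BLOCK), b"")
              with ProcessPoolExecutor() as pool:
                  for chunk in pool.map(bz2.compress, blocks):
                      dst.write(chunk)

      if __name__ == "__main__":                      # needed where workers are spawned
          parallel_bzip2("big.tar", "big.tar.bz2")    # hypothetical file names
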
