Slashdot Log In
Compress Wikipedia and Win AI Prize
Posted by
CmdrTaco
on Sun Aug 13, 2006 05:50 PM
from the what-does-this-mean dept.
from the what-does-this-mean dept.
Baldrson writes "If you think you can compress a 100M sample of Wikipedia better than paq8f, then you might want to try winning win some of a (at present) 50,000 Euro purse. Marcus Hutter has announced the Hutter Prize for Lossless Compression of Human Knowledge the intent of which is to incentivize the advancement of AI through the exploitation of Hutter's theory of optimal universal artificial intelligence. The basic theory, for which Hutter provides a proof, is that after any set of observations the optimal move by an AI is find the smallest program that predicts those observations and then assume its environment is controlled by that program. Think of it as Ockham's Razor on steroids. Matt Mahoney provides a writeup of the rationale for the prize including a description of the equivalence of compression and general intelligence."
Related Stories
[+]
Science: Text Compressor 1% Away From AI Threshold 442 comments
Baldrson writes "Alexander Ratushnyak compressed the first 100,000,000 bytes of Wikipedia to a record-small 16,481,655 bytes (including decompression program), thereby not only winning the second payout of The Hutter Prize for Compression of Human Knowledge, but also bringing text compression within 1% of the threshold for artificial intelligence. Achieving 1.319 bits per character, this makes the next winner of the Hutter Prize likely to reach the threshold of human performance (between 0.6 and 1.3 bits per character) estimated by the founder of information theory, Claude Shannon and confirmed by Cover and King in 1978 using text prediction gambling. When the Hutter Prize started, less than a year ago, the best performance was 1.466 bits per character. Alexander Ratushnyak's open-sourced GPL program is called paq8hp12 [rar file]."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Full
Abbreviated
Hidden
Loading ... Please wait.

WikiPedia on iPod! (Score:2, Interesting)
I'd love to be able to have the whole WikiPedia available on my iPod (or cell phone), but without destroying [sourceforge.net]
info.edu.org [edu.org] - Speedy information and news from the Top 10 educational organisations.
Re:WikiPedia on iPod! (Score:3, Insightful)
wikicast (Score:4, Funny)
So, we need a WikiCast - remember folks, you heard it here first!
Re:WikiPedia on iPod! (Score:4, Insightful)
Re:WikiPedia on iPod! (Score:4, Funny)
But captain (Score:5, Funny)
But captain, if we reverse the tachyon inverter drives then we will have insufficient dilithium crystals to traverse the neutrino warp.
Re:But captain (Score:5, Funny)
Painful to read (Score:4, Insightful)
Re:Painful to read (Score:2, Insightful)
Yeah, I just read the write-up twice and have no idea if this is an AI contest, something to do with compression, or what. In fact, all I can remember now is the word "incentivize" which is the sort of thing I expect som
Re:Painful to read (Score:3, Insightful)
No, they need to verbize another noun when there was a perfectly good word in the language that means *exactly* what they want. feh.
Re:Not sure if that's a joke. (Score:4, Funny)
Re:Painful to read (Score:5, Funny)
He did, but Slashdot's AI compressed it for him.
:-D
lossy compression (Score:5, Insightful)
Re:lossy compression (Score:3, Insightful)
The human brain is a fuzzy clustering algorithm, that's what neural networks do, they reduce the space of a large
data set by eliminating redundancy
Re:lossy compression (Score:3, Insightful)
Lossy relational/dictionary based compression is the base. You hardly ever remember text by order of letters or sound of voice reading i
Re:lossy compression (Score:5, Insightful)
You just need to re-create afile that matches the md5sum and still follows the rules of a Linux kernel. It is extremely unlikely any other file that can be recognized as some kind of Linux kernel and matches. Of course there are countless blocks of data that still match, but very few will follow the ruleset of "ELF kernel executable" structure which can be deduced numerically.
Mmmm, no. You were fine up until you said "very few will follow the ruleset". That's not true. To see that it's not true, take your kernel, which consists of around 10 million bits. Now find, say, 512 of those bits that can be changed, independently, while still producing a valid-looking kernel executable. The result doesn't even have to be a valid, runnable kernel, but it wouldn't be too hard to do it even with that restriction.
So you now have 2^512 variants of the Linux kernel, all of which look like a valid kernel. But there are only 2^128 possible hashes, so, on average, there will be four kernels for each hash value, and the odds are very, very good that your "real" kernel's hash is also matched by at least one of them. If by some chance it isn't, I can always generate a whole bunch more kernel variants. How about 2^2^10 of them?
A hash plus a filter ruleset does not constitute a lossless compression of a large file, even if computation power/time is unbounded.
As long as it is Wiki that we are talking about... (Score:4, Funny)
There. All of wiki, in 31 bytes.
Who'da thunk... (Score:5, Funny)
Re:Who'da thunk... (Score:2)
Easy! (Score:2, Funny)
Lossy Compression? (Score:4, Funny)
Re:Lossy Compression? (Score:2, Insightful)
Re:Lossy Compression? (Score:2, Interesting)
Would be useful for images (Score:5, Funny)
Comparison (Score:2, Informative)
Re:Comparison (Score:2)
Re:Comparison (Score:2)
Re:Comparison (Score:3, Funny)
Have no fear though, I'm working on a new one.
It's a big world out there (Score:5, Interesting)
This - in humans, at least - can lead to the cyclic reinforcement of one's belief system. The belief system that explains observations initially is used to filter observations later.
TFA is a neat idea theoreretically, but it's progeny will never be able to leave the lab.
--
I figured out how to get a second 120-byte sig! Mod me up and I'll tell you how you can have one too.
Re:It's a big world out there (Score:2)
This is precisely the assumption of Hutter's theory.
Chapter 2 of his book "Simplicity & Uncertainty" deal
Re:It's a big world out there (Score:2)
There is no allowance for lossy compression. The requirement of lossle
Re:It's a big world out there (Score:5, Funny)
Your use of "TFA" is a good compressional technique, but you could change "it's" to "its" and actually GAIN in meaning while losing a character! You're well on your way...
Re:It's a big world out there (Score:5, Informative)
In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing exiting belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.
Since even an optimal decision maker has this undesirable trait, I don't think the existence of this trait is a good criteria for rejecting decision making models.
Re:It's a big world out there (Score:3, Informative)
Re:It's a big world out there (Score:3, Insightful)
Re:It's a big world out there (Score:4, Interesting)
Just to help (and so you don't think I made Turbo Codes up -- it's sounds like I did 'cause it's such a bad name)
http://en.wikipedia.org/wiki/Turbo_code [wikipedia.org]
Er, I'm not so sure about this. (Score:4, Interesting)
It really seems like one of those mistaking-the-map-for-the-territory errors.
-b
Solution. (Score:5, Funny)
Then just apply your personal favourite compression utility.
I like lharc, which according to Wikipedia was invented in 1904 as a result of bombarding President Lincoln, who plays Commander Tucker in Star Trek: Enterprise with neutrinos.
Incentivize? (Score:5, Funny)
Sorry, anything which uses the word "incentivize" does not involve intelligence, natural or artificial.
I'll try: (Score:5, Funny)
Mine wins as it is roughly 40 bytes total.To get your results, you simply need to run the self-extracting archive, and wait. Be warned, it will take a while, but that is the cost of such a great compression scheme!
Re:I'll try: (Score:4, Funny)
Here's one that's even shorter, but you have to type in the decryption key exactly right.
C++ (Score:3, Funny)
A (good) sign of the times, I guess.
Hutter's Theory - Disproved (Score:4, Insightful)
Human poker players address this issue by deliberately introducing slight randomness into their play. I think a "Hutter AI" will make better real-world decisions if it does the same (see Game Theory).
Occam's razor is also highly suspect. There's the issue of cultural bias when counting assumptions. And all programmers will be aware of how they fixed "the bug" that caused all the problems in an application, only to find there were other bugs that caused identical symptoms.
Compress Wikipedia and win a prize? (Score:5, Funny)
Re:for those who rtfa (Score:2, Informative)
18MB
b) how many bytes was wikipedia before it was compressed
A sample of 100MB
Your goal:
.
KFG
Re:Can it be "lossy" compression? (Score:5, Funny)
Wrong contest (Score:4, Informative)
The contest for the Hutter Prize requires the compressed corpus to be a self-extracting archive -- or failing that to add the size of the compressor to the compressed corpus.
Barebones Windows or Linux (Score:3, Informative)
Re:Can it be "lossy" compression? (Score:3, Informative)
This inconsistency doesn't have any effect on the challenge, though -- that 50kEU
Re:Can it be "lossy" compression? (Score:5, Funny)