News for nerds, stuff that matters

Building a Fast Wikipedia Offline Reader

Posted by kdawson on Mon Aug 13, 2007 09:53 PM
from the you-could-look-it-up dept.
ttsiod writes "An internet connection is not always at hand. I wanted to install Wikipedia on my laptop to be able to carry it along with me on business trips. After trying and rejecting the normal (MySQL-based) procedure, I quickly hacked a much better one over the weekend, using open source tools. Highlights: (1) Very fast searching. (2) Keyword (actually, title words) based searching. (3) Search produces multiple possible articles, sorted by probability (you choose amongst them). (4) LaTeX based rendering for mathematical equations. (5) Hard disk usage is minimal: space for the original .bz2 file plus the index built through Xapian. (6) Orders of magnitude faster to install (a matter of hours) compared to loading the 'dump' into MySQL — which, if you want to enable keyword searching, takes days."
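Points (2) and (3), title-word search that returns several candidate articles ranked by likelihood, come from the Xapian index in the original setup. As a library-free illustration of the underlying idea only, here is a minimal Python sketch of an inverted index over title words. Ranking by the number of matched query words, with shorter titles first on ties, is an assumption made for the example and is not Xapian's scoring (Xapian uses BM25 weighting by default). The names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(titles: list[str]) -> dict[str, set[int]]:
    """Map each title word to the ids of the titles containing it (an inverted index)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for word in tokenize(title):
            index[word].add(doc_id)
    return index


def search(query: str, titles: list[str], index: dict[str, set[int]], limit: int = 10) -> list[str]:
    """Return candidate titles: most matched query words first, then shorter titles."""
    hits: dict[int, int] = defaultdict(int)
    for word in set(tokenize(query)):  # de-duplicate so repeated words don't count twice
        for doc_id in index.get(word, ()):  # .get() does not insert into the defaultdict
            hits[doc_id] += 1
    ranked = sorted(hits, key=lambda d: (-hits[d], len(titles[d]), titles[d]))
    return [titles[d] for d in ranked[:limit]]
```

For example, with the titles "Prime number", "Complex number", "Number theory" and "Prime Minister of Japan", the query `prime number` puts "Prime number" first because it is the only title matching both words; the other three follow as weaker candidates, which is the "choose amongst them" behaviour the summary describes.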

This discussion has been archived. No new comments can be posted.
  • Wow! (Score:3, Funny)

    by ferrocene (203243) on Monday August 13, @09:56PM (#20220565)
    (http://www.ferrobyte.com/ | Last Journal: Monday October 20 2003, @12:20AM)
    After doing all that, I think you may have missed your flight! :)
    • Re:Wow! by ivlianvs (Score:1) Tuesday August 14, @03:38AM
    • Re:Why? by bn557 (Score:3) Monday August 13, @10:16PM
    • Re:Why? (Score:5, Funny)

      by rabblerabble (884373) on Monday August 13, @10:20PM (#20220707)
      I'll bite...Unfortunately, I don't have a basement, so there are times when I am required to venture into the outer realm that happens to be heated by the big ball of gas known as Sol, as opposed to a pump ;P Seriously though, this is exactly what I have been looking for. What better way to show up your friends when they cry "You're wrong, google it!" knowing that there is no connection possible within twenty miles. Next time I'm drunk at the beach and someone wants to pretend to know the history of coffee harvesting, it's on.
    • by ampathee (682788) on Monday August 13, @10:22PM (#20220719)

      Programmers shouldn't be wasting time on these trivial, pointless projects. We need their work in other more important projects!
      Hah! I'm going to start work on (let's see..) a random lolcat generator now, just to piss you off.
    • Re:Why? by Short Circuit (Score:1) Monday August 13, @10:25PM
    • Re:Why? by RuBLed (Score:2) Monday August 13, @10:36PM
    • Re:Why? (Score:5, Insightful)

      by thePsychologist (1062886) on Tuesday August 14, @12:05AM (#20221433)
      (Last Journal: Friday September 14, @02:08PM)
      Realize that some of the greatest things done by humankind were from doing "pointless projects" as you call them. Prime numbers for instance were studied by mathematicians just for fun, and now look, they're used for cryptography. Try doing your banking without them.

      Complex numbers originated from something "useless" like trying to solve the cubic polynomial in radicals...try building a bridge without them. In fact all of science is built upon people going off on random tangents doing things they enjoy, discovering seemingly "useless facts", but most of it becomes useful *and* gives us an idea of the universe in which we live.

      Only working on immediate practical problems is very shortsighted, and if mandated throughout the academic community, would mean the death of innovation and most discoveries.
      • Re:Why? by jshowlett (Score:1) Tuesday August 14, @08:57AM
    • Re:Wow! by nschubach (Score:1) Tuesday August 14, @07:36AM
    • Re:Why? by macro187 (Score:1) Tuesday August 14, @09:17AM
    • Re:Why? by Tweekster (Score:2) Tuesday August 14, @11:29AM
  • Wow (Score:1)

    Great Job, that is the power of open source for ya.

    Now we need to work on porting that to other OSes and we will be set.
    • Re:Wow by Dusty101 (Score:1) Tuesday August 14, @03:55AM
  • Ho-Hum ... (Score:5, Funny)

    by jabberwock (10206) on Monday August 13, @10:21PM (#20220715)
    (http://www.tftb.com/)
    What, no auto update? No User Agreement? No disabled features that are enabled by a mammoth key? No product registration?


    Let us know when you're ready for prime time ... ;-)

    • Re:Ho-Hum ... (Score:5, Insightful)

      by OzRoy (602691) on Tuesday August 14, @12:18AM (#20221501)
      Auto-update would be interesting. How do you keep the data up to date without downloading the entire 2.9G again? Is there some sort of diff file you can download?
      • Re:Ho-Hum ... by Anonymous Coward (Score:1) Tuesday August 14, @02:55AM
      • Re:Ho-Hum ... by MichaelSmith (Score:3) Tuesday August 14, @06:04AM
      • Re:Ho-Hum ... by gwern (Score:1) Tuesday August 14, @08:42AM
      • Re:Ho-Hum ... by superpulpsicle (Score:2) Tuesday August 14, @12:59PM
    • Re:Ho-Hum ... by MikkoApo (Score:1) Tuesday August 14, @03:50AM
  • Uh.... (Score:1)

    by VonSkippy (892467) on Monday August 13, @10:30PM (#20220769)
    (http://www.hormel.com/)
    Why?
    • Re:Uh.... (Score:5, Interesting)

      by dhwebb (526291) on Monday August 13, @10:40PM (#20220837)
      (http://www.stopstupidity.com/ | Last Journal: Sunday February 23 2003, @11:16PM)
      Programming something new to some people is like playing a video game. I love programming useless things just for the challenge. People who don't understand that have never had a true love for programming.
      • Re:Uh.... by Tablizer (Score:2) Monday August 13, @10:54PM
        • Re:Uh.... (Score:5, Funny)

          by Gazzonyx (982402) on Monday August 13, @11:53PM (#20221377)

          I love programming useless things just for the challenge.

          Have you ever worked on a project called "Clippey", by chance?
          No, he said he has a love for programming, not a seething hatred for users. Besides, everyone knows programmers only hate admins. ;) On behalf of the programmers, I'd like to say that this isn't true; we love our admins. Who else makes sure that our connections*&#^$: Connection Reset By Peer
          • Re:Uh.... by LittleBigLui (Score:2) Tuesday August 14, @02:15AM
            • Re:Uh.... by Gazzonyx (Score:2) Tuesday August 14, @11:41AM
      • Re:Uh.... (Score:4, Informative)

        by stephanruby (542433) on Monday August 13, @11:38PM (#20221265)

        Programming something new to some people is like playing a video game.
        Speaking of which, http://www.pyweek.org/ [pyweek.org] is coming up this first week of September. It's time to dust off that python book (or borrow one from someone) and do whatever you have to do to get some days off that week.
      • I know the feeling (Score:5, Insightful)

        by aepervius (535155) on Tuesday August 14, @12:30AM (#20221543)
        They tell you that their hobby is painting/music/walking/repairing old cars/gardening/building scale models etc... and they seem to think that their hobbies are perfectly acceptable. But as soon as you say you like to program stuff, they don't understand how this could be a hobby. They mostly fail to recognize that every one of us has something in common: the joy of the act of creation. The fact that our hobby entails creating something immaterial and full of "logic" does not matter. It is still a joy.
      • Re:Uh.... by Jugalator (Score:2) Tuesday August 14, @01:20AM
      • Re:Uh.... by fotbr (Score:3) Tuesday August 14, @07:50AM
    • Re:Uh.... by hobbesmaster (Score:2) Monday August 13, @10:40PM
      • Re:Uh.... by Gazzonyx (Score:3) Monday August 13, @11:59PM
        • Re:Uh.... by nschubach (Score:1) Tuesday August 14, @07:47AM
    • local resource, better interface by Zork the Almighty (Score:2) Monday August 13, @11:19PM
  • I hope (Score:4, Funny)

    by Nikron (888774) on Monday August 13, @10:36PM (#20220817)
    That you don't dump the wiki at a bad time.

    George W Bush

    Is a dick head!!!!11

    • Re:I hope (Score:5, Funny)

      by Anonymous Coward on Monday August 13, @10:58PM (#20220979)
      You mean before someone makes it inaccurate again?

      Oh, nevermind, I see the problem:

      George W Bush

      Is a dick head!!!!11

      should be

      George W Bush

      Is a dick head!!!!!!

      Man, those out to mess with the content are getting more and more subtle...
    • Re:I hope by Ours (Score:2) Tuesday August 14, @06:21AM
    • Re:I hope.... Actually by LinuxEagle (Score:1) Tuesday August 14, @02:25PM
  • But... (Score:2, Funny)

    by Anonymous Coward on Monday August 13, @10:43PM (#20220863)
    What's the point of it if there are no vandals or flame wars to make it interesting?
  • by Brietech (668850) on Monday August 13, @10:45PM (#20220879)
    Combine this and one of the new E-ink ebook readers, make it pretty rugged, slap a solar panel on the back and man. . . you have something really close to a genuine hitchhiker's guide to the galaxy. Ah, I love where technology is heading =)
  • Only 2 days huh (Score:2, Funny)

    by Anonymous Coward on Monday August 13, @10:46PM (#20220883)

    I was able to build this in two days, most of which were spent searching for the appropriate tools. Simply unbelievable... toying around with these tools and writing less than 200 lines of code, and... presto!
    Give that man a job at Google.
  • compared to loading the 'dump' into MySQL -- which, if you want to enable keyword searching, takes days."

    Do you mean searching takes days, or loading? Searching should be quick if you index the words. If you are duplicating a bunch of local clones of the wiki, then simply copy down the raw MySQL table data files rather than reloading from delimited files etc. (One needs to make sure their version of MySQL is compatible with the table file format.)
           
  • by kcbrown (7426) <slashdot@sysexperts.com> on Monday August 13, @10:55PM (#20220953)

    "Orders of magnitude faster to install (a matter of hours) compared to loading the 'dump' into MySQL -- which, if you want to enable keyword searching, takes days."

    But....but....I thought MySQL was fast!

    :-)

  • by phliar (87116) on Monday August 13, @11:27PM (#20221185)
    (http://www.drones.com/)
    For a change it's not just a link to a .tar.gz somewhere, but an actual article where he goes through what he did, and (more important) why he did things that way. Good reading even if you don't want an off-line Wikipedia.
  • It doesn't take days (Score:5, Informative)

    It only takes days if you use the php import script to import the sql dump, which was not designed for importing the entire dump.

    Use the ANSI C implementation, which takes about 20 minutes to convert the XML to SQL and then a few hours to import into MySQL. Please note that you need a properly configured MySQL server, with at least 8GB of RAM, in order to efficiently run a local copy of Wikipedia.

    http://meta.wikimedia.org/wiki/Xml2sql [wikimedia.org]
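The first stage of such a converter, streaming `(title, text)` pairs out of the dump without holding the whole file in memory, can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`. This is an illustration under stated assumptions, not how xml2sql itself works: it assumes every article is a `<page>` element with `<title>` and `<revision><text>` children, matches tags by local name so the MediaWiki export namespace does not matter, and keeps only the last `<text>` seen in a page. The function names and the sample document are hypothetical.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: IO[bytes]) -> Iterator[tuple[str, str]]:
    """Yield (title, text) for each <page> element of a MediaWiki-style XML dump."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title is not None:
                yield title, text or ""
            title, text = None, None
            elem.clear()  # drop the finished page's children so memory stays roughly flat


if __name__ == "__main__":
    sample = (
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">'
        b"<page><title>Xapian</title><revision><text>Search engine library.</text></revision></page>"
        b"<page><title>Bzip2</title><revision><text>Block-sorting compressor.</text></revision></page>"
        b"</mediawiki>"
    )
    # A real dump would be opened with bz2.open(path_to_dump, "rb"); here an in-memory copy stands in.
    with bz2.open(io.BytesIO(bz2.compress(sample)), "rb") as f:
        for title, text in iter_pages(f):
            print(title, "->", text)
```

Reading through `bz2.open` means decompression and parsing proceed together, so the uncompressed XML never has to exist on disk; writing SQL rows (or index entries) from each yielded pair would be the second stage.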
  • Linda Mack! (Score:1, Funny)

    by Anonymous Coward on Tuesday August 14, @12:26AM (#20221531)
    I would be concerned that Slimvirgin and the other intelligence agent(s) might not be able to revert and ban the edits I would be making offline. Maybe Jimbo can give them authority to come rough me up at home and beat my lcd with a hammer.

    http://yro.slashdot.org/article.pl?sid=07/07/27/1943254 [slashdot.org]

  • Mass inserts into mysql... (Score:4, Informative)

    by Splab (574204) on Tuesday August 14, @12:32AM (#20221557)
    is very very slow when you do it on a normal installation, the reason is MySQL comes with a "be nice to people who don't know what they are doing" setup. Go into the my.cnf and find the buffer settings, crank them up and restart the server. It can really do a lot (especially if you are running InnoDB which you of course are since MyISAM isn't a proper database).
  • Xapian (Score:1)

    by paltemalte (767772) on Tuesday August 14, @12:41AM (#20221591)
    For those who didn't know, Xapian [xapian.org], the search engine he used for this, is really awesome. It's very fast, stable, actively developed and packs some pretty impressive features. It's written in C++, but has bindings for Perl, Python, PHP, Java, Tcl, C#, and Ruby. If you need an embedded search function on a site, you should check it out.

    I've used it for over 2 years on various sites and am really pleased with it.
    • Re:Xapian by mikeboone (Score:2) Tuesday August 14, @08:12AM
  • What?? (Score:5, Funny)

    by icydog (923695) on Tuesday August 14, @12:48AM (#20221617)
    (http://www.icydog.net/)
    TFA is:

    1. Not a thinly-veiled attempt to advertise a crappy product
    2. Not bashing Microsoft
    3. Not about somebody who is trolling open-source (i.e. SCO)
    4. Not about Bush taking away all our rights and ending freedom
    5. Not about voting fraud and the end of democracy/America/the world
    6. Not decrying Vista DRM and its ties to the MAFIAA
    7. Posted on Slashdot

    Furthermore, TFA is interesting and informative.

    Am I in heaven?
  • C&D Tomorrow? (Score:1)

    by fishbowl (7759) <jmcgill@@@email...arizona...edu> on Tuesday August 14, @01:37AM (#20221839)
    Can't help but assume there will be a cease and desist order in the /. headlines tomorrow.
  • The Point? (Score:2)

    by photomonkey (987563) on Tuesday August 14, @01:41AM (#20221865)

    I know that not everyone has a permanent connection to the net everywhere they go, but what is the point of storing a local copy of Wikipedia?

    The beauty of it is that it is online and always up-to-date (wrong, or less wrong).

    Trying to capture it locally seems to me to be like trying to print The Internet. By the time it's done spooling, it's out of date.

    If it's an academic project, that's really cool, but I don't see a practical point to it.

    • Re:The Point? (Score:5, Insightful)

      by Mr. Roadkill (731328) on Tuesday August 14, @02:26AM (#20222067)

      I know that not everyone has a permanent connection to the net everywhere they go, but what is the point of storing a local copy of Wikipedia?
      Ummm... I think the whole point is, as you've pointed out, that not everyone has a permanent connection to the net everywhere they go. Or maybe they don't have access to everything they'd like even if they *do* have net access everywhere, or want to pay extravagant data rates while out and about.

      Joe has all-you-can-eat broadband at home, or an understanding employer with a fat pipe, and spends two hours each day on the train. Two and a half gig per month (and let's face it, you probably don't want to update it more frequently than that) and he's got probably half his reading material sorted out.

      Wang lives in Buttfuckistan, a fictional country with totalitarian leanings with too many real-world counterparts. The Great Firewall of Buttfuckistan (i.e. squidguard, under the control of Buttfuckistan Telecom, and settings in the routers to drop non-port-80 traffic half the time) makes it impossible to reliably access Wikipedia from inside their borders, which is a great shame because the entry on Buttfuckistan is particularly unflattering. Once a month, Joe sticks a DVD with five minutes from an old re-run of Friends and an encrypted dump of Wikipedia in an airmail envelope and sends it to Wang.

      Mary is still at secondary school, and her particular school has wifi access for students who are encouraged to purchase their own laptops, but since the local pastor discovered http://en.wikipedia.org/wiki/Image:Dream_of_the_fishermans_wife_hokusai.jpg [wikipedia.org] they've been forced to add wikipedia to the school's blocklist. Which is a pity, because it's a great first-approximation source for material or research directions, but there you go. Mary can make a local copy through her home broadband connection, and can access it locally on her laptop wherever she goes - even at school, or church. Bill, Jillian and Mungo (the pastor's son) find out about this, and now all four of them take it in turns to make the copy each month, sharing the bandwidth costs. Their friends Harry and Sally, who don't have broadband but are great friends of the other four, also get copies... and there are plans to distribute the copies further, as a kind of teenage grass-roots knowledge-sharing and social-justice effort.

      Still can't see the point?
    • Re:The Point? by Riktov (Score:2) Tuesday August 14, @02:46AM
    • Re:The Point? by epine (Score:2) Wednesday August 15, @03:06PM
  • Dude, WP:1.0 [wikipedia.org] wants YOU.
  • by nullchar (446050) on Tuesday August 14, @02:21AM (#20222037)
    Wikipedia seems the best place for the author's "how to download and use offline".
  • What about moulin? (Score:2)

    by maubp (303462) on Tuesday August 14, @02:37AM (#20222115)
    How is this different to moulin which is a fully interactive, offline version of the entire Wikipedia (without pictures) on a CD-ROM:

    http://moulinwiki.org/l/en/ [moulinwiki.org]
  • by georgewilliamherbert (211790) on Tuesday August 14, @02:51AM (#20222167)
    There's a one-button (for admins) export-the-whole-wiki-as-html feature in modern MediaWiki software installs...

    But hey, two days and a few hundred lines of code is cool. You geek (verb). If we always took the easy way out we'd be using Windows and have committed suicide long ago.
  • can we get a PSP version of it? (Score:4, Interesting)

    A PSP is very portable (fits in your sweater/backpack), hackable, and has up to 8GB of storage. I have been dreaming for a year about porting Wikipedia to it. Unfortunately I'm not familiar with the kind of programming needed and I could never find the time...
  • by dannycim (442761) on Tuesday August 14, @04:25AM (#20222493)
    There's a serious problem with the article's way of treating the data that I didn't see addressed.

    The wikipedia database file is one large bzip2'ed XML file which the author splits into blocks of 900k (bzip2's natural blocking) which he then parses for the "title" and "text" XML tags.

    The problem with that approach is that some of these tags may well end up being split over block boundaries, so some articles risk being missed. EG:

    END-OF-BLOCK: blablablabla...blabla[/text][othertag][ti

    START-OF-NEXT-BLOCK: tle][sometag]blablablablabla...

    So searching for "[title]" in both blocks separately, as TFA does, will miss one article.

    (I've used square brackets instead of lessthans and greaterthans because slashdot won't let me use them.)
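One common fix for the boundary problem described above is to carry the unfinished tail of each chunk over into the next one before searching. The Python sketch below assumes the chunks arrive in order as already-decompressed text and that titles appear as plain `<title>...</title>` pairs; the function name is hypothetical and this is not the article's code.

```python
import re
from typing import Iterable, Iterator

OPEN_TAG = "<title>"
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def titles_across_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the text of every <title>...</title> pair, even if a tag straddles two chunks."""
    carry = ""  # unfinished text left over from the previous chunk
    for chunk in chunks:
        buf = carry + chunk
        last_end = 0
        for match in TITLE_RE.finditer(buf):
            yield match.group(1)
            last_end = match.end()
        tail = buf[last_end:]
        open_at = tail.rfind(OPEN_TAG)
        if open_at != -1:
            # An opening tag whose closing tag has not arrived yet: retry it with the next chunk.
            carry = tail[open_at:]
        else:
            # No pending tag, but the tail may end with the first few characters of "<title>".
            carry = tail[-(len(OPEN_TAG) - 1):]
```

The carried text never holds more than one unfinished title plus its opening tag (or, when nothing is pending, the last six characters, enough for a partial `<title`), so memory stays bounded while no article is lost at a block boundary.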
  • why are people saying this is useless? it's just what I need, and I was surprised when I went looking and couldn't find it when I first wanted it.

    I cruise around in a sailboat. my longest passage was 35 days. how I would have LOVED to have been able to read wikipedia articles on that passage, even if they were a few weeks old. What do I care if an article is a few weeks old? 35 days at sea I'd have read a paper encyclopedia if I had one, but my boat isn't big enough to carry the weight of a paper encyclopedia. It sails like shit as it is from how many books I have stuffed in my V-berth.

    and sometimes I'm on some random little island with no internet access for periods of time... hanging out with a bunch of other sailors, and of course we get into discussions that leave us wishing we could go google something.

    Even in Bora Bora, they had internet but it was $24/hour, on crappy old computers! this would have been great!

    and now! now I'm in China! They block parts of wikipedia. yeah I can set up an SSH tunnel when I happen to have internet access available, but how great it is to have a local (though somewhat outdated) copy of wikipedia, including any blocked articles!

    sounds great!
  • I didn't know about the wikipedia raw database, or I'd probably have done something like this myself, and hooked it into the UNIX "locate" db, or Spotlight, or maybe...

    $ man w locate
    GNU Locate
    From Wikipedia, the free encyclopedia
      (redirected from Locate)
    ...
    This software-related article is a stub. You can help Wikipedia by expanding it.
  • by chrb (1083577) on Tuesday August 14, @06:10AM (#20222895)
    Are there any projects putting wp on java enabled phones? It would be pretty cool to settle those arguments any time, any place.
  • I just need an offline wikipedia now.
  • lucene (Score:1)

    by wwmedia (950346) on Tuesday August 14, @08:43AM (#20223997)
    (http://www.footballfans.tv/)
    he'd be better off using Lucene for search, faster than MySQL
  • by Gen.Anti (1089529) on Tuesday August 14, @08:45AM (#20224013)
    The core idea for accessing bzip-compressed data is interesting.

    The later execution, however... Perl, Python, PHP, Xapian, Django all together as the runtime, and add to this C code for the preparation (I might be wrong about the details, just skimmed), for such a small application.

    The other poster who rejoices about how wonderfully old-skool TFA is, is obviously right. This kind of duct-tape Linux development feels sooo 90's and smells like maintenance, performance, installation and portability problems.

    Keeping it all in just one of the scripting languages would make it much more serious. (Perl, or maybe Bash for the easiest installation?)
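The core idea praised above is random access into compressed data: only the piece that holds the requested article gets decompressed. The article reaches individual bzip2 blocks inside the original file; a different, simpler technique for the same effect, usable when you write the compressed file yourself, is to compress fixed groups of articles as separate bzip2 streams and record each stream's byte offset. This is a sketch of that alternative, not the author's method; the function names, the group size and the NUL separator are assumptions.

```python
import bz2


def pack(articles: list[str], group_size: int = 2) -> tuple[bytes, list[tuple[int, int]]]:
    """Compress articles in groups, each group as its own bzip2 stream.

    Returns the concatenated compressed bytes plus an (offset, length) entry per group.
    """
    blob = bytearray()
    index: list[tuple[int, int]] = []
    for start in range(0, len(articles), group_size):
        # Articles inside a group are joined with NUL, assumed never to occur in article text.
        group = "\x00".join(articles[start:start + group_size]).encode("utf-8")
        stream = bz2.compress(group)
        index.append((len(blob), len(stream)))
        blob += stream
    return bytes(blob), index


def fetch(blob: bytes, index: list[tuple[int, int]], article_no: int, group_size: int = 2) -> str:
    """Decompress only the stream that holds article_no and return that article."""
    offset, length = index[article_no // group_size]
    group = bz2.decompress(blob[offset:offset + length]).decode("utf-8")
    return group.split("\x00")[article_no % group_size]
```

In an actual reader the compressed file would stay on disk and `fetch` would `seek()` to the stored offset and `read()` that many bytes instead of slicing an in-memory object; `group_size` must match between the two calls. Because the result is just concatenated bzip2 streams, Python's `bz2.decompress` can still unpack the whole file in one go.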

  • sdict? (Score:2)

    by jimmyfergus (726978) on Tuesday August 14, @09:01AM (#20224203)

    Anyone with experience of sdict [sdict.com]?

    They offer a dictionary reader for various systems, including portable devices, and dictionaries including Wikipedia.

    Unfortunately their Wikipedia dict is an old one (January), but it seems like a good approach for laptops or other small devices. When I get an 8GB SDHC card I'm going to try it on my Nokia N800.

  • Not New (Score:2)

    by Baavgai (598847) on Tuesday August 14, @09:44AM (#20224749)
    (http://www.chaingang.org/code/)
    While interesting, it's certainly reinventing the wheel. There are lots of methods for doing this found on the site itself ( http://en.wikipedia.org/wiki/Wikipedia:Database_download [wikipedia.org] ) including static content already marked up.

    Also, as others have noted, his chopping the file into chunks means any article whose tags straddle a chunk boundary can be lost.

    I'd have implemented this with a compressed file system and maybe some symlinks. Happily, the static content is already there for the taking. Some find and grep on a file system should be enough to do a title search with little overhead. The web server (for searches from a browser) need be little more than a daemon; in Perl, Python, etc., you could do it in less than fifty lines.
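A rough Python sketch of that find-and-grep approach, assuming a hypothetical directory `wikipedia-static` of pre-rendered article files named after their titles (for example `Prime_number.html`): `title_search` walks the tree with `os.walk` in place of `find` and keeps file names containing every query word, and `SearchHandler` exposes it over HTTP with the standard library's `http.server`. The directory layout, names and plain-text response format are assumptions for illustration.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

ROOT = "wikipedia-static"  # hypothetical directory of static article files


def title_search(root: str, words: list[str], limit: int = 50) -> list[str]:
    """Return relative paths whose file name contains every word (case-insensitive)."""
    wanted = [w.lower() for w in words]
    hits: list[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if all(w in name.lower() for w in wanted):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
                if len(hits) >= limit:
                    return sorted(hits)
    return sorted(hits)


class SearchHandler(BaseHTTPRequestHandler):
    """Answer GET /?q=some+words with matching file paths, one per line."""

    def do_GET(self) -> None:
        query = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        body = "\n".join(title_search(ROOT, query.split())).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the search daemon on localhost until interrupted (blocks)."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```

After calling `serve()`, a browser pointed at `http://127.0.0.1:8000/?q=prime+number` would get the matching file paths; the whole thing stays well under the fifty lines the comment estimates, at the cost of a full directory walk per query.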
  • Gears? (Score:2)

    by pragma_x (644215) on Tuesday August 14, @09:53AM (#20224869)
    (Last Journal: Wednesday December 08 2004, @01:13PM)
    I didn't see any posts on this so I thought I'd bring it up. I think the author took the long way around.

    The author did some nifty hacking that resulted in the following stack of dependencies:

            * Perl 5.8.5
            * Python 2.5
            * PHP 5.2.1
            * Xapian 1.0.2
            * Django 0.9.6

    He cited not wanting to use an RDBMS since he's not writing to the database, just reading. I can give him that, but it seems like it caused more trouble than it's worth.

    This leaves me wondering: why not just use Google Gears and be done with it. Sure, the hacking part would shift largely to the javascript side of things (would be mostly wiki conversion), but you'd have the other bits (web server, and storage) already worked out. All you'd have to do is slap together some little app to insert the XML data into the database.
    • Re:Gears? by Gen.Anti (Score:1) Tuesday August 14, @10:01AM
      • Re:Gears? by pragma_x (Score:2) Tuesday August 14, @02:12PM
  • by Cato (8296) on Tuesday August 14, @10:09AM (#20225063)
    I live in an area with fairly bad mobile signal - I'm always trying to look things up on Wikipedia but finding I can't. Fortunately my Treo 680 smartphone can take 8GB SDHC cards (http://en.wikipedia.org/wiki/Treo_680 [wikipedia.org]), so I could fit this on with room to spare for MP3s and photos, and future growth of Wikipedia. Very tempting, though I'd need to port it to something like Lua and GCC - obviously the porting would be fairly trivial by the time Palm releases its Linux-based Treos in early 2008...
  • OLPC (Score:1)

    by thekaran (588921) on Tuesday August 14, @12:16PM (#20226763)
    (http://www.thekaran.com/ | Last Journal: Tuesday June 14 2005, @11:06AM)
    Preload this in OLPC
  • Yeah, TomeRaider costs money, which already marks it as a loss against the "thrifty" slashdot crowd ;) , but I think it's worth it.

    http://www.tomeraider.com/ [tomeraider.com]

    They provide Wikipedia versions that you can use with their e-Reader. I bought one because I can use it on my Pocket PC, and it's just awesome having the Wikipedia available instantly, anytime. It's the fucking Hitchhiker's Guide version 0.1, gyat damn. =)
  • by martinicus (228041) on Wednesday August 15, @11:10AM (#20237779)
    From TFA:

    "The result of the import process was also not exactly what I wanted: I could search for an article, if I knew its exact name; but I couldn't use parts of the name to search; it was all or nothing. To allow these "free-style" searches to work, one must create the search index - which I'm told, takes days to build. DAYS!"

    This seems kind of stupid to me... could he not have done an SQL query using the moral equivalent of SELECT articletext WHERE title LIKE '%thingiaminterestedin%', meaning you don't need to know the exact title?
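The suggested query does work without an exact title. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, with a hypothetical `articles(title, body)` table. The trade-off, and plausibly why it does not replace a real search index: a pattern that starts with `%` cannot use an ordinary B-tree index on `title` to jump to matching rows, so every query scans all titles, and it matches one contiguous substring rather than separate words in any order.

```python
import sqlite3


def find_articles(conn: sqlite3.Connection, fragment: str) -> list[tuple[str, str]]:
    """Return (title, body) rows whose title contains fragment anywhere.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    # Escape LIKE wildcards so '%' and '_' typed by the user are matched literally.
    escaped = fragment.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # The leading '%' lets any part of the title match, and is also what keeps
    # an ordinary index on title from narrowing the scan.
    return conn.execute(
        "SELECT title, body FROM articles WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (f"%{escaped}%",),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [("Prime number", "..."), ("Complex number", "..."), ("Bertrand paradox", "...")],
    )
    for title, _body in find_articles(conn, "number"):
        print(title)
```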
  • by thc4k (951561) on Thursday August 16, @07:20AM (#20247573)
    (http://ls-themes.org/)
    Uhm, so he used an offline copy of Wikipedia with a search engine to search through it? Wow, that's like... the very reason Xapian exists. All the highlights except LaTeX come from Xapian. This is like 1. Download Wikipedia 2. Install Xapian 3. Make Xapian index Wikipedia 4. ???? 5. Slashdot!
  • Re:2X (Score:1)

    by computerman413 (1122419) on Monday August 13, @10:49PM (#20220909)
    Wikipedia is a little more than a bunch of HTML pages. From what little I understand, there's a massive database which composes Wikipedia, and there's a script which puts everything together into the Wikipedia we know and love.
    • Re:2X by deftcoder (Score:1) Monday August 13, @10:55PM
    • Re:2X by Gracenotes (Score:1) Tuesday August 14, @10:08AM
  • by Anonymous Coward on Monday August 13, @11:06PM (#20221035)
    And that doesn't happen offline? Only naive people like you need to be worried about reading Wikipedia.

    There are bastards of every academic, social, and financial background.
  • by Tacvek (948259) on Monday August 13, @11:14PM (#20221083)
    (Last Journal: Friday June 23 2006, @01:26PM)
    My very serious question to you is: how much better do you think things are at a "real" encyclopedia? They have many of the same problems, but they are just not public. "Real" encyclopedias can be just as inaccurate as Wikipedia on many articles. For a quick first reference, Wikipedia is an ideal tool. Just be sure to take things with a grain of salt if you are not checking the sources for further information. Guess what though, the same applies to "real" encyclopedias too. One difference is that with "real" encyclopedias, you always lack revision information, and you often lack information about the sources used by the editors. (Some encyclopedias are better than others in that respect.)
  • Re:2X (Score:5, Informative)

    by Brian Gordon (987471) on Monday August 13, @11:24PM (#20221161)
    Ahaha, 2.9GB? That's the text alone. Images will net you more than 200GB [wikimedia.org] more. And yes, you do need a LAMP/WAMP and working mediawiki, but it wouldn't take 'days' it would take a few hours max. Also is this guy aware that wikipedia is available on DVD [wikipedia.org] already?
  • by Bombula (670389) on Monday August 13, @11:35PM (#20221241)
    It might depend on the topic/field in question. The articles you reference seem to be focused on tech stuff. I use Wikipedia primarily for socioeconomic reference material, and find it in general to be pretty solid. There are places where the depth is limited, but it's definitely my first-reach resource as long as I have an internet connection - mainly because many of the specific things I'm after might not be in a general encyclopedia like Britannica - intertemporal equilibrium, hedonic regression, Edgeworth's limit theorem, the Bertrand paradox etc, etc.
  • And there's more, but you get the idea. Collusion to ruin people's lives when they run afoul of admins, corrupt editors doing and getting favors from the head honcho himself, pet pages that end up with incorrect information, speculation, or specious reasoning, and a general air of arrogance and groupthink reinforcing an internal idea that they can do no wrong.
    You missed a few, such as product placement pages and ancient "This page doesn't conform to {{?}} standards" tags. That, and obscure fields get limited attention.

    Why bother, seriously?
    Because the breadth of material covered in Wikipedia is unparalleled, as is the timeliness of information in many fields of interest. And it's a hell of a lot more compact than a 100 lb encyclopedia set, and cheaper to boot.
  • Blatantly stolen from David Morgan-Mar [livejournal.com].

    In many of the more relaxed corners of the Outer Eastern Rim of the Internet, Wikipedia has already supplanted the great Encyclopaedia Britannica as the standard repository of all knowledge and wisdom, for though it has many omissions and contains much that is apocryphal, or at least wildly inaccurate, it scores over the older, more pedestrian work in two important respects.

    First, it is slightly cheaper; and secondly it has the words "anyone can edit" inscribed in large friendly letters on its cover.