Algorithm Rates Trustworthiness of Wikipedia Pages 175

paleshadows writes "Researchers at UCSC have developed a tool that measures the trustworthiness of each Wikipedia page. Roughly speaking, the algorithm analyzes the entire seven-year user-editing history and uses the longevity of content to learn which contributors are the most reliable: if your contribution lasts, you gain 'reputation,' whereas if it's edited out, your reputation falls. The trustworthiness of newly inserted text is a function of the reputation of all its authors, a heuristic that turned out to be successful in identifying poor content. The interested reader can take a look at this demonstration (random page with white/orange background marking trusted/untrusted text, respectively; note the "random page" link at the left for more demo pages), this presentation (pdf), and this paper (pdf)."
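As a rough illustration of the mechanism the summary describes, here is a minimal sketch of a longevity-based reputation model. It is not the UCSC code; the class, update rules, and constants are illustrative assumptions only.

```python
# Toy sketch of a longevity-based reputation/trust model (NOT the UCSC
# implementation; names, update rules, and constants are assumptions).
from collections import defaultdict

class ReputationModel:
    def __init__(self):
        # Every author starts with a small baseline reputation.
        self.reputation = defaultdict(lambda: 1.0)

    def text_survived(self, author, revisions_survived):
        # Contributions that persist across later revisions raise reputation.
        self.reputation[author] += 0.1 * revisions_survived

    def text_removed(self, author):
        # Contributions that get edited out lower reputation (with a floor).
        self.reputation[author] = max(0.1, self.reputation[author] - 0.5)

    def trust_of_text(self, authors):
        # Trust of a passage as a function of the reputations of everyone
        # who wrote or preserved it; here simply a capped, scaled average.
        avg = sum(self.reputation[a] for a in authors) / len(authors)
        return min(1.0, avg / 10.0)

# Example: one author's text keeps surviving, another's keeps being removed.
model = ReputationModel()
model.text_survived("alice", revisions_survived=20)
model.text_removed("bob")
print(model.trust_of_text(["alice"]))         # higher trust
print(model.trust_of_text(["bob", "alice"]))  # dragged down by the low-reputation author
```

In this toy model, text that survives many later revisions raises its author's reputation, text that gets edited out lowers it, and the trust assigned to a passage is a simple function of its authors' reputations, mirroring the heuristic described above.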
  • by fymidos ( 512362 ) on Friday August 31, 2007 @07:45AM (#20422871) Journal
    >If your contribution lasts, you gain 'reputation,' whereas if it's edited out, your reputation falls

    And the editor wars start ...
  • by Anonymous Coward on Friday August 31, 2007 @07:47AM (#20422887)
    Every paper touting automatic adjustments for gaming the system becomes obsolete the moment it is published.

    (Godwin didn't publish this, but I might get around to editing his Wikipedia entry to say that he did).
  • by N!k0N ( 883435 ) <dan&djph,net> on Friday August 31, 2007 @07:52AM (#20422921)
    Yeah, that is a bit of a "dangerous" way to go about rating the content; however, I think it could be a step in the right direction. If this can be improved, perhaps the site will gain a better reputation in the eyes of professors. Now, I don't doubt that there is a lot of misinformation on the site (intentional or otherwise); however, a good deal of the information I have used for research papers or to quickly check something seems to be confirmed elsewhere (texts, journals, etc.).
  • #REDIRECT (Score:5, Insightful)

    by Chris Pimlott ( 16212 ) on Friday August 31, 2007 @07:56AM (#20422943)
    It appears they include #REDIRECT pages; the very first page the random link took me to was Cheliceriformes [ucsc.edu], with the #REDIRECT line in orange. Seems an easy way to gain trust; once a redirect is created, it is hardly ever changed.
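    If redirect stubs really do accumulate trust simply by never being touched, one obvious mitigation is to skip them when crediting longevity. A hypothetical filter; the page structure and helper names are assumptions, not part of the UCSC tool:

    ```python
    def is_redirect(wikitext: str) -> bool:
        # MediaWiki redirect pages start with "#REDIRECT" (case-insensitive),
        # e.g. "#REDIRECT [[Chelicerata]]".
        return wikitext.lstrip().upper().startswith("#REDIRECT")

    def pages_eligible_for_reputation(pages):
        # Hypothetical helper: only credit longevity on pages that carry real
        # prose, not redirects that nobody ever has a reason to edit.
        return [p for p in pages if not is_redirect(p["wikitext"])]
    ```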
  • by Wilson_6500 ( 896824 ) on Friday August 31, 2007 @07:57AM (#20422949)
    Does it take into account magnitude of error corrections? If major portions of someone's articles are being rewritten, that's a good reason to de-rep them. If someone makes a bunch of minor spelling or trivial errors, then that's not necessarily a reason to do so.

    And, of course, there is the potential for abuse. If the software could intelligently track reversions and somehow ascribe to those events a neutral sort of rep, that would probably help the system out (a rough sketch of both ideas follows this comment).

    As it stands, they're essentially trying to objectively judge "correctness" of facts without knowing the actual facts to check. That's somewhat like polling a college class for answers and assigning grades based on how many other people DON'T say that they disagree with a certain person in any way.
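    One way to act on both suggestions, scaling the penalty by how much text was actually rewritten and treating clean reverts as reputation-neutral, might look like the following sketch. The difflib similarity ratio and the revert check are assumptions for illustration, not the paper's method.

    ```python
    import difflib

    def edit_penalty(old_text: str, new_text: str, known_revisions: set) -> float:
        # Reputation penalty for the author of old_text, scaled by how much
        # of it was actually rewritten; reverts cost nothing.
        if new_text in known_revisions:
            # The page was restored to an earlier revision: treat it as a
            # revert and stay reputation-neutral, as suggested above.
            return 0.0
        similarity = difflib.SequenceMatcher(None, old_text, new_text).ratio()
        # A few fixed typos leave similarity near 1.0 (tiny penalty);
        # a wholesale rewrite drives it toward 0.0 (large penalty).
        return 1.0 - similarity

    # Example: fixing one typo vs. replacing the sentence entirely.
    prev = "The battle occured in 1805 near the village."
    minor = "The battle occurred in 1805 near the village."
    major = "Completely different claim about a different battle."
    print(edit_penalty(prev, minor, known_revisions=set()))  # small
    print(edit_penalty(prev, major, known_revisions=set()))  # large
    ```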
  • by Anonymous Coward on Friday August 31, 2007 @07:57AM (#20422957)
    Another factor is the relative controversy of the item being edited.

    If I edit a history page of a small rural village near where I live, I can guarantee that it will remain unaltered. None of the five people who have any knowledge or interest in this subject have a computer.

    If I edit an item on Microsoft's attitude to standards, or the US occupation of Iraq, I'm going to be flamed the minute the page is saved, unless I say something so banal that no one can find anything interesting in it.

    But my Microsoft page might be accurate, and my village history a tissue of lies....
  • by erroneous ( 158367 ) on Friday August 31, 2007 @07:58AM (#20422961) Homepage
    Sounds like a worthy start to the process of introducing more trustworthiness into Wikipedia entries, but this may need tuning for content type too.

    After all, just because someone is a reliable expert at editing the Wikipedia entries on Professional Wrestling [wikipedia.org] or Superheroes [wikipedia.org] doesn't necessarily mean we should trust their edits on, for instance, the sensitive issue of Tibetan sovereignty [wikipedia.org].
  • by G4from128k ( 686170 ) on Friday August 31, 2007 @08:06AM (#20423015)
    Although this method will certainly help filter pranks and cranks, it won't help if the "consensus" among Wikipedia authors is wrong. If a true expert edits a page but the masses don't agree with the edit, they will undo the expert's addition and give the expert a low reputation. Thus, the trust rating becomes a tool for maintaining erroneous but popular ideas.

    That said, I can't help but believe that this tool is a net positive because it makes points of debate more visible. One could even argue that it literally highlights the frontiers of human knowledge. That is, high-trust (white) text is well known material and highlighted (orange) text represents contentious or uncertain conclusions.
  • Don't Care. (Score:2, Insightful)

    by pdusen ( 1146399 ) on Friday August 31, 2007 @08:12AM (#20423055) Journal
    I might give a damn if Wikipedia editors had any actual interest in keeping articles truthful.
  • by KingSkippus ( 799657 ) * on Friday August 31, 2007 @08:14AM (#20423069) Homepage Journal

    No algorithm, except maybe personally checking every single article yourself, will ever be perfect. I suspect that the stuff you talk about will be very rare exceptions, not the rule. In fact, one of the reasons that it is so rare is because people who know what the actual truth of a matter is can post it, cite it, and show it for all to see that some common misconception is, in fact, a misconception. This is much better than, say, a dead tree encyclopedia where, if something incorrect gets printed, it will likely stay that way forever in almost every copy that's out there. (And, incidentally, no such algorithm can exist, since dead tree encyclopedias generally don't include citations and/or articles' editing histories.)

    The goal wasn't to create a 100% perfect algorithm, it was to create an algorithm that provides a relatively accurate model and that works in the vast majority of cases. I don't see any reason this shouldn't fit the bill just fine.

  • by duggi ( 1114563 ) <prathyusha_malyala.yahoo@com> on Friday August 31, 2007 @08:32AM (#20423187)
    Why bother with an algorithm in the first place? Wikipedia is good for learning facts. If someone wants to know what the Mary's Room experiment was, they can find it. But if they want to know who did it and what kind of a person he is, should they not be referring to two or more sources? I guess the problem with credibility arises only when there is an opinion involved. It might work, sure, but when you come to know that the article is one big lie, would you not do some more research to find out what is right? And if you see the page is clean, would you stop at that point?
  • by Daimanta ( 1140543 ) on Friday August 31, 2007 @08:35AM (#20423217) Journal
    Groupthink.
  • by Anonymous Brave Guy ( 457657 ) on Friday August 31, 2007 @08:35AM (#20423221)

    Yes, this system demonstrates the correlation between the content and the majority opinion, not between the content and the correct information (assuming such objectively exists).

    Of course, if you take as an axiom that the majority opinion will, in general, be more reliable than the latest random change by a serial mis-editor, then the correlation with majority opinion is a useful guideline.

    Something that might be rather more effective, though perhaps less practical, is for Wikipedia to bootstrap the process much as Slashdot once did: start with a small number of designated "experts", hand-picked, and give them disproportionate reputation. Then consider secondary effects when adjusting reputation: not just whether something was later edited, but the reputation of the editor, and the size of the edit (roughly as sketched after this comment).

    This doesn't avoid the underlying theoretical flaw of the whole idea, though, which is simply that in a community-written site like a wiki, edits are not necessarily bad things. Someone might simply be replacing the phrase "(an example would be useful here)" with a suitable example. This would be supporting content that was already worthwhile and correct, not indicating that the previous version was "untrustworthy".
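    A sketch of that kind of weighted update, seeded with a hand-picked set of trusted contributors; every name, weight, and constant here is an illustrative assumption rather than anything Wikipedia or the UCSC tool actually uses:

    ```python
    # Hypothetical reputation update weighted by the editor's own reputation
    # and by the size of the edit, bootstrapped from hand-picked "experts".
    SEED_EXPERTS = {"expert_a": 50.0, "expert_b": 50.0}

    reputation = dict(SEED_EXPERTS)  # everyone else defaults to 1.0 below

    def rep(user: str) -> float:
        return reputation.get(user, 1.0)

    def apply_edit(author: str, editor: str, fraction_changed: float) -> None:
        # fraction_changed: 0.0 (nothing touched) .. 1.0 (fully rewritten).
        # A high-reputation editor rewriting most of the text hurts the author
        # a lot; a low-reputation editor making a tiny tweak barely registers.
        penalty = fraction_changed * rep(editor) * 0.1
        reward = (1.0 - fraction_changed) * 0.05
        reputation[author] = max(0.1, rep(author) - penalty + reward)

    # Example: a seeded expert rewrites most of a newcomer's paragraph.
    apply_edit(author="newcomer", editor="expert_a", fraction_changed=0.8)
    print(reputation["newcomer"])
    ```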

  • by fymidos ( 512362 ) on Friday August 31, 2007 @08:47AM (#20423321) Journal
    > Editor wars are an old thing

    but they get a whole new meaning when it makes sense to find all edits by an editor, delete them, and then rewrite them as your own...
  • AfD: nn (Score:3, Insightful)

    by tepples ( 727027 ) <tepples.gmail@com> on Friday August 31, 2007 @09:07AM (#20423457) Homepage Journal

    If I edit a history page of a small rural village near where I live, I can guarantee that it will remain unaltered. None of the five people who have any knowledge or interest in this subject have a computer.
    If nobody else who has a computer cares, then it's less likely that your edits can be backed up with reliable sources [wikipedia.org]. In fact, people might be justified in nominating the article for deletion on grounds of lack of potential sources [wikipedia.org].
  • by Random832 ( 694525 ) on Friday August 31, 2007 @09:16AM (#20423551)
    "trustworthiness" doesn't enter into whether something gets edited out, for precisely the same reason a need for this is perceived at all: it can be edited by anyone!
  • by xappax ( 876447 ) on Friday August 31, 2007 @10:01AM (#20423997)
    If this can be improved, perhaps the site will gain a better reputation in the eyes of professors.

    No, it won't gain a better reputation in the eyes of professors (at least decent professors) for two reasons:

    1) It's an inherently flawed algorithm and easily gameable. It's useful as a very vague unreliable data-point, and not much else.

    2) Wikipedia is not a source for academic research, and never will be. If it's anything to academics, it's a place to go to get some clues on how to proceed with their real research - for example finding links to reliable sources, or related terms and concepts. It's like Google: a great tool for research, not a source.

    Wikipedia is not and has never claimed to be an authoritative source on anything, and until people stop referring to it as though it is (or could be, or claims to be), we'll never get over this wanking about "Don't trust Wikipedia, it's not reliable - anyone can change it, omg!"
  • by MrNaz ( 730548 ) on Friday August 31, 2007 @10:17AM (#20424207) Homepage
    "We need a good search engine on top of a tor network, and bandwidth to make it run smooth. Not many other way to achieve real net freedom."
    Can you explain yourself a little more? I don't see how Tor would improve the quality of information being searched for. (Not arguing, just interested in your ideas)
  • by presidenteloco ( 659168 ) on Friday August 31, 2007 @11:55AM (#20425617)
    They might be somewhat correlated, on a statistical basis, over many cases, but there are many individual cases and times when the currently popular view is wrong and the lone wolf opinions are later proven to have been correct.

    This algorithm would seem to be more of a popularity contest than a truth finder. I think we have to be very wary of the truth-by-mass-agreement theory.

    Hint: Remember the "weapons of mass delusion"? I bet someone commenting that the US government is lying through their teeth about it would have been re-edited pretty quick.
  • Re:Hmmmmmmm (Score:2, Insightful)

    by skoaldipper ( 752281 ) on Friday August 31, 2007 @02:04PM (#20427211)

    What prevents Wikipedia from setting up a foo area moderated by a panel of foo experts known to the foo community?
    Define experts.

    Wikipedia does an extraordinary job drawing on a wide variety of peer resources, professional and layman alike. So-called "experts" in academia are just as political in their research and analysis - specifically, in the social sciences. Peer review never really amounts to much more than a consensus, but not necessarily an accurate one. Objectivity is the holy grail which I don't think will ever be achieved, whether in an encyclopedia, Wikipedia, or a newspaper for that matter. The objectivity is best left to the reader, as well as the research, imho.

    What you're asking for is really nothing more than some sort of certification, which most use as nothing more than back-patting for their particular opinion. I say, take an encyclopedia or Wikipedia for what it is, and just move on to the next.
  • Spelling Mistakes? (Score:3, Insightful)

    by logicnazi ( 169418 ) <gerdesNO@SPAMinvariant.org> on Friday August 31, 2007 @03:26PM (#20427967) Homepage
    What I want to know is whether it is smart enough to distinguish edits that correct spelling and grammar mistakes from those that change content.

    In particular, I'm worried that the system will undervalue the contributions of people whose edits are frequently cleaned up by others, even if their content is left substantively unchanged.
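    A crude way to tell a copyedit from a content change is to compare the two revisions after normalising away case, punctuation, and whitespace; if only a small share of words differ, the substance arguably survived and the original author should not lose reputation. A sketch under those assumptions (the threshold and helpers are made up for illustration):

    ```python
    import re

    def normalize(text: str) -> str:
        # Strip punctuation, collapse whitespace, lowercase.
        return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text)).strip().lower()

    def looks_like_copyedit(before: str, after: str) -> bool:
        # Heuristic: same number of words and only a small fraction of them
        # changed (e.g. spelling fixes) => treat it as a copyedit.
        a, b = normalize(before).split(), normalize(after).split()
        if len(a) != len(b):
            return False  # words added or removed: likely a content change
        changed = sum(1 for x, y in zip(a, b) if x != y)
        return changed / max(len(a), 1) <= 0.3  # under ~30% of words touched

    print(looks_like_copyedit(
        "The batle of Trafalgar ocurred in 1805.",
        "The battle of Trafalgar occurred in 1805."))  # True: spelling only
    print(looks_like_copyedit(
        "The battle of Trafalgar occurred in 1805.",
        "Napoleon won the battle of Trafalgar in 1805."))  # False: content changed
    ```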
  • by PingPongBoy ( 303994 ) on Friday August 31, 2007 @04:01PM (#20428229)
    Sounds crappy. Let's say you expose some important misdeed. You're likely to be edited out by an army of paid staff who keeps an eye on the 'net

    Nope. If you post one misdeed and it gets edited out, such is life, but it shouldn't affect your credibility that much, because everyone gets edited out a few times in the long run.

    However, if you edit hundreds or thousands of different articles and people leave you alone, o great guru, you're good.

    Wikipedia's ultimate strength depends on the community's desire for good information, readiness to stomp on crap, and will to contribute. Conversely, Wikipedia would decay if people didn't give a rat's ass about Wikipedia and let it go to ruin like an unweeded garden. This mechanism of quality control needs to be applied down the hierarchy of categories, subcategories, and articles. It's understandable that certain areas will have more pristine content overall while other areas will be populated with childish and wanton ideas. Thus, a contributor evaluation program can be tested.

"And remember: Evil will always prevail, because Good is dumb." -- Spaceballs

Working...