Carnegie Mellon CAPTCHA Digitization Project Now Underway

Posted by Zonk on Tuesday October 02, 2007 @08:44AM from the way-more-fun-than-the-usual-kind dept.

tomandlu writes "The BBC is reporting that Carnegie Mellon University has found a novel use for CAPTCHAs: deciphering old texts. We've discussed this project before, but that was before it got off the ground. Users entering text act as a sort of distributed computing project. Basically, the CAPTCHA is made up of two words, one of which is known to Carnegie Mellon and one of which isn't. If the user correctly deciphers the known word, then their reading of the unknown word is assumed to be correct. Well, almost. Two different users must give the same answer to the same unknown CAPTCHA before it is taken off the list. 'Using the reCAPTCHA system, von Ahn's team is digitizing documents and manuscripts as fast as the Internet Archive can supply them, and the good news for book lovers (and bad news for spammers) is that the supply of reCAPTCHAs is not likely to dry up any time soon.'"
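For anyone who wants the scheme in the summary spelled out, here is a rough Python sketch. It is not reCAPTCHA's actual code: the `WordPool` class, its method names, the user and word ids, and the case-insensitive comparison are all assumptions made for illustration. Only the two rules come from the summary: the known word gates the answer, and two different users must agree on the unknown word.

```python
from collections import defaultdict

REQUIRED_AGREEMENT = 2  # from the summary: two different users must agree


class WordPool:
    """Hypothetical bookkeeping for unknown words (illustrative only)."""

    def __init__(self):
        # unknown word id -> answer -> set of user ids who gave that answer
        self.votes = defaultdict(lambda: defaultdict(set))
        # unknown word id -> accepted transcription
        self.resolved = {}

    def submit(self, user_id, known_answer, known_guess, unknown_id, unknown_guess):
        """Return True if the user passes the CAPTCHA."""
        # The known word is the actual test; a miss discards the whole answer.
        if known_guess.strip().lower() != known_answer.strip().lower():
            return False
        if unknown_id not in self.resolved:
            guess = unknown_guess.strip().lower()
            voters = self.votes[unknown_id][guess]
            voters.add(user_id)  # a set, so one user cannot vote twice
            if len(voters) >= REQUIRED_AGREEMENT:
                # Enough distinct users agree: take the word off the list.
                self.resolved[unknown_id] = guess
        return True
```

With this sketch, a word stays pending after one user's answer (or the same user answering twice) and is resolved only when a second, different user submits the same reading.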

Re:I want to participate... (Score:5, Informative)

by EvilGrin666 ( 457869 ) writes: on Tuesday October 02, 2007 @08:54AM (#20821689) Homepage

Here's the website: http://recaptcha.net/

Give it a go! (Score:2, Informative)

by cookieinc ( 975574 ) writes: on Tuesday October 02, 2007 @09:11AM (#20821807)

You can try it out at the top of this page [recaptcha.net].

"Turing" test (Score:3, Informative)

by DrLex ( 811382 ) writes: on Tuesday October 02, 2007 @09:49AM (#20822271) Homepage

Well, this finally makes CAPTCHAs somewhat useful. I won't try to sugar-coat it: I personally hate CAPTCHAs. On some types (especially the ones from Digg), I fail about 50% of them, and that gets quite annoying after a while, especially when your answer is rejected even though you have no doubt about what you read in the image.
I believe CAPTCHAs are the wrong solution to the wrong problem. It's a bit of an exaggeration to call them a "Turing test", because I'm quite sure that OCR systems will be built in the near future that are better than humans at reading CAPTCHAs. A simple text-based question that requires actual intelligence is a much better Turing test, and also a much smaller nuisance for people with impaired vision. Of course, writing a foolproof system that can produce a nearly infinite number of such questions is a challenging problem in itself.

Re:I'm not so sure this is a good idea. (Score:5, Informative)

by Falkkin ( 97268 ) writes: on Tuesday October 02, 2007 @10:07AM (#20822517) Homepage

"And that's not even counting malice where people deliberately put wrong words in."

We're already getting several million legitimate solutions a day. The chance that a few malicious people would happen to get the same CAPTCHA is relatively small. Also, for many of our words, the OCR's answer happens to be correct -- it just doesn't have high confidence in the word. If a single person agrees with the OCR in this case, we can mark the word as "read" with no further human confirmation. For this reason, many of the words will only ever be shown to a single human.
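A rough sketch of that decision rule, for the curious. This is not the project's code: the function name `resolve_word`, its parameters, and the lowercase normalization are assumptions for illustration, and the fallback of two matching human answers is taken from the story summary rather than from the comment above.

```python
from collections import Counter


def resolve_word(ocr_guess, human_answers, required_agreement=2):
    """Return the accepted transcription, or None if the word is still open.

    Hypothetical helper: ocr_guess is the OCR's low-confidence reading
    (or None), human_answers is a list of readings typed by users.
    """
    normalized = [a.strip().lower() for a in human_answers]
    ocr = ocr_guess.strip().lower() if ocr_guess else None

    # One human agreeing with the OCR is enough to mark the word as read.
    if ocr and ocr in normalized:
        return ocr

    # Otherwise wait until enough humans agree with each other.
    counts = Counter(normalized)
    if counts:
        answer, n = counts.most_common(1)[0]
        if n >= required_agreement:
            return answer
    return None
```

Under these assumptions, `resolve_word("upon", ["upon"])` settles after a single human answer, while `resolve_word("upom", ["upon"])` stays open until a second person also types "upon".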

Re:`CowboyNeal' answer to all CAPTCHAs (Score:5, Informative)

by Falkkin ( 97268 ) writes: on Tuesday October 02, 2007 @10:21AM (#20822725) Homepage

Sorry, but we've already thought of this attack :)

We can compute the daily frequency of each human-provided solution and automatically flag anything that suddenly jumps in popularity. It's especially suspicious if these answers always disagree with the OCR's guess (often the OCR happens to be right, but just doesn't have high confidence).
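To make the idea concrete, here is a minimal sketch of that kind of check. It is a guess at the general shape, not the real system: the function `flag_suspicious`, the `min_count` and `spike_ratio` thresholds, and the input layout are all invented for illustration.

```python
from collections import Counter


def flag_suspicious(todays_submissions, baseline_daily_counts,
                    min_count=50, spike_ratio=10.0):
    """Flag answers whose daily count jumps far above their usual level.

    todays_submissions: iterable of (human_answer, ocr_guess) pairs.
    baseline_daily_counts: answer -> typical submissions per day.
    Returns {answer: fraction of submissions disagreeing with the OCR}.
    All names and thresholds here are assumptions, not the real system's.
    """
    counts = Counter()
    disagreements = Counter()
    for answer, ocr_guess in todays_submissions:
        key = answer.strip().lower()
        counts[key] += 1
        if key != ocr_guess.strip().lower():
            disagreements[key] += 1

    flagged = {}
    for answer, n in counts.items():
        # Treat never-seen answers as having a baseline of one per day.
        usual = max(baseline_daily_counts.get(answer, 0.0), 1.0)
        if n >= min_count and n / usual >= spike_ratio:
            flagged[answer] = disagreements[answer] / n
    return flagged
```

A flagged answer with a disagreement fraction near 1.0 is the especially suspicious case described above: it is suddenly popular and never matches the OCR's guess.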

Re:I'm not so sure this is a good idea. (Score:4, Informative)

by Falkkin ( 97268 ) writes: on Tuesday October 02, 2007 @10:38AM (#20822965) Homepage

You said "people" putting in wrong words (à la the suggestion someone made below about "everyone fill in CowboyNeal!"), which is quite different from automated attacks. For automated attacks, we have numerous scripts that notice various forms of anomalous behavior from any given IP. We manually review these to make sure the answers are reasonable. We are also working with CERT, who have a large database of botnetted machines, to detect attacks. I'm not going to give complete details of everything we check, but rest assured that we are very active in preventing attacks -- our goal is to be the best CAPTCHA in the world, and we take security threats very seriously.

In terms of the digital output, we spot-check some of the transcribed pages every day. These spot-checks will also turn up any anomalous solutions, with high probability.

JS is almost unavoidable for logins now. (Score:3, Informative)

by Kadin2048 ( 468275 ) * writes: <slashdot.kadin@NOsPAM.xoxy.net> on Tuesday October 02, 2007 @10:57AM (#20823271) Homepage Journal

Unfortunately I think most CAPTCHAs use JS; it's been a while since I've been to a site that didn't make me turn it on to get through login/registration. I have no idea why this is, since people have been doing login pages since before JS was around or popular, but now it seems like the way every idiot is doing it.
