ReCAPTCHA.net Now Vulnerable to Algorithmic Attack
n3ond4x writes "reCAPTCHA.net algorithms have been developed to solve the current CAPTCHA at an efficacy of 30%. The algorithms were disclosed at DEFCON 18 over the weekend and have since been made available online. Also available is a video demonstration of random reCAPTCHA.net CAPTCHAs being subjected to the algorithms." There's probably an excellent Firefox plugin to render this page's color scheme more bearable. Note: the PowerPoint presentation linked opens fine in OpenOffice, and the video speaks for itself.
Human Success? (Score:5, Insightful)
So what is the average human success rate? I think mine is only about 50%
Bad Hacking (Score:5, Insightful)
Why would anyone want to do this? It's like attacking the UN peacekeeping troops or the Red Cross. reCAPTCHA is doing good work, digitizing scanned printed books so that the text can be made available for online searching. Breaking reCAPTCHA is like defecating in the village well, ensuring that everyone suffers. No one benefits from reCAPTCHA being broken. No one.
Re:Pretty cool stuff (Score:1, Insightful)
This won't happen. Many current CAPTCHAs are already hard to solve for humans, and increasing the computational cost to solve a CAPTCHA will also make it harder to solve for humans.
Now, the problem is that computers are getting more powerful every day, while humans aren't. Sooner or later, this simple fact will render CAPTCHAs useless.
Re:Bad Hacking (Score:5, Insightful)
Spammers.
Re:Bad Hacking (Score:5, Insightful)
Actually, it could be of use to reCAPTCHA: they can just pass their test words through this system before they make them public, and then use the output to help prevent similar attacks.
Re:Bad Hacking (Score:4, Insightful)
Re:Bad Hacking (Score:5, Insightful)
No one benefits from reCAPTCHA being broken. No one
You couldn't be more wrong. Sure, breaking reCAPTCHA would create a headache for website admins (including me, for example), but in order to break reCAPTCHA someone has to devise a better text recognition program. And that's great news! This is an example of a general side effect of the cat-and-mouse game that is captchas. Captchas are a simple form of Turing Test, where website admins are trying to determine who is a computer and who is a real human being. Every time a captcha gets broken, we get a sophisticated new algorithm for doing something that previously only humans could do (or only humans could do well, at least).
How is this 30% accurate??? (Score:4, Insightful)
Re:Hmm (Score:3, Insightful)
I'm glad YOUR common sense kicked in before hundreds of others.
Re:far from it (Score:3, Insightful)
Interesting. If this is true as stated, and one knew/modeled OCR performance, you could use this information in some cases to pick out the plum and boost the crack...
Re:Bad Hacking (Score:2, Insightful)
reCaptcha, and indeed all Captchas, have a fundamental flaw: advances in computer vision will eventually render them all obsolete.
Most of the CS knowledge is already around to totally defeat captchas of this sort... it's only an Engineering question. They will most likely get broken when sufficiently unethical engineers are hired by sufficiently wealthy spammers.
It's basically a known fact that spammers will eventually break conventional captchas totally by developing algorithms to guess captcha answers. It's only a question of when, and how long it will take them to figure out all the systems that matter.
This does not mean it is a respectable thing for people to specifically target Captcha and attempt to hasten its demise.
reCaptcha is a big one... but there are other Captcha systems that matter (like Google's).
And there are other ways around them besides software algorithms... Amazon-style mech turk, for example... find a few thousand folks in certain countries to pay $0.05/hour for breaking captchas, and suddenly reCaptcha is no longer a boundary.
Re:Bad Hacking (Score:4, Insightful)
The problem of breaking reCAPTCHA is precisely the same problem as increasing computer OCR abilities
No it isn't. Well, not unless you read books with wavy crossed-out words and don't mind 30% accuracy.
Re:Bad Hacking (Score:2, Insightful)
Except the algorithm doesn't really do that... to defeat the captcha, it only needs to get it right about 10 or 20% of the time, to give the malicious script a "good enough guess" to brute-force the Captcha with 5 or 6 retries.
As long as the number of retries is less than what a fair percentage of humans require....
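The retry arithmetic here can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes each retry gets a fresh challenge and that guesses are independent with the same per-guess accuracy. The function name `success_within` is made up for illustration and is not from the DEFCON material.

```python
def success_within(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent guesses succeeds.

    Assumption: every attempt succeeds with the same probability `p_single`
    and attempts do not influence each other.
    """
    return 1.0 - (1.0 - p_single) ** attempts


# Per-guess accuracies mentioned in the thread: 10%, 20%, and the claimed 30%.
for p in (0.10, 0.20, 0.30):
    for tries in (1, 5, 6):
        print(f"p={p:.2f} tries={tries}: {success_within(p, tries):.3f}")
```

Under these assumptions a 20% solver gets through within 5 tries about 67% of the time, and the 30% figure from the summary gives roughly 83% within 5 tries.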
Re:Speaking about re-captcha (Score:3, Insightful)
Re:My eye's... (Score:3, Insightful)
By the way, that wasn't just a facetious comment. TFA isn't a serious paper. It's not even typeset, just typed into Microsoft Word. And god knows why I'm being warned about VBScript macros when I try to open it.
And this isn't a case where the little guy is making real scientific progress right under the nose of the obsolete establishment. The author doesn't even have a freshman understanding of big-O notation, it's completely juvenile.
Re:Speaking about re-captcha (Score:3, Insightful)
So it's for-profit work for the biggest advertising firm in the world.
I sort of expected Project Gutenberg or something.
Google's digitizing hundreds of thousands of historic books from some of the great university libraries. What's the problem here, that they won't lose money on the effort?
The NYT archive has been done for at least a year; it made reCAPTCHA a feasible company.
Multiple choice doesn't work for CAPTCHAs (Score:3, Insightful)
The spammers can just choose a random option until they get in. All that will do is slow them down a bit.
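To put a rough number on "slow them down a bit": if a bot picks uniformly at random among the options and gets a fresh question each time (both are assumptions, as is the helper name `expected_attempts`), the number of tries until success follows a geometric distribution, so the average is just the number of options.

```python
def expected_attempts(num_options: int) -> float:
    """Mean number of random guesses needed to pass a multiple-choice CAPTCHA.

    Assumption: one correct option out of `num_options`, a uniform random
    guess per attempt, and a fresh question after every failure.
    """
    if num_options < 1:
        raise ValueError("need at least one option")
    p_success = 1.0 / num_options
    return 1.0 / p_success  # mean of a geometric distribution


for options in (2, 4, 8):
    print(f"{options} options: about {expected_attempts(options):.0f} guesses on average")
```

So a four-option question costs a bot about four requests on average, which is cheap for a script.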
Re:Human Success? (Score:4, Insightful)
Yeah, I agree with this. Recaptcha is one of the easiest out there.
Admittedly, though, I have about a 3% success rate with vBulletin captchas. Hear that, forum owners? I'm not joining your forum because I can't read your captcha!
Re:OCR improvements? (Score:4, Insightful)
The problem is that since you are *probably* solving the verification words with higher accuracy to begin with, you are actually poisoning the data being gathered regarding the book words. So, while a book word becoming a verification word based on your "solutions" will keep your solution rate constant, it actually damages the system when it comes time for humans to solve the CAPTCHA, or worse when the solutions are used as OCR corrections.
To clarify, given a classically OCR-able "foo" and a non-OCR-able-but-human-readable "bar", a human is expected to recognize the slightly-deformed-by-reCAPTCHA "foo" and is trusted to get "bar" right more often than OCR would. This attack only defeats the deformation applied by reCAPTCHA; it doesn't actually improve the OCR on the non-deformed words, which means you are going to submit an answer of "foo ban" every time this pair is encountered (or "blah ban" for a different scenario), and the reCAPTCHA system is eventually going to decide that the book word really is "ban".
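A toy illustration of the poisoning effect, assuming the book word is settled by a simple plurality vote over submitted answers. That voting rule is an assumption for the sake of the example, not reCAPTCHA's documented behaviour, and `consensus` is a hypothetical helper.

```python
from collections import Counter


def consensus(answers: list[str]) -> str:
    """Return the most frequently submitted transcription of a book word.

    Assumption: plain plurality voting; the real system may weight or
    filter answers differently.
    """
    return Counter(answers).most_common(1)[0][0]


human_answers = ["bar", "bar", "bar"]   # humans read the word correctly
bot_answers = ["ban"] * 5               # the bot repeats the same OCR mistake

print(consensus(human_answers))                # only humans have voted
print(consensus(human_answers + bot_answers))  # bot submissions outnumber them
```

Because the bot makes the same mistake every time, its wrong answers pile up on one spelling and can outvote a smaller number of correct human answers.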
Re:Can the mouse cursor be positioned by a script? (Score:3, Insightful)
Remember, iPads and touch-screens can't do hover. Plus there's the whole disability accessibility aspect as well ;)