Slashdot Log In
Defeating Captcha
Posted by
CmdrTaco
on Wed Aug 24, 2005 12:23 PM
from the security-through-irritation dept.
from the security-through-irritation dept.
An anonymous reader pointed us at PWNtcha, a package that breaks various on-line captcha algorithms. The site provides numerous examples of easy (Paypal, and an older version of Slashdot make the list) and hard Captcha. It also links various sources explaining why Captcha is a bad idea.
Related Stories
[+]
Web Users Angered by Anti-Spam 'Captcha' 267 comments
Carl Bialik from WSJ writes "Captchas -- the jumbles of letters that users must type to gain access to some websites -- are a growing irritation, the Wall Street Journal reports. But programmers hope to make new variations that are both easier to decipher and harder to crack. From the article: 'Some captchas have been solved with more than 90% accuracy by scientists specializing in computer vision research at the University of California, Berkeley, and elsewhere. Hobbyists also regularly write code to solve captchas on commercial sites with a high degree of accuracy. ... Henry Baird, a professor of computer science at Lehigh University who studies PC users' responses to the codes, has been working with colleagues to develop new generations of captchas that are designed to be easier on humans but baffling for computers.'"
[+]
Yahoo CAPTCHA Hacked 247 comments
Hell Yeah! reminds us of a 2-week-old development that somehow escaped notice here. A team of Russian hackers has found a way to decipher a Yahoo CAPTCHA, thought to be one of the most difficult, with 35% accuracy. The Russian group's notice, posted by one "John Wane," is dated January 16. This site hosts a rapidshare link to what looks to be demonstration software for Windows, and quotes the Russian researchers: "It's not necessary to achieve high degree of accuracy when designing automated recognition software. The accuracy of 15% is enough when attacker is able to run 100,000 tries per day, taking into the consideration the price of not automated recognition — one cent per one CAPTCHA."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Old news is no news. :-( (Score:5, Informative)
mirrored (Score:5, Informative)
What Captcha is... (Score:5, Informative)
A captcha is a type of challenge-response test used in computing to determine whether or not the user is human.
spammer's low-tech way (Score:5, Interesting)
Mod parent up (Score:4, Interesting)
Re:spammer's low-tech way (Score:4, Insightful)
Next stage: make the captcha Java code that generates the warped image dynamically. Reponse: send the JS to the unwitting human.
Next stage: make the Java code generate the token using something intrinsic to the machine running it (IP, etc, etc). Response: snatch the image from display ram to present to the unwitting human.
Next stage: include in the image information about what the image is for (site, etc). Response: block those parts, or use witting humans who don't care or are otherwise paid (in porn, 3rd-world wages, etc).
You can make it progressively harder, but you can't make it impossible. You might be able to make it hard enough, though.
Re:spammer's low-tech way (Score:5, Insightful)
Take a histogram of... say a hundred random subregious within the image of varying sizes and shapes. Sort colors by the number of these subregions in which they appear. Assume that colors that appear in every block (or above some threshold... say 90%) are background. Replace them all with white. Assume that colors that appear in only some of those blocks are foreground. Replace those colors with black. Do your OCR.
To some extent, you can get around that by masking parts of the text using the same color or by adding chunks of background in the same color, but this is only of limited effectiveness. The only way you can really defeat even the most basic stochastic analysis is by making the color information change from one side of the picture to another. Even then, unless this is done randomly in a dynamic fashion, once you manually figure out the gradation once, the mechanism is broken.
Basically, these things don't work even at a conceptual level. The fundamental problem is that you have a choice: either require the person to do something that doesn't require thought or require the person to solve problems that require logical thought.
In the case of the former, it can be obscured easily, but the level of thought needed can be easily simulated by a computer program, and any algorithm one could write to fool that program is inherently reversible. If the noise level is sufficient to make this impractical, it also will be unlikely that a human can read it, though with multiple tests, this could still work---more on this later..
In the case of the latter, the limitations to the reasonable size of the problem space mean that, while the computer can't simulate the intellect needed to actually figure out the example, it can trivially store a list of all of the problems and their answers and simply regurgitate the right answer on command, in much the same way that most lower animals can be trained to regurgitate an action on command even though they do not actually understand what the command means.
The only potentially viable mechanism for doing this sort of thing involves dynamic creation of the images using random number generators to perturb the image in ways that are of similar color to the test, using color variation on the text to fool stochastic methods, using foreground masking of the text (i.e. lines that go in front of the text, not just behind it), and using a wide enough variety of fonts, some of which should be things like cursive fonts with variable baselines. That really makes OCR mad.
If you do all of those things, you -might- have something that could only be broken by a computer a third of the time. The problem is that it could only be broken by a -human- about half of the time. If you do multiple tests, you should be able to establish a reasonable threshold above which the antagonist is likely to be a human rather than a piece of software, though even then, you will have to algorithmically change it frequently or else computers will eventually overtake humans no matter what your algorithm... because, quite frankly, computers are a lot better at DSP than we are. :-)
rock paper scissors... (Score:5, Funny)
pwntcha breaks captcha
slashdot cremates pwntcha
ADA (Score:5, Insightful)
Is it compatible with BLINUX? I think by definition it is not.
Perhaps I should ask, what alternate method of identification do sights employ to take into account blind users and the ADA?
Re:ADA (Score:5, Interesting)
Re:ADA (Score:5, Funny)
Re:ADA (Score:5, Funny)
Simple - they just use ALT text for the image!
Consider the problem (Score:5, Insightful)
Is this really a good thing, though? Even on a site like Slashdot, in a story about defeating bots, the very first comment in this story is posted by a bot. How ironic is that? What is accomplished by banning users who can't read these "captchas" (what a horrendous fake word)? Nothing, apparently, as the story says. It only serves to annoy legitimate users and does nothing to hamper illegitimate robots.
The solution is not this sort of halfway measure. The solution is to make it simply not worth the effort to be a nuisance on a discussion forum. I suppose that requires a glut of intelligent posters, but with the entire citizenry of the Internet available, that can't be so hard.
Re:Consider the problem (Score:5, Insightful)
I actually disagree. The captcha method reduces spam load for most sites down to zero. Only the bigger sites need to worry, because spammers may set up a site to specifically target them by rerouting captchas. That's not the case with 99% of the websites using captchas, it's just not worth the effort.
It's sorta like a copy protection: if it stops 90% of the people, then it's good enough.
It is patented (Score:4, Informative)
This is a good study of how hard it is to design secure systems. It's just like a non-cryptographer trying to create their own cipher, only in the visual processing world. Sadly, the article does not touch on non-visual captchas, which are alternatives for the blind. It would also be interesting to see what Jakob Nielsen [useit.com] might have to say on this technology from a usability perspective.
Of course, one of the primary bad things is that the concept of a captcha is patented, and the patent language is very broad. US Patent# 6,195,698
Also see the Wikipedia article [wikipedia.org] for more information.
Heh (Score:4, Funny)
Well I'm glad someone is writing code to solve those "prove you aren't a script" images, because a lot of times I can't quite figure them out myself.
Its bad idea for several reasons (Score:5, Insightful)
I can't imagine how people with difficulties cope with this.
Interesting flash-based captcha (Score:5, Interesting)
A little patience was required, but I was able to verify in less than 10 seconds. Animation seems to be very useful for this kind of application.
Re:Interesting flash-based captcha (Score:5, Insightful)
Flash is a bad format to use for a CAPTCHA from a security and accessibility point of view.
Easiest way to Defeat Captchas (Score:4, Interesting)
Captchas are next to useless and for the visually impaired very frustrating. One more of a example of a technology which annoys everyone and yet doesn't really stop the determined miscreant. <cough>airport shoe inspections</cough>
Captchas = Turing test (Score:5, Insightful)
The ultimate irony may occur when the first human-intelligent computer is created by a spammer for the purpose of assaulting our collective intelligences with their commerical drivel. Given the increasing value of online commerce and Google page ranking, there's probably more money in AI for captchas than AI for academic research.
But before captchas get that sophisticated, the system will become self-defeating as the number of real humans defeated by captchas exceeds the number of AIs repelled by them.
The linked page is NSFW (Score:5, Informative)
Please don't link to the goatse man without at least some warning.
Thanks.
Goatse Man (Score:5, Informative)
The link is not work safe.
Totally fake (Score:5, Insightful)
Re:From the site... (Score:5, Informative)
This is what slashdot's previous iteration of a captcha looked like in an in-memory associative array after the intersecting lines had been removed and a de-skewing algorithm applied. There was actually a version of the code after that which properly picked out where the lines actually intersected the letters and didn't erase the intersecting section to create those gaps.
Before they switched to the newest CAPTCHA system, I was breaking their CAPTCHAs with a modified SS.pl script with almost 100% accuracy (it had a little trouble properly splitting up the text when a j or other similar character wrapped partially under another letter).
Of course, the new CAPTCHAs are much harder. I can't even read some of them myself, but the point is that breaking CAPTCHA that people can easily read usually isn't really that hard.
Yes, I used ImageMagick's Perlmagick library.