Defeating Captcha
An anonymous reader pointed us at PWNtcha, a package that breaks various online captcha algorithms. The site provides numerous examples of easy (PayPal and an older version of Slashdot make the list) and hard captchas. It also links to various sources explaining why captchas are a bad idea.
Old news is no news. :-( (Score:5, Informative)
Re:Old news is no news. :-( (Score:3, Interesting)
I doubt it. I'm willing to give him the benefit of the doubt and assume he's just trying to make sure what he's doing is responsible by releasing the code. And what he's doing at this site is mainly pointing out the weaknesses in some common captchas.
Re:Old news is no news. :-( (Score:3, Insightful)
I saw a couple of sites a while back that used captchas you could barely read, which made them annoying and unusable.
What would make it much more difficult is if they combined captchas with pictures, or asked people a simple question with a captcha that would have a common-sense answer. Like "what is 2+2=" and then alternate it with forms like "w
Re:Old news is no news. :-( (Score:4, Interesting)
This slide [w3.org] demonstrates the problem beautifully, I think.
With regard to the simple questions, that is indeed what I do, some simple trivia, and some basic maths, and the library is called SimpleQuestions.
"What colour is the sky?" is actually one of the questions, and the maths questions do indeed vary in form, from expression to natural language.
The problem with the drawing requirement is that you're now blocking people who cannot draw.
mirrored (Score:5, Informative)
What Captcha is... (Score:5, Informative)
A captcha is a type of challenge-response test used in computing to determine whether or not the user is human.
Re:What Captcha is... (Score:2, Funny)
It would also have to be impossible for lawyers, tax collectors, marketroids and politicians to use. (Taxes are important, I'm just not convinced anyone in the IRS is biologically related to life on this planet.)
As of this time, Captcha fails this test and therefore is quite unsuitable. A better test would be a short quiz on the meaning of that day's Dilbert cartoon.
Re:What Captcha is... (Score:2)
It would also have to be impossible for lawyers, tax collectors, marketroids and politicians to use. (Taxes are important, I'm just not convinced anyone in the IRS is biologically related to life on this planet.)
You mean like desktop Linux?
Re:What Captcha is... (Score:3, Informative)
Re:What Captcha is... (Score:3, Insightful)
And...they do make GREAT beers!! Strong beers...
Which may in fact, explain the strange mayo on the french fries thing......
Re:What Captcha is... (Score:3, Interesting)
Literally.
Re:What Captcha is... (Score:3, Insightful)
So long as we're talking about beer and not politics, America is fine.
Re:BFD (Score:3, Funny)
Re:What Captcha is... (Score:2)
spammer's low-tech way (Score:5, Interesting)
Mod parent up (Score:4, Interesting)
Re:spammer's low-tech way (Score:3, Informative)
Re:spammer's low-tech way (Score:2)
Re:spammer's low-tech way (Score:3, Insightful)
Re:spammer's low-tech way (Score:4, Funny)
Re:spammer's low-tech way (Score:4, Insightful)
They'll waste their resources trying to spam with the wrong answer, and you'll still get your porn fix.
Re:spammer's low-tech way (Score:2)
Re:spammer's low-tech way (Score:2)
Re:spammer's low-tech way (Score:4, Insightful)
Next stage: make the captcha Java code that generates the warped image dynamically. Response: send the JS to the unwitting human.
Next stage: make the Java code generate the token using something intrinsic to the machine running it (IP, etc, etc). Response: snatch the image from display ram to present to the unwitting human.
Next stage: include in the image information about what the image is for (site, etc). Response: block those parts, or use witting humans who don't care or are otherwise paid (in porn, 3rd-world wages, etc).
You can make it progressively harder, but you can't make it impossible. You might be able to make it hard enough, though.
Re:spammer's low-tech way (Score:5, Insightful)
Take a histogram of... say a hundred random subregions within the image of varying sizes and shapes. Sort colors by the number of these subregions in which they appear. Assume that colors that appear in every block (or above some threshold... say 90%) are background. Replace them all with white. Assume that colors that appear in only some of those blocks are foreground. Replace those colors with black. Do your OCR.
To some extent, you can get around that by masking parts of the text using the same color or by adding chunks of background in the same color, but this is only of limited effectiveness. The only way you can really defeat even the most basic stochastic analysis is by making the color information change from one side of the picture to another. Even then, unless this is done randomly in a dynamic fashion, once you manually figure out the gradation once, the mechanism is broken.
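The histogram trick described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the image is a plain list of rows of hashable color values, the subregions are axis-aligned rectangles (the comment asks for varying shapes too), and the function name, the `min_size` parameter and the 255/0 output values are made up for the example. The OCR step is left out.

```python
import random
from collections import Counter

WHITE, BLACK = 255, 0  # output values, chosen for this sketch


def binarize_by_region_frequency(image, n_regions=100, threshold=0.9,
                                 min_size=4, rng=None):
    """Turn colors seen in most random subregions white, everything else black.

    `image` is a list of rows; each pixel is any hashable color value.
    Only rectangular subregions are sampled, which is a simplification.
    """
    rng = rng or random.Random()
    rows, cols = len(image), len(image[0])
    seen_in = Counter()  # color -> number of subregions containing it

    for _ in range(n_regions):
        h = rng.randint(min(min_size, rows), rows)
        w = rng.randint(min(min_size, cols), cols)
        top = rng.randint(0, rows - h)
        left = rng.randint(0, cols - w)
        colors = {image[r][c]
                  for r in range(top, top + h)
                  for c in range(left, left + w)}
        seen_in.update(colors)  # each color counted once per subregion

    # Colors present in at least `threshold` of the subregions are background.
    background = {color for color, n in seen_in.items()
                  if n / n_regions >= threshold}

    # Colors that were never sampled are treated as foreground as well.
    return [[WHITE if px in background else BLACK for px in row]
            for row in image]
```

As the comment goes on to say, a gradient or per-region color change in the background would weaken this, because no single color would then show up in most subregions.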
Basically, these things don't work even at a conceptual level. The fundamental problem is that you have a choice: either require the person to do something that doesn't require thought or require the person to solve problems that require logical thought.
In the case of the former, it can be obscured easily, but the level of thought needed can be easily simulated by a computer program, and any algorithm one could write to fool that program is inherently reversible. If the noise level is sufficient to make this impractical, it also will be unlikely that a human can read it, though with multiple tests, this could still work---more on this later.
In the case of the latter, the limitations to the reasonable size of the problem space mean that, while the computer can't simulate the intellect needed to actually figure out the example, it can trivially store a list of all of the problems and their answers and simply regurgitate the right answer on command, in much the same way that most lower animals can be trained to regurgitate an action on command even though they do not actually understand what the command means.
The only potentially viable mechanism for doing this sort of thing involves dynamic creation of the images using random number generators to perturb the image in ways that are of similar color to the test, using color variation on the text to fool stochastic methods, using foreground masking of the text (i.e. lines that go in front of the text, not just behind it), and using a wide enough variety of fonts, some of which should be things like cursive fonts with variable baselines. That really makes OCR mad.
If you do all of those things, you -might- have something that could only be broken by a computer a third of the time. The problem is that it could only be broken by a -human- about half of the time. If you do multiple tests, you should be able to establish a reasonable threshold above which the antagonist is likely to be a human rather than a piece of software, though even then, you will have to algorithmically change it frequently or else computers will eventually overtake humans no matter what your algorithm... because, quite frankly, computers are a lot better at DSP than we are. :-)
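To put rough numbers on the "multiple tests" idea: if each test is treated as an independent trial, the chance of clearing a threshold is a binomial tail. The sketch below uses the comment's own ballpark rates (a human right about half the time, a program about a third of the time); the independence assumption, the function name and the choice of 20 tests are assumptions made for the example.

```python
from math import comb

HUMAN_RATE = 1 / 2  # ballpark figure from the comment above
BOT_RATE = 1 / 3    # ballpark figure from the comment above


def pass_probability(p, n_tests, min_correct):
    """Chance of at least `min_correct` successes in `n_tests` independent
    tries, each succeeding with probability `p` (binomial tail)."""
    return sum(comb(n_tests, i) * p**i * (1 - p)**(n_tests - i)
               for i in range(min_correct, n_tests + 1))


if __name__ == "__main__":
    n_tests = 20
    for min_correct in range(8, 13):
        human = pass_probability(HUMAN_RATE, n_tests, min_correct)
        bot = pass_probability(BOT_RATE, n_tests, min_correct)
        print(f"need {min_correct}/{n_tests}: human {human:.3f}, bot {bot:.3f}")
```

Raising the threshold shuts out more programs but also more people, and twenty captchas in a row is a lot to ask of anyone, which fits the commenter's pessimism.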
Re:spammer's low-tech way (Score:4, Informative)
Re:spammer's low-tech way (Score:3, Informative)
rock paper scissors... (Score:5, Funny)
pwntcha breaks captcha
slashdot cremates pwntcha
Rock paper scissors snorkel (Score:2, Insightful)
Re:Rock paper scissors snorkel (Score:2)
So yes, bots break slashdot.
but (Score:2)
Re:rock paper scissors... (Score:2)
-Adam
Re:rock paper scissors... (Score:4, Funny)
Hmm (Score:2, Interesting)
Re:Hmm (Score:2)
If a hobbyist can do it, so can a spammer with financial motivation. Showing weaknesses in Captcha will help sites to develop better systems so the spammers don't have such an easy time with it.
Re:Hmm (Score:3, Interesting)
Well, if you read the site, there's a list of reasons why certain captchas are bad.
For instance:
And a list of reasons why certain captchas are good. It's a pretty good summary of the strengths (and weaknesses) of a lot of them.
One thing you may notice is how complicated (and difficult to read as a human!) some of the broken ones are (like linuxfr.org, or vBulletin)
ADA (Score:5, Insightful)
Is it compatible with BLINUX? I think by definition it is not.
Perhaps I should ask, what alternate method of identification do sites employ to take into account blind users and the ADA?
Re:ADA (Score:3, Interesting)
Re:ADA (Score:5, Interesting)
Re:ADA (Score:2)
> Captcha, I just have the last question on a log in
> screen or form be a randomly selected super easy
> question. For example, "Spell the number 7" or
> "What is the next logical number in the sequence
> 1, 3, 5, 7,
The sad thing is that many humans will have problems solving these trivial puzzles, too. Especially when English is not your first language.
Re:ADA (Score:2)
I agree that if my method were applied to Yahoo Mail signup or eBay or something, then questions would have to be given in different languages.
Re:ADA (Score:5, Funny)
Re:ADA (Score:2, Insightful)
Re:ADA (Score:3, Insightful)
Re:ADA (Score:2, Funny)
"What is the next logical number in the sequence 1, 3, 5, 7, ...?"
11. Oh, wait, you're not using octal?
Re:Prime Numbers? (Score:3, Informative)
For info on why, see the mathworld prime number [wolfram.com] entry.
Interestingly, it says that, at one time, 1 was considered prime and 2 was not. Pretty amazing, considering the importance of the Fundamental Theorem of Arithmetic [wolfram.com].
Re:ADA (Score:2)
Re:ADA (Score:2, Interesting)
Unfortunately, not all sites have non-visual humanity tests.
Re:ADA (Score:5, Funny)
Simple - they just use ALT text for the image!
Re:ADA (Score:2, Interesting)
Distorted fonts, noisy lines, random dots and low contrast make such pictures very hard, if not impossible, to read.
Accessibility recommendations and W3C standards would require such important content to be backed up with alternate formats, like an audio recording.
I believe these rules should apply not only to government sites.
But I know of no site providing an alternative audio captcha for now. My husband and I require a third person to
Re: Disabilities (Score:2, Informative)
Re:ADA (Score:2)
Of course, things get hairy when you're both blind and deaf... and not all sites offer this kind of alternative, either. But that seems to be part of a more general problem where people don't make their sites accessible, and in fact often don't even realize that there might be a problem for
Re:ADA (Score:2)
Consider the problem (Score:5, Insightful)
Is this really a good thing, though? Even on a site like Slashdot, in a story about defeating bots, the very first comment in this story is posted by a bot. How ironic is that? What is accomplished by banning users who can't read these "captchas" (what a horrendous fake word)? Nothing, apparently, as the story says. It only serves to annoy legitimate users and does nothing to hamper illegitimate robots.
The solution is not this sort of halfway measure. The solution is to make it simply not worth the effort to be a nuisance on a discussion forum. I suppose that requires a glut of intelligent posters, but with the entire citizenry of the Internet available, that can't be so hard.
Re:Consider the problem (Score:2)
Sure, maybe they're not perfect.
I use them on my website [nephadus.com] mostly because I want to avoid people posting advertisements on my blog. Individuals do it occasionally, but those are easy enough to delete. When someone coded my blog comment form into a bot somewhere and I started getting 100+ spam comments a day, I started using captchas.
I'm su
Re:Consider the problem (Score:2)
Re:Consider the problem (Score:5, Insightful)
I actually disagree. The captcha method reduces spam load for most sites down to zero. Only the bigger sites need to worry, because spammers may set up a site to specifically target them by rerouting captchas. That's not the case with 99% of the websites using captchas, it's just not worth the effort.
It's sorta like a copy protection: if it stops 90% of the people, then it's good enough.
90% is not "good enough" (Score:2)
It is patented (Score:4, Informative)
This is a good study of how hard it is to design secure systems. It's just like a non-cryptographer trying to create their own cipher, only in the visual processing world. Sadly, the article does not touch on non-visual captchas, which are alternatives for the blind. It would also be interesting to see what Jakob Nielsen [useit.com] might have to say on this technology from a usability perspective.
Of course, one of the primary bad things is that the concept of a captcha is patented, and the patent language is very broad. US Patent# 6,195,698
Also see the Wikipedia article [wikipedia.org] for more information.
why Captcha is a bad idea (Score:2)
Usernames and passwords are a bad idea, but we use them. Using cookies or special URLs like slashdot has (or had, not sure) to automatically login is a bad idea.
But they are acceptable for now, relatively simple to implement a
Re:why Captcha is a bad idea (Score:2)
Re:why Captcha is a bad idea (Score:2)
Re:why Captcha is a bad idea (Score:2)
It's not inconceivable that an algorithm to defeat a particular type of captcha would be better at reading it than a human.
Heh (Score:4, Funny)
Well I'm glad someone is writing code to solve those "prove you aren't a script" images, because a lot of times I can't quite figure them out myself.
Re:Heh (Score:2)
Two flies mating? That's obviously Natalie Portman riding a tapir.
Its bad idea for several reasons (Score:5, Insightful)
I can't imagine how people with difficulties cope with this.
Re:Its bad idea for several reasons (Score:3, Interesting)
But there are still a lot of ways that haven't been used yet to make the image hard to read for the computer but less hard than the extreme distortions, such as overlapping letters and words. For example, if say only 25% of a word is covered up by another word on top of it, it should still be very easy for a normal person to read both words. Or use different colors and transparency. Or chain
Re:Its bad idea for several reasons (Score:3, Interesting)
Yahoo! does this and it's asinine. I hit a captcha today that clearly had a ` character in it, but apparently it was a 'confuser' line, not a `. The rules for what character sets are valid are not given, so you don't know if punctuation is valid or not. Apparently it's invalid. How about case? A c and a C are pretty hard to discriminate when they're rendered along a Bezier curve.
Clearing the web
A Necessary evil... (Score:2)
B: It's how lazy cryptographers do AI: the goal of a captcha is to get someone else to solve a hard vision/learning problem, and then you change the captcha.
Use Coral ! (Score:2)
OCR wins (Score:3, Funny)
Interesting flash-based captcha (Score:5, Interesting)
A little patience was required, but I was able to verify in less than 10 seconds. Animation seems to be very useful for this kind of application.
Re:Interesting flash-based captcha (Score:5, Insightful)
Flash is a bad format to use for a CAPTCHA from a security and accessibility point of view.
Re:Interesting flash-based captcha (Score:2)
This [flash based thing] is the easiest form of captcha to crack. I bet it would take just a few seconds looking around a flash extractor on CPAN [cpan.org] or something.
Defeating animated Captcha (Score:2)
That is fairly easy to break if the text is stationary - simply keep taking pictures. Once you have enough (probably 10 seconds worth at 3fps) just stack all the images on top of each other and "add" them up. The moving parts will fade into the background and leave the text standing proud for some quick OCR.
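A minimal sketch of that stacking attack, assuming the captured frames are already available as equally sized grayscale grids (lists of rows, 0 for black and 255 for white) and that the text is dark on a light background. Averaging is used here as the way to "add" the frames; the function names and the cutoff of 128 are made up for the example, and frame capture and OCR are not shown.

```python
def stack_frames(frames):
    """Average equally sized grayscale frames pixel by pixel."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(frame[r][c] for frame in frames) / n for c in range(cols)]
            for r in range(rows)]


def threshold(image, cutoff=128):
    """Pixels darker than `cutoff` become black (0), the rest white (255)."""
    return [[0 if px < cutoff else 255 for px in row] for row in image]


def recover_static_text(frames, cutoff=128):
    # Stationary dark text stays dark in the average; clutter that moves
    # is dark in only a few frames at any one pixel and washes out.
    return threshold(stack_frames(frames), cutoff)
```

With 30 frames (10 seconds at 3 fps), a pixel covered by moving clutter in only one frame averages out to about 246, well above the cutoff, while a text pixel stays at 0.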
Now if the text moved as well, it would
Yet another problem hashcash can solve (Score:2)
It makes bulk spamming expensive as well. That may not apply to blog spamming as much but it's still a good way to slow them down.
Tom
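For readers who haven't met hashcash: the idea is to make the client burn some CPU time before a submission is accepted. The sketch below shows only that general proof-of-work idea, not the real hashcash stamp format; the use of SHA-256, the `challenge:nonce` layout and all function names are assumptions chosen for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Cheap check for the server: a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def mint(challenge: str, bits: int) -> int:
    """Expensive search for the client: about 2**bits hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, bits):
            return nonce
```

A single poster pays the cost once per comment; a bulk spammer pays it for every message, which is where the expense comes from.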
Re:Yet another problem hashcash can solve (Score:3, Funny)
Moneybookers (Score:2)
Stephan
Re:Moneybookers (Score:2)
In case you didn't click the link, this site is secured by a captcha - with horizontal lines. It only uses numbers, however, so it would be really really easy to get through this one....
Try AuthImage for WordPress with a little tweaking (Score:3, Interesting)
Having to wade through 60+ spam comments a day on a WordPress blog (with all the stock antispam options enabled) just sucked . . . and the blog didn't even get much traffic (PageRank of 4). I installed the AuthImage plugin [gudlyf.com] and used it on its stock settings, and for awhile didn't get a single bit of spam. Then, magically, it started up again. It seems some industrious little script kiddies have written a crawler to massively bombard AuthImage-enabled blogs with words from the stock word list. I switched from the wordlist file to randomly-generated strings and increased the size of the image for readability, and I never had another piece of comment spam in that blog again.
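The switch from a fixed word list to random strings might look roughly like this. It is a Python illustration, not AuthImage's own code; the alphabet, the default length and the function name are assumptions made for the example.

```python
import secrets

# Upper-case letters and digits, minus look-alikes (I, O, 0, 1) that are
# hard to tell apart once the text is distorted.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def random_challenge(length=6):
    """Return a fresh, unpredictable challenge string."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

With 32 symbols and 6 characters there are 32**6 (about a billion) possible challenges, so guessing from a stock list no longer works.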
As for blind folks, I suppose every webmaster has to make that decision based on their target demographic, but I've seen a few text-only captchas that work well enough ("What color is an orange?") but will inevitably have the same limitation as the AuthImage word list above.
Easiest way to Defeat Captchas (Score:4, Interesting)
Captchas are next to useless and, for the visually impaired, very frustrating. One more example of a technology which annoys everyone and yet doesn't really stop the determined miscreant. <cough>airport shoe inspections</cough>
That's why (Score:3, Interesting)
all captchas should timeout after, oh, say 10 minutes?
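One way such a timeout could be enforced without server-side storage is a signed token carrying the issue time. This is a sketch under assumptions: the key, the token layout and the function names are hypothetical, and timestamps are passed in as plain integers so the logic is easy to follow.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key
MAX_AGE_SECONDS = 600                       # the "10 minutes" suggested above


def _sign(answer: str, issued_at: int) -> str:
    msg = f"{answer.lower()}:{issued_at}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def issue_token(answer: str, issued_at: int) -> str:
    """Token sent along with the captcha image; it does not reveal the answer."""
    return f"{issued_at}:{_sign(answer, issued_at)}"


def check(token: str, submitted_answer: str, now: int) -> bool:
    """Accept only a correct answer that arrives within MAX_AGE_SECONDS."""
    issued_str, _, signature = token.partition(":")
    try:
        issued_at = int(issued_str)
    except ValueError:
        return False
    if not 0 <= now - issued_at <= MAX_AGE_SECONDS:
        return False
    return hmac.compare_digest(signature, _sign(submitted_answer, issued_at))
```

This only shrinks the window; a relay that gets a human to answer within ten minutes still gets through, and a real deployment would also need to reject reuse of a token.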
In all honesty, do you really think you're going to get that many people to regularly visit a pr0n site? The sector is extremely cut-throat and vastly bigger than the market can justifiably support (hence why many pr0n sites close each month).
The only way to get to the top of the engines in the first few months would be to use PPC advertising (costs money). After that, even if you get to the top of the SERPS by using nefarious means, you'll need to g
Captchas = Turing test (Score:5, Insightful)
The ultimate irony may occur when the first human-intelligent computer is created by a spammer for the purpose of assaulting our collective intelligences with their commercial drivel. Given the increasing value of online commerce and Google page ranking, there's probably more money in AI for captchas than AI for academic research.
But before captchas get that sophisticated, the system will become self-defeating as the number of real humans defeated by captchas exceeds the number of AIs repelled by them.
Using Captcha for distributed processing (Score:2)
This way, not only does it take a little longer to analyze, but you get them to do a little bit of work for you. Force the spammers to be part of your little distributed processing system.
Of course the problems need to be simple enough for the users to figure out...
Re:Using Captcha for distributed processing (Score:2)
What's a human going to do when presented with sin(34)+10? Find a computer.
Commentary on w3's captcha-inaccessibility page (Score:2, Informative)
Among the claims:
- captchas are inaccessible to the blind - true
- a horde of human beings can decode the entire library over time - only true if the images are recycled, not if they are created on-demand or for one-time use.
It also discusses some of the side-effects of making access to real humans harder, or harder for a class of users such as the visually impaired. For example, I've seen s
Re:Commentary on w3's captcha-inaccessibility page (Score:2)
> - sound output
That's fine for the blind, but now you are breaking it for the deaf and dumb.
-G
Is that goatse I see? (Score:3, Funny)
Re:Is that goatse I see? (Score:2)
Is one of those pics the goat guy? (Score:2)
The linked page is NSFW (Score:5, Informative)
Please don't link to the goatse man without at least some warning.
Thanks.
Goatse Man (Score:5, Informative)
The link is not work safe.
Re:Goatse Man (Score:4, Funny)
Here's an idea (Score:3, Interesting)
1) Get (or construct) a large database of nouns of well-known objects (car, orange, bottle, phone, pencil, brick, cup, etc. etc.)
2) Retrieve image references from a (safesearch-enabled) Google image search for a random noun from your database. Pick randomly from the result set.
3) Present images to the user. "These are pictures of a..."
4) My next strategy was to figure out a combinatorial way to increase the number of possible replies so that an attacker couldn't simply create a database of knowns (such as a hash database of images)
What do you smart fellers think? other than google being pissed for scraping their site
Totally fake (Score:5, Insightful)
Re:From the site... (Score:3, Insightful)
Not that this would stop you from plugging some random open-source software package. Even though your plug will probably do
Re:From the site... (Score:2, Insightful)
> uses for image import, does it?
I'd be interested in knowing what it is... but I may well be the only person on the planet that is interested.
> your motives are still strange to me
Most of the time I don't understand them myself!
Re:From the site... (Score:5, Informative)
This is what slashdot's previous iteration of a captcha looked like in an in-memory associative array after the intersecting lines had been removed and a de-skewing algorithm applied. There was actually a version of the code after that which properly picked out where the lines actually intersected the letters and didn't erase the intersecting section to create those gaps.
Before they switched to the newest CAPTCHA system, I was breaking their CAPTCHAs with a modified SS.pl script with almost 100% accuracy (it had a little trouble properly splitting up the text when a j or other similar character wrapped partially under another letter).
Of course, the new CAPTCHAs are much harder. I can't even read some of them myself, but the point is that breaking CAPTCHA that people can easily read usually isn't really that hard.
Yes, I used ImageMagick's Perlmagick library.
Re:From the site... (Score:4, Informative)
About 3/4ths down the page there is a goatse picture, and the caption at the top thanks the GNAA. Wake up slashdot.
Re:The GOATSE picture is NOT in the mirrordot (Score:3, Informative)