Security The Internet

Defeating Captcha 430

An anonymous reader pointed us at PWNtcha, a package that breaks various online captcha algorithms. The site provides numerous examples of easy (PayPal and an older version of Slashdot make the list) and hard captchas. It also links to various sources explaining why captchas are a bad idea.
This discussion has been archived. No new comments can be posted.

  • ADA (Score:5, Insightful)

    by dnoyeb ( 547705 ) on Wednesday August 24, 2005 @01:31PM (#13390689) Homepage Journal
    Having a legally blind mother who uses the web, I wonder how captcha complies with the Americans With Disabilities Act (when used by American companies, of course)?

    Is it compatible with BLINUX? I think by definition it is not.

    Perhaps I should ask, what alternate method of identification do sites employ to take into account blind users and the ADA?
  • by ReformedExCon ( 897248 ) <reformed.excon@gmail.com> on Wednesday August 24, 2005 @01:31PM (#13390690)
    The problem is that people are using robots to work in an autonomous manner to find ways around typical human limitations (we can only send several hundred emails a day, robots are not so limited). So people want to stop these "cheaters" by making the user prove that they are a human rather than a robot.

    Is this really a good thing, though? Even on a site like Slashdot, in a story about defeating bots, the very first comment in this story is posted by a bot. How ironic is that? What is accomplished by banning users who can't read these "captchas" (what a horrendous fake word)? Nothing, apparently, as the story says. It only serves to annoy legitimate users and does nothing to hamper illegitimate robots.

    The solution is not this sort of halfway measure. The solution is to make it simply not worth the effort to be a nuisance on a discussion forum. I suppose that requires a glut of intelligent posters, but with the entire citizenry of the Internet available, that can't be so hard.
  • by Wilson_6500 ( 896824 ) on Wednesday August 24, 2005 @01:33PM (#13390713)
    Uh, that game doesn't work unless, say, bots stop Slashdot. Otherwise everyone just picks Slashdot and it's fifth grade all over again.
  • by bogie ( 31020 ) on Wednesday August 24, 2005 @01:38PM (#13390751) Journal
    Chief among them is that sometimes you can't tell what the fucking words are. Within the last few months, on more than one occasion, I simply could not read the letters because they were so distorted and the lines overlapped the letters too much. No fun redoing a web form over and over because you can't figure out what the hell the verification box says.

    I can't imagine how people with difficulties cope with this.
  • by Anonymous Coward on Wednesday August 24, 2005 @01:40PM (#13390777)
    And then again, maybe he isn't. It doesn't really matter which library he uses for image import, does it? I mean, the interesting part would be the data structures and algorithms used in the "reverse-mapping" from image data to text. It's doubtful that the rudimentary processing methods provided by ImageMagick (although often a god-send of convenience and compatibility) would help here.

    Not that this would stop you from plugging some random open-source software package. Even though your plug will probably do more Good-For-The-World than the rest of the discussion in this thread combined, your motives are still strange to me.
  • by tcopeland ( 32225 ) <tom&thomasleecopeland,com> on Wednesday August 24, 2005 @01:47PM (#13390845) Homepage
    > It doesn't really matter which library he
    > uses for image import, does it?

    I'd be interested in knowing what it is... but I may well be the only person on the planet that is interested.

    > your motives are still strange to me

    Most of the time I don't understand them myself!
  • by jesup ( 8690 ) * <randellslashdot&jesup,org> on Wednesday August 24, 2005 @01:51PM (#13390899) Homepage
    It's trivial to hack a browser (hell, you don't even have to actually hack it, just know how it works) to snag the image for you. Then repeat as per grandparent (have an unwitting (or witting) human do it for you).

    Next stage: make the captcha JavaScript code that generates the warped image dynamically. Response: send the JS to the unwitting human.

    Next stage: make the JavaScript code generate the token using something intrinsic to the machine running it (IP, etc, etc). Response: snatch the image from display RAM to present to the unwitting human.

    Next stage: include in the image information about what the image is for (site, etc). Response: block those parts, or use witting humans who don't care or are otherwise paid (in porn, 3rd-world wages, etc).

    You can make it progressively harder, but you can't make it impossible. You might be able to make it hard enough, though.
  • by G4from128k ( 686170 ) on Wednesday August 24, 2005 @01:54PM (#13390928)
    As with the Turing test, the entire purpose of a captcha is to distinguish humans from machines. As captcha-defeaters improve, the captchas will need to become more and more sophisticated and require more and more human or human-like intelligence to process. This arms race will culminate in a Turing test-like approach for discerning natural intelligences from artificial ones.

    The ultimate irony may occur when the first human-intelligent computer is created by a spammer for the purpose of assaulting our collective intelligences with their commercial drivel. Given the increasing value of online commerce and Google page ranking, there's probably more money in AI for captchas than AI for academic research.

    But before captchas get that sophisticated, the system will become self-defeating as the number of real humans defeated by captchas exceeds the number of AIs repelled by them.
  • by JimmehAH ( 817552 ) <slashdot@j-a-h.co.uk> on Wednesday August 24, 2005 @01:55PM (#13390935) Homepage
    You could just write the bot to decompile the .swf file and grab the string (or vector/raster representation of the text) from that.

    Flash is a bad format to use for a CAPTCHA from a security and accessibility point of view.
  • by A beautiful mind ( 821714 ) on Wednesday August 24, 2005 @01:57PM (#13390951)
    "What is accomplished by banning users who can't read these "captchas" (what a horrendous fake word)? Nothing, apparently, as the story says."

    I actually disagree. The captcha method reduces spam load for most sites down to zero. Only the bigger sites need to worry, because spammers may set up a site to specifically target them by rerouting captchas. That's not the case with 99% of the websites using captchas, it's just not worth the effort.

    It's sorta like a copy protection: if it stops 90% of the people, then it's good enough.
  • Re:ADA (Score:2, Insightful)

    by JadeNB ( 784349 ) on Wednesday August 24, 2005 @02:11PM (#13391075) Homepage
    This solution is interesting, but surely not scalable -- while captchas are, by design, easy for computers to generate but hard for them to solve, the same thing that prevents computers from solving `easy' problems will presumably also prevent them from generating `easy' problems.
  • Re:ADA (Score:3, Insightful)

    by aardvarkjoe ( 156801 ) on Wednesday August 24, 2005 @02:16PM (#13391118)
    For fun, I tried plugging five questions from your page into Google. Of the five, three were answered directly by Google, and one had the answer in the summary for the first result. Creating a parser to determine the right answer from the Google results would take some work, but I would bet that a 50% accuracy rate is not unreasonable. A first, fairly obvious method would be to take the summary of the first Google result, remove all of the words that appeared in the original question, and pick from the remaining words.

    Of course, as long as your system isn't widely used, nobody will bother to create tools to defeat it.
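
The "first, fairly obvious method" in the comment above can be sketched roughly as follows. This is only an illustration: `candidate_answers` is a hypothetical name, the tokenization is an assumption, and fetching the search-result summary is deliberately left out, so the summary is passed in as a plain string.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    # Assumed tokenization: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_answers(question: str, summary: str) -> List[str]:
    """Return the summary's words that do not occur in the question.

    Order of first appearance is kept and duplicates are dropped, so a
    caller can "pick from the remaining words" however it likes.
    """
    question_words = set(_words(question))
    seen = set()
    remaining = []
    for word in _words(summary):
        if word not in question_words and word not in seen:
            seen.add(word)
            remaining.append(word)
    return remaining


if __name__ == "__main__":
    print(candidate_answers("What is the capital of France?",
                            "Paris is the capital of France."))
```

Real summaries would leave many more leftover words than this toy input does, which is presumably why the commenter only bets on about 50% accuracy.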

  • by Drooling Iguana ( 61479 ) on Wednesday August 24, 2005 @02:22PM (#13391189)
    By making everyone so pissed off at the state of the computer industry that they go back to using an abacus and slide rule?
  • by dgatwood ( 11270 ) on Wednesday August 24, 2005 @02:35PM (#13391328) Homepage Journal
    Most of these techniques could be defeated with a simple color filter, sadly.... Regardless, crypto is a really good comparison because a lot of crypto can be broken with statistical techniques, and in that regard, getting past Captcha grids can be done using very similar methodology.

    Take a histogram of... say a hundred random subregions within the image of varying sizes and shapes. Sort colors by the number of these subregions in which they appear. Assume that colors that appear in every block (or above some threshold... say 90%) are background. Replace them all with white. Assume that colors that appear in only some of those blocks are foreground. Replace those colors with black. Do your OCR.
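
A minimal sketch of the subregion-histogram idea just described, under several assumptions that are not in the comment: the image is a list of rows of hashable color values, subregions are random rectangles at least 2x2 pixels in size, and the function name and parameters are hypothetical. The OCR step is omitted.

```python
import random
from collections import Counter


def binarize_by_region_frequency(pixels, samples=100, threshold=0.9, seed=0):
    """Split colors into background and foreground by how many random
    subregions they occur in.

    pixels: list of rows, each a list of hashable color values
            (at least 2x2 pixels).
    Returns rows of 0 (background, "white") and 1 (foreground, "black").
    """
    rng = random.Random(seed)
    height, width = len(pixels), len(pixels[0])
    regions_containing = Counter()

    for _ in range(samples):
        # Random rectangle, at least 2 pixels on each side (an assumption).
        top = rng.randrange(height - 1)
        bottom = rng.randrange(top + 2, height + 1)   # exclusive
        left = rng.randrange(width - 1)
        right = rng.randrange(left + 2, width + 1)    # exclusive
        colors = {pixels[y][x]
                  for y in range(top, bottom)
                  for x in range(left, right)}
        # Each color counts once per subregion it appears in.
        regions_containing.update(colors)

    # Colors present in at least `threshold` of the subregions are background.
    background = {color for color, count in regions_containing.items()
                  if count >= threshold * samples}
    return [[0 if color in background else 1 for color in row]
            for row in pixels]
```

The result is a black-and-white mask that could then be handed to whatever OCR step is available. A fixed `seed` only makes the sketch repeatable; it is not part of the technique.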

    To some extent, you can get around that by masking parts of the text using the same color or by adding chunks of background in the same color, but this is only of limited effectiveness. The only way you can really defeat even the most basic stochastic analysis is by making the color information change from one side of the picture to another. Even then, unless this is done randomly in a dynamic fashion, once you manually figure out the gradation once, the mechanism is broken.

    Basically, these things don't work even at a conceptual level. The fundamental problem is that you have a choice: either require the person to do something that doesn't require thought or require the person to solve problems that require logical thought.

    In the case of the former, it can be obscured easily, but the level of thought needed can be easily simulated by a computer program, and any algorithm one could write to fool that program is inherently reversible. If the noise level is sufficient to make this impractical, it also will be unlikely that a human can read it, though with multiple tests, this could still work---more on this later.

    In the case of the latter, the limitations to the reasonable size of the problem space mean that, while the computer can't simulate the intellect needed to actually figure out the example, it can trivially store a list of all of the problems and their answers and simply regurgitate the right answer on command, in much the same way that most lower animals can be trained to regurgitate an action on command even though they do not actually understand what the command means.

    The only potentially viable mechanism for doing this sort of thing involves dynamic creation of the images using random number generators to perturb the image in ways that are of similar color to the test, using color variation on the text to fool stochastic methods, using foreground masking of the text (i.e. lines that go in front of the text, not just behind it), and using a wide enough variety of fonts, some of which should be things like cursive fonts with variable baselines. That really makes OCR mad.

    If you do all of those things, you -might- have something that could only be broken by a computer a third of the time. The problem is that it could only be broken by a -human- about half of the time. If you do multiple tests, you should be able to establish a reasonable threshold above which the antagonist is likely to be a human rather than a piece of software, though even then, you will have to algorithmically change it frequently or else computers will eventually overtake humans no matter what your algorithm... because, quite frankly, computers are a lot better at DSP than we are. :-)
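
The "multiple tests" argument can be checked with a little binomial arithmetic. The sketch below takes the comment's rough figures (a program solves a third of the captchas, a human about half) at face value and assumes each attempt is independent, which is an assumption of this sketch, not something the comment claims. The cut-off rule is likewise hypothetical.

```python
from math import ceil, comb

HUMAN_RATE = 1 / 2  # "about half of the time", taken from the comment
BOT_RATE = 1 / 3    # "a third of the time", taken from the comment


def pass_probability(p_single: float, tests: int, required: int) -> float:
    """Probability of solving at least `required` out of `tests`
    independent captchas, each solved with probability `p_single`."""
    return sum(
        comb(tests, k) * p_single ** k * (1 - p_single) ** (tests - k)
        for k in range(required, tests + 1)
    )


if __name__ == "__main__":
    for tests in (10, 30, 100):
        # Hypothetical cut-off roughly midway between the two success rates.
        required = ceil(tests * 5 / 12)
        human = pass_probability(HUMAN_RATE, tests, required)
        bot = pass_probability(BOT_RATE, tests, required)
        print(f"{tests} tests, need {required}: "
              f"human passes {human:.3f}, bot passes {bot:.3f}")
```

The gap between the two pass rates widens as the number of tests grows, at the cost of making every visitor solve more captchas.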

  • by drgonzo59 ( 747139 ) on Wednesday August 24, 2005 @02:54PM (#13391509)
    The problem with captcha stuff is that if it is good enough that current OCR cannot read it, it is probably bad enough that even humans cannot read it.

    I saw a couple of sites a while ago that used captchas you could barely read, which made them annoying and unusable.

    What would make it much more difficult is if they combined captchas with pictures, or asked people a simple question with a captcha that would have a common-sense answer. Like "what is 2+2=" and then alternate it with forms like "what is two plus two equal to" and such, combine such questions with stuff like "what color is the sky?" or "what is the 1st derivative of x^n with respect to x"... well, ok, maybe not this one...

    Or how about blending images together. For example, a picture of a dog and a cat on some background, both transparently superimposed with a small overlap. Then ask the user to name the two animals in the picture.

    How about asking the user to make a mouse gesture in an applet. (Did someone already implement this?). For example: "draw a circle with a small triangle in the middle" or "draw number '4'", then let the server use OCR to validate.
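
The alternating question forms suggested above could be generated along these lines. This is a toy sketch: the function names, the single-digit operands, and the two templates are assumptions made for illustration, and the rendering of the question as a distorted image is left out.

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def make_question(rng: random.Random):
    """Return (question_text, expected_answer) for a small addition
    problem, phrased either with digits or with number words."""
    a, b = rng.randrange(10), rng.randrange(10)
    phrasings = [
        f"what is {a}+{b}=",
        f"what is {NUMBER_WORDS[a]} plus {NUMBER_WORDS[b]} equal to",
    ]
    return rng.choice(phrasings), str(a + b)


def check_answer(expected: str, reply: str) -> bool:
    # Ignore surrounding whitespace and letter case in the reply.
    return reply.strip().lower() == expected


if __name__ == "__main__":
    question, answer = make_question(random.Random())
    print(question)
    print(check_answer(answer, input("> ")))
```

With only a hundred operand pairs and two phrasings, the whole problem space fits in a lookup table, which is the weakness another comment in this thread raises about question-based schemes.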

  • by Elwood P Dowd ( 16933 ) <judgmentalist@gmail.com> on Wednesday August 24, 2005 @04:39PM (#13392382) Journal
    Dunno what grandparent's problem is, but there's plenty of good beer here in the US too. We don't judge Belgian beer by Stella Artois, and y'all don't need to judge ours by Budweiser.

    So long as we're talking about beer and not politics, America is fine.
  • by cayenne8 ( 626475 ) on Wednesday August 24, 2005 @04:46PM (#13392428) Homepage Journal
    "That's the joke. Belgium is a very pleasant, mostly harmless country..on the whole Belgians themselves are extremely polite, well mannered..."

    And...they do make GREAT beers!! Strong beers...

    Which may in fact, explain the strange mayo on the french fries thing......

    :-)

  • by McGregorMortis ( 536146 ) on Wednesday August 24, 2005 @05:11PM (#13392600)
    The thing is, then, the porn site asking you to solve the captcha doesn't know the answer themselves. You can screw 'em by giving the wrong answer.

    They'll waste their resources trying to spam with the wrong answer, and you'll still get your porn fix.
  • Totally fake (Score:5, Insightful)

    by VAXGeek ( 3443 ) on Wednesday August 24, 2005 @10:07PM (#13394458) Homepage
    This article is a fraud. No source is presented, and goatse.cx is displayed in the examples. This whole thing was contrived just to get goatse.cx in a legitimate front page post. Best troll in years.
