Security The Internet

Defeating Captcha 430

An anonymous reader pointed us at PWNtcha, a package that breaks various on-line captcha algorithms. The site provides numerous examples of easy (Paypal, and an older version of Slashdot make the list) and hard Captcha. It also links various sources explaining why Captcha is a bad idea.
  • by XorNand ( 517466 ) * on Wednesday August 24, 2005 @12:25PM (#13390610)
    # Q. Where is the code? # A. No code is available yet. I am still pondering the pertinence of allowing code in the wild. The good old full-disclosure debate... If you think I should release the code for PWNtcha, feel free to explain your arguments to me.
    ::sigh:: The blurb leads one to believe that there's a new script kiddie tool in the wild. This is just someone's experiment with OCR and some AI. (And an old project at that; I remember reading this site about six months ago while working on my own Captcha implementation). There's a handful of researchers around the world doing the same type of work, including a team at UC Berkeley that devised a system [berkeley.edu] that they claimed was 92% accurate... back in 2003. All in all, this isn't all that newsworthy.
    • by Cujo ( 19106 ) *
      The blurb leads one to believe that there's a new script kiddie tool in the wild.

      I doubt it. I'm willing to give him the benefit of the doubt and assume he's just trying to make sure what he's doing is responsible by releasing the code. And what he's doing at this site is mainly pointing out the weaknesses in some common captchas.

    • The problem with captcha stuff is that if it is good enough that current OCR cannot read it, it is probably bad enough that even humans cannot read it.

      I saw a couple of sites a while ago that used captchas you could barely read, which made them annoying and unusable.

      What would make it much more difficult is if they combined captchas with pictures, or ask people a simple question with a captcha that would have a common sense answer. Like "what is 2+2=" and then alternate it with forms like "w

      • by feargal ( 99776 ) on Wednesday August 24, 2005 @02:35PM (#13391857) Homepage
        The problem with blending images and so on is that blind people still cannot see them.
        This slide [w3.org] demonstrates the problem beautifully, I think.

        With regard to the simple questions, that is indeed what I do: some simple trivia and some basic maths. The library is called SimpleQuestions.

        "What colour is the sky?" is actually one of the questions, and the maths questions do indeed vary in form, from expression to natural language.

        The problem with the drawing requirement is that you're now blocking people who cannot draw.
  • mirrored (Score:5, Informative)

    by Anonymous Coward on Wednesday August 24, 2005 @12:25PM (#13390617)
    here [mirrordot.org]
  • What Captcha is... (Score:5, Informative)

    by geders ( 206556 ) on Wednesday August 24, 2005 @12:27PM (#13390642)
    Whew, I had never even heard of Captcha [wikipedia.org] before...

    A captcha is a type of challenge-response test used in computing to determine whether or not the user is human.
    • A test for humanness will not be convincing until it cuts out 70% of AOL users and 58.2% of Belgium. (58.2% of Belgian users would work, too.)

      It would also have to be impossible for lawyers, tax collectors, marketroids and politicians to use. (Taxes are important, I'm just not convinced anyone in the IRS is biologically related to life on this planet.)

      As of this time, Captcha fails this test and therefore is quite unsuitable. A better test would be a short quiz on the meaning of that day's Dilbert cartoon.

      • A test for humanness will not be convincing until it cuts out 70% of AOL users and 58.2% of Belgium. (58.2% of Belgian users would work, too.)

        It would also have to be impossible for lawyers, tax collectors, marketroids and politicians to use. (Taxes are important, I'm just not convinced anyone in the IRS is biologically related to life on this planet.)


        You mean like desktop Linux?
      • I can understand AOL users, but... Belgians? Huh? Why Belgians? I've been to Belgium, and it's actually a very nice country with very nice (in general) people. Or are there any cliches I'm not aware of?
    • Once again, the brilliant slashdot editors show their skills in explaining clearly what the fuck the article is about.
  • by Anonymous Coward on Wednesday August 24, 2005 @12:28PM (#13390646)
    A while ago, I remember hearing about how some spammers would post the Yahoo Mail (or other free email services) Captchas on the registration forms on pr0n sites. The pr0n registrants would have to fill out the Captcha, but this would then be used by the spammer to get around the Captcha without any fancy software.
    • Mod parent up (Score:4, Interesting)

      by XNormal ( 8617 ) on Wednesday August 24, 2005 @12:32PM (#13390695) Homepage
      It's a cheap and scalable method to defeat such algorithms. There will always be enough humans willing to do this for very little reward (some free pics).
    • The best part is that *no* advance in captcha technology can really fix this. It's no longer a race against OCR technology: the hole can't be plugged by switching to object-based (rather than text-based) captchas, nor by switching to audio-based ones.
    • by Gordonjcp ( 186804 ) on Wednesday August 24, 2005 @01:30PM (#13391270) Homepage
      It's very difficult to get around this. Even using things with no text at all, such as the Cwazymail images, you still have this gaping hole that ne'er-do-wells will get in through.
    • by McGregorMortis ( 536146 ) on Wednesday August 24, 2005 @04:11PM (#13392600)
      The thing is, then, the porn site asking you to solve the captcha doesn't know the answer themselves. You can screw 'em by giving the wrong answer.

      They'll waste their resources trying to spam with the wrong answer, and you'll still get your porn fix.
  • by jpellino ( 202698 ) on Wednesday August 24, 2005 @12:29PM (#13390667)
    captcha stops bots
    pwntcha breaks captcha
    slashdot cremates pwntcha
  • Hmm (Score:2, Interesting)

    by sexyrexy ( 793497 )
    While it is an interesting project from a hobbyist and academic standpoint, I'm not really sure what practical value it holds (unless the intent is to sell a mature algorithm to spammers, which is not the case since the project is being published). This is nothing more than a personal scripting project: no foray into new concepts of computer science or pattern recognition; no new breakthroughs in computer-based heuristics.
    • by Phroggy ( 441 ) *
      While it is an interesting project from a hobbyist and academic standpoint, I'm not really sure what practical value it holds

      If a hobbyist can do it, so can a spammer with financial motivation. Showing weaknesses in Captcha will help sites to develop better systems so the spammers don't have such an easy time with it.
    • Re:Hmm (Score:3, Interesting)

      by barawn ( 25691 )
      I'm not really sure what practical value it holds

      Well, if you read the site, there's a list of reasons why certain captchas are bad.

      For instance:
      • Too few fonts (or only one font)
      • Constant rotation (or no rotation)
      • No deformation
      • Constant colors

      And a list of reasons why certain captchas are good. It's a pretty good summary of the strengths (and weaknesses) of a lot of them.

      One thing you may notice is how complicated (and difficult to read as a human!) some of the broken ones are (like linuxfr.org, or vBulletin)

  • ADA (Score:5, Insightful)

    by dnoyeb ( 547705 ) on Wednesday August 24, 2005 @12:31PM (#13390689) Homepage Journal
    Having a legally blind mother that uses the web, I wonder how captcha complies with the Americans With Disabilities Act (when used by American companies of course)?

    Is it compatible with BLINUX? I think by definition it is not.

    Perhaps I should ask, what alternate method of identification do sites employ to take into account blind users and the ADA?
    • Re:ADA (Score:3, Interesting)

      by jpatters ( 883 )
      Audio captchas?
    • Re:ADA (Score:5, Interesting)

      by donnyspi ( 701349 ) <junk5&donnyspi,com> on Wednesday August 24, 2005 @12:37PM (#13390748) Homepage
      Instead of an image based Turing test like Captcha, I just have the last question on a log in screen or form be a randomly selected super easy question. For example, "Spell the number 7" or "What is the next logical number in the sequence 1, 3, 5, 7, ...?" Check it out here: http://www.donnyspi.com/contact.php [donnyspi.com]
      • by perrin ( 891 )
        > Instead of an image based Turing test like
        > Captcha, I just have the last question on a log in
        > screen or form be a randomly selected super easy
        > question. For example, "Spell the number 7" or
        > "What is the next logical number in the sequence
        > 1, 3, 5, 7, ...?

        The sad thing is that many humans will have problems solving these trivial puzzles, too. Especially when English is not your first language.
        • If people are having trouble solving these puzzles, then they're probably not getting too much out of my website anyway and would be less likely to use the protected form to leave me a comment or email.

          I agree that if my method were applied to Yahoo Mail signup or eBay or something, then questions would have to be given in different languages.

      • Re:ADA (Score:5, Funny)

        by TheRaven64 ( 641858 ) on Wednesday August 24, 2005 @12:57PM (#13390957) Journal
        Hmm. Done right, you could weed out bots and stupid people. Excellent!
      • Re:ADA (Score:2, Insightful)

        by JadeNB ( 784349 )
        This solution is interesting, but surely not scalable -- while captchas are, by design, easy for computers to generate but hard for them to solve, the same thing that prevents computers from solving `easy' problems will presumably also prevent them from generating `easy' problems.
      • Re:ADA (Score:3, Insightful)

        by aardvarkjoe ( 156801 )
        For fun, I tried plugging five questions from your page into google. Of the five, three were answered directly by google, and one had the answer in the summary for the first result. Creating a parser to determine the right answer from the google results would take some work, but I would bet that a 50% accuracy rate is not unreasonable. A first, fairly obvious method, would be to take the summary of the first google result, remove all of the words that appeared in the original question, and pick from the
      • Re:ADA (Score:2, Funny)

        by Anonymous Coward

        "What is the next logical number in the sequence 1, 3, 5, 7, ...?"

        11. Oh, wait, you're not using octal?
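
The question-based check discussed in this subthread (a randomly selected easy question in place of an image) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `QUESTIONS` bank, the function names and the accepted answers are hypothetical and are not taken from donnyspi.com or from the SimpleQuestions library mentioned earlier.

```python
import secrets

# Hypothetical question bank: each prompt maps to the set of accepted answers.
# The prompts are the examples quoted in the thread; the accepted answers are assumptions.
QUESTIONS = {
    "Spell the number 7": {"seven"},
    "What is the next logical number in the sequence 1, 3, 5, 7, ...?": {"9", "nine"},
    "What colour is the sky?": {"blue"},
}


def pick_question():
    """Return one prompt chosen at random from the bank."""
    return secrets.choice(list(QUESTIONS))


def check_answer(prompt, answer):
    """Accept the answer if it matches, ignoring case and surrounding whitespace."""
    normalized = answer.strip().lower()
    return normalized in QUESTIONS.get(prompt, set())
```

As the replies point out, a small fixed bank like this can be enumerated or looked up by an attacker, and the prompts assume the visitor reads English, so a sketch of this kind probably only suits low-traffic forms.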

    • some sites have an alternative audio captcha, or instructions to email the admin for an override of the captcha
    • Re:ADA (Score:2, Interesting)

      by guardian-ct ( 105061 )
      Livejournal has a "If you can't read the text, type "AUDIO" and take a sound test instead." thing, and other sites have other ways around the visual test.

      Unfortunately, not all sites have non-visual humanity tests.
    • Re:ADA (Score:5, Funny)

      by Tumbleweed ( 3706 ) * on Wednesday August 24, 2005 @12:53PM (#13390918)
      I wonder how captcha complies with the Americans With Disabilities Act

      Simple - they just use ALT text for the image! :)
    • Re:ADA (Score:2, Interesting)

      by La Gris ( 531858 )
      This is a real problem for the visually impaired, not only the blind.

      Distorted fonts, noisy lines, random dots and low contrast used in such pictures make them very hard or impossible to read.

      Accessibility recommendations and W3C standards would require such important content to be backed up with alternate formats like an audio recording.

      I believe these rules should apply not only to government sites.

      But I know of no site providing an alternative audio captcha for now. My husband and I, require a tier person to
    • Re: Disabilities (Score:2, Informative)

      by chato ( 74296 )
      The W3C proposed in 2003 a number of Solutions for the Inaccessibility of Visually-Oriented Anti-Robot Tests [w3.org], including logic puzzles, audio captchas, credit card validation, etc. It is interesting that they also show how a federated identity system can help users with disabilities.
    • Many sites will offer alternatives for that - things along the lines of "download a sound file and type in what it says". It's essentially the same thing as a captcha, when you think about it, only that it's not an image.

      Of course, things get hairy when you're both blind and deaf... and not all sites offer this kind of alternative, either. But that seems to be part of a more general problem where people don't make their sites accessible, and in fact often don't even realize that there might be a problem for
  • by ReformedExCon ( 897248 ) <reformed.excon@gmail.com> on Wednesday August 24, 2005 @12:31PM (#13390690)
    The problem is that people are using robots to work in an autonomous manner to find ways around typical human limitations (we can only send several hundred emails a day, robots are not so limited). So people want to stop these "cheaters" by making the user prove that they are a human rather than a robot.

    Is this really a good thing, though? Even on a site like Slashdot, in a story about defeating bots, the very first comment in this story is posted by a bot. How ironic is that? What is accomplished by banning users who can't read these "captchas" (what a horrendous fake word)? Nothing, apparently, as the story says. It only serves to annoy legitimate users and does nothing to hamper illegitimate robots.

    The solution is not this sort of halfway measure. The solution is to make it simply not worth the effort to be a nuisance on a discussion forum. I suppose that requires a glut of intelligent posters, but with the entire citizenry of the Internet available, that can't be so hard.
    • Even if captchas are broken in, say, 1 second by this system - we have greatly raised the cost of sending an email, posting a blog-spam comment, or some other such irritant.

      Sure, maybe they're not perfect.

      I use them on my website [nephadus.com] mostly because I want to avoid people posting advertisements on my blog. Individuals do it occasionally, but those are easy enough to delete. When someone coded my blog comment form into a bot somewhere and I started getting 100+ spam comments a day, I started using captchas.

      I'm su
    • by A beautiful mind ( 821714 ) on Wednesday August 24, 2005 @12:57PM (#13390951)
      "What is accomplished by banning users who can't read these "captchas" (what a horrendous fake word)? Nothing, apparently, as the story says."

      I actually disagree. The captcha method reduces spam load for most sites down to zero. Only the bigger sites need to worry, because spammers may set up a site to specifically target them by rerouting captchas. That's not the case with 99% of the websites using captchas, it's just not worth the effort.

      It's sorta like a copy protection: if it stops 90% of the people, then it's good enough.
      • But, and this is probably your point, it's better than nothing! Or, to put another way, if it stops 90% of the people, then it's probably worth its minor cost. (Cost being the effort of humans to read the captchas, etc.)
  • It is patented (Score:4, Informative)

    by dmeranda ( 120061 ) on Wednesday August 24, 2005 @12:34PM (#13390717) Homepage

    This is a good study of how hard it is to design secure systems. It's just like a non-cryptographer trying to create their own cipher, only in the visual processing world. Sadly, the article does not touch on non-visual captchas, which are alternatives for the blind. It would also be interesting to see what Jakob Nielsen [useit.com] might have to say on this technology from a usability perspective.

    Of course, one of the primary bad things is that the concept of a captcha is patented, and the patent language is very broad. US Patent# 6,195,698

    Also see the Wikipedia article [wikipedia.org] for more information.

  • It's a good enough idea. Even with a captcha-defeating library, a fairly skilled person would have to write a script or something to parse the web form (optionally over SSL, which is a little more difficult), programmatically decode the captcha, and then fill in the form and submit it.

    Usernames and passwords are a bad idea, but we use them. Using cookies or special URLs like slashdot has (or had, not sure) to automatically login is a bad idea.

    But they are acceptable for now, relatively simple to implement a
    • How exactly would parsing a form "over SSL" be harder than parsing it not over SSL? Are you trying to claim the SSL adds encryption to the form? It doesn't. SSL is transport layer, you're talking application layer. If you mean snooping it, then I challenge you to show a non-brute force implementation of breaking SSL, so it's not "a little more difficult", it's exceptionally more.
    • Hell, I've seen captchas that I could not read before, and I'm a human!

      It's not inconceivable that an algorithm to defeat a particular type of captcha would be better at reading it than a human.
  • Heh (Score:4, Funny)

    by hungrygrue ( 872970 ) on Wednesday August 24, 2005 @12:35PM (#13390730) Homepage

    Well I'm glad someone is writing code to solve those "prove you aren't a script" images, because a lot of times I can't quite figure them out myself.

    • "Q. What is your favorite color?.. No on second thought, nevermind that. What is written in this blob?"
    • "A. I'm not sure, is this a Rorschach test? Oh, I know 4 - 3 - Two flies mating - U - V - Giant Nose - X."
  • by bogie ( 31020 ) on Wednesday August 24, 2005 @12:38PM (#13390751) Journal
    Chiefly among them is sometimes you can't tell what the fucking words are. Within the last few months on more than one occasion I simply could not read the letters because they were so distorted and the lines overlapped the letters too much. No fun redoing a web form over and over because you can't figure out what the hell the verification box says.

    I can't imagine how people with difficulties cope with this.
    • The sites with really good captchas should run anti-captchas... to filter out the *really* hard to read ones. =P

      But there are still a lot of ways that haven't been used yet to make the image hard to read for the computer but less hard than the extreme distortions, such as overlapping letters and words. For example, if say only 25% of a word is covered up by another word on top of it, it should still be very easy for a normal person to read both words. Or use different colors and transparency. Or chain
    • No fun redoing a web form over and over because you can't figure out what the hell the verification box says.

      Yahoo! does this and it's asinine. I hit a captcha today that clearly had a ` character in it, but apparently it was a 'confuser' line, not a `. The rules for what character sets are valid are not given, so you don't know if punctuation is valid or not. Apparently it's invalid. How about case? A c and a C are pretty hard to discriminate when they're rendered along a Bezier curve.

      Clearing the web
  • A: Captchas are a necessary evil. Without it, many services can be horribly, horribly abused.

    B: It's how lazy cryptographers do AI: The goal of a captcha is to get someone else to solve a hard vision/learning problem, and then you change the Captcha.
  • Here's a link that will actually load and show you all the pretty pictures : http://sam.zoy.org.nyud.net:8090/pwntcha/ [nyud.net].
  • OCR wins (Score:3, Funny)

    by marked23 ( 693822 ) on Wednesday August 24, 2005 @12:40PM (#13390780)
    Once all these new algorithms get integrated into OCR software... OCR software might just work.
  • I just saw a great flash-based Captcha designed to combat just this sort of attack. The test was composed of white text on a white background. Colored shapes of various sizes swirled in the background behind the text in a pseudo-random pattern, and the text was visible or obfuscated depending on whether there was a shape behind it at the moment. After watching it for a few minutes to see if there were any obvious flaws, I noticed that the entire phrase was never visible all at once.

    A little patience was required, but I was able to verify in less than 10 seconds. Animation seems to be very useful for this kind of application.
  • Hashcash doesn't care if you're blind and need special screen reading software.

    It makes bulk spamming expensive as well. That may not apply to blog spamming as much but it's still a good way to slow them down.

    Tom
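
The proof-of-work idea behind Hashcash, mentioned in the comment above, can be illustrated with a short sketch. This is a simplified illustration and not the real Hashcash stamp format: the `challenge` string, the `challenge:nonce` layout and the function names are assumptions made here. The sender searches for a nonce whose SHA-1 hash starts with a given number of zero bits, and the receiver checks it with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(challenge: str, difficulty: int) -> int:
    """Search for a nonce whose hash has at least `difficulty` leading zero bits."""
    for nonce in count():
        digest = hashlib.sha1(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking costs one hash, however long the sender had to search."""
    digest = hashlib.sha1(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

Each extra bit of difficulty roughly doubles the expected search time for the sender while verification stays cheap, which is what makes bulk submission expensive without requiring the visitor to see or hear anything.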
  • He lists paypal.com as "broken"; how about https://www.moneybookers.com/app/login.pl [moneybookers.com]

    Stephan
    • Holy hell, and just numbers to boot!

      In case you didn't click the link, this site is secured by a captcha - with horizontal lines. It only uses numbers, however, so it would be really really easy to get through this one....
  • by PeeAitchPee ( 712652 ) on Wednesday August 24, 2005 @12:52PM (#13390908)

    Having to wade through 60+ spam comments a day on a WordPress blog (with all the stock antispam options enabled) just sucked . . . and the blog didn't even get much traffic (PageRank of 4). I installed the AuthImage plugin [gudlyf.com] and used it on its stock settings, and for awhile didn't get a single bit of spam. Then, magically, it started up again. It seems some industrious little script kiddies have written a crawler to massively bombard AuthImage-enabled blogs with words from the stock word list. I switched from the wordlist file to randomly-generated strings and increased the size of the image for readability, and I never had another piece of comment spam in that blog again.

    As for blind folks, I suppose every webmaster has to make that decision based on their target demographic, but I've seen a few text-only captchas that work well enough ("What color is an orange?") but will inevitably have the same limitation as the AuthImage word list above.
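
The switch described above, from a stock word list to randomly generated strings, might look something like the sketch below. It is not AuthImage's actual code: the `ALPHABET`, the default length and the function name are assumptions chosen for illustration.

```python
import secrets

# Assumed alphabet: lower-case letters and digits, leaving out glyphs that are
# easy to confuse once distorted (0/o, 1/l/i).
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789"


def random_challenge(length=6):
    """Build a challenge string from a cryptographically strong random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Unlike a shipped word list, the set of possible challenges here is `len(ALPHABET) ** length`, so a crawler cannot simply replay guesses from a known list. Sticking to one letter case also sidesteps the c-versus-C ambiguity complained about elsewhere in this discussion.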

  • by Bondolo ( 14225 ) on Wednesday August 24, 2005 @12:53PM (#13390917) Homepage
    1. Put up a "free" pr0n site.
    2. Require visitors to the pr0n site to process a captcha before viewing the pr0n. In reality they are proxy processing a captcha for another site (paypal, hotmail, yahoo, etc.) which they never see.
    3. Profit!

    Captchas are next to useless and for the visually impaired very frustrating. One more example of a technology which annoys everyone and yet doesn't really stop the determined miscreant. <cough>airport shoe inspections</cough>

    • That's why (Score:3, Interesting)

      by Phil John ( 576633 )

      all captchas should timeout after, oh, say 10 minutes?

      In all honesty, do you really think you're going to get that many people to regularly visit a pr0n site? The sector is extremely cut-throat and vastly bigger than the market can justifiably support (hence why many pr0n sites close each month).

      The only way to get to the top of the engines in the first few months would be to use PPC advertising (costs money). After that, even if you get to the top of the SERPS by using nefarious means, you'll need to g
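
The 10-minute timeout suggested in this comment could be enforced without server-side state by signing the issue time together with the expected answer. The following is a minimal sketch under assumptions: `SECRET_KEY`, the token layout and the function names are hypothetical, and a real deployment would also need to stop a token from being reused within its lifetime.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key
MAX_AGE_SECONDS = 600  # the "10 minutes" suggested above


def issue_token(answer, now=None):
    """Return 'issued:mac', binding the expected answer to the time of issue."""
    issued = int(time.time() if now is None else now)
    mac = hmac.new(SECRET_KEY, f"{issued}:{answer}".encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{mac}"


def check_token(token, submitted, now=None):
    """Accept only a correct answer submitted within MAX_AGE_SECONDS of issue."""
    current = time.time() if now is None else now
    try:
        issued_text, mac = token.split(":", 1)
        issued = int(issued_text)
    except ValueError:
        return False  # malformed token
    if current - issued > MAX_AGE_SECONDS:
        return False  # challenge has expired
    expected = hmac.new(SECRET_KEY, f"{issued}:{submitted}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac.encode(), expected.encode())
```

A timeout like this only narrows the window for the relay trick described in the parent comment, since a relayed captcha can be solved by a human in real time.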

  • by G4from128k ( 686170 ) on Wednesday August 24, 2005 @12:54PM (#13390928)
    As with the Turing test, the entire purpose of a captcha is to distinguish humans from machines. As captcha-defeaters improve, the captchas will need to become more and more sophisticated and require more and more human or human-like intelligence to process. This arms race will culminate in a Turing test-like approach for discerning natural intelligences from artificial ones.

    The ultimate irony may occur when the first human-intelligent computer is created by a spammer for the purpose of assaulting our collective intelligences with their commercial drivel. Given the increasing value of online commerce and Google page ranking, there's probably more money in AI for captchas than AI for academic research.

    But before captchas get that sophisticated, the system will become self-defeating as the number of real humans defeated by captchas exceeds the number of AIs repelled by them.
  • Use captcha to encode math problems (IE, the captcha would have "sin(34) * 10" or whatever, and you have to type in the answer).

    This way, not only does it take a little longer to analyze, but you get them to do a little bit of work for you. Force the spammers to be part of your little distributed processing system.

    Of course the problems need to be simple enough for the users to figure out...
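
Generating problems "simple enough for the users to figure out" might be sketched as below. Only the question-and-answer side is shown; rendering the expression as a distorted image, as the comment proposes, is left out. The operator set, the operand range and the function names are assumptions made for this illustration.

```python
import operator
import secrets

# Assumed operator set, kept to integer arithmetic so answers are unambiguous.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_problem():
    """Return (text, expected) for a small random arithmetic problem."""
    a = secrets.randbelow(9) + 1  # operands in 1..9
    b = secrets.randbelow(9) + 1
    symbol = secrets.choice(list(OPS))
    return f"{a} {symbol} {b}", OPS[symbol](a, b)


def check(expected, submitted):
    """Accept the submission only if it parses to the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

Served as plain text, a problem like this is trivial for a script to evaluate, so the protection would still rest on the image rendering rather than on the arithmetic itself.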
  • The main article refers to Inaccessibility of Visually-Oriented Anti-Robot Tests [w3.org], which deserves a read and commentary.

    Among the claims:
    - captchas are inaccessible to the blind - true
    - a horde of human beings can decode the entire library over time - only true if the images are recycled, not if they are created on-demand or for one-time use.

    It also discusses some of the side-effects of making access to real humans harder, or harder for a class of users such as the visually impaired. For example, I've seen s
  • by themightythor ( 673485 ) on Wednesday August 24, 2005 @01:23PM (#13391201)
    In the table for "Cwazymail", I was trying to figure out what the pictures were. One's an elephant, one's an owl, and one is a man pulling apart his anus. Great!
  • by poincaraux ( 114797 ) on Wednesday August 24, 2005 @01:50PM (#13391470)
    Editors -

    Please don't link to the goatse man without at least some warning.

    Thanks.
  • Goatse Man (Score:5, Informative)

    by Inda ( 580031 ) <slash.20.inda@spamgourmet.com> on Wednesday August 24, 2005 @02:29PM (#13391810) Journal
    Thanks for linking the Goatse Man image in the article. Oh how I've missed being tricked into viewing thee.

    The link is not work safe.
  • Here's an idea (Score:3, Interesting)

    by 5n3ak3rp1mp ( 305814 ) on Wednesday August 24, 2005 @04:29PM (#13392699) Homepage
    I thought about this problem on a recent trip to the urinal and here's what I got.

    1) Get (or construct) a large database of nouns of well-known objects (car, orange, bottle, phone, pencil, brick, cup, etc. etc.)

    2) Retrieve image references from a (safesearch-enabled) Google image search for a random noun from your database. Pick randomly from the result set.

    3) Present images to the user. "These are pictures of a..."

    4) My next strategy was to figure out a combinatorial way to increase the number of possible replies so that an attacker couldn't simply create a database of knowns (such as a hash database of images)

    What do you smart fellers think? other than google being pissed for scraping their site
  • Totally fake (Score:5, Insightful)

    by VAXGeek ( 3443 ) on Wednesday August 24, 2005 @09:07PM (#13394458) Homepage
    This article is a fraud. No source is presented, and goatse.cx is displayed in the examples. This whole thing was contrived just to get goatse.cx in a legitimate front page post. Best troll in years.
