Security

Analyzing CAPTCHAs

Posted by CmdrTaco
from the i-fail-more-than-i-win dept.
Bruce Schneier's blog pointed me to a research paper on "Attacks and Design of Image Recognition CAPTCHAs" (PDF). The abstract says, "We systematically study the design of image recognition CAPTCHAs (IRCs) in this paper. We first review and examine all IRCs schemes known to us and evaluate each scheme against the practical requirements in CAPTCHA applications, particularly in large-scale real-life applications such as Gmail and Hotmail."

  • 2nd link is a PDF. Thanks for the warning...

    • It's 2010, get a life. Comments like this were funny sometimes around 1996.

      • by Culture20 (968837)

        It's 2010, get a life. Comments like this were funny sometimes around 1996.

        It's 2010. In 1996, PDFs weren't a potential security vulnerability.

  • hmm... (Score:2, Insightful)

    by radicalpi (1407259)
    I wonder how long until we have no way of distinguishing a bot from a person. existing CAPTCHAs don't work all that well, and I can't see future ones working much better for very long. The Cylons are among us! Any one of us could be one!
    • by fifedrum (611338)

      I hear you can just pay people to sit in front of a PC all day solving captchas, and it's cheaper than a bot.

      • Re: (Score:2, Funny)

        by radicalpi (1407259)
        Yeah, they're Cylons.
        • by GarryFre (886347)
          Are Cylons humor impaired? I think I met a few here and there.
        • I dealt with spam sent via phished passwords in a previous job. No one could relay through our site, and our IDS blocked large mail bombs via authenticated SMTP and IMAP, so the spammers always got in by logging in via the HTTP interface and apparently cutting and pasting spam messages one recipient at a time.

          About 3/4 of the spammy logins were from Nigeria and Togo and the rest were from various places like Israel, Saudi Arabia, and various UAE states. It's the ultimate work from home job!

          • by fifedrum (611338)

            This is a problem with which I'm familiar. It used to be isolated to the 41. addresses, specifically Nigeria; then Ghana, Ivory Coast and Burkina Faso got into the act, then UAE, Egypt and Algeria. Lately the headache has been migrating to Malaysia and Jakarta. Throw in the random UK, Ireland, Spain, Portugal and Russian IPs, and occasionally some from China.

            I just track the IPs, when they reach a magical threshold I loop through with iptables and block the whole damned network, and when enough subnets are blocked, move up to the
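A minimal sketch of the escalation described above, assuming IPv4 only. The two thresholds, the /24 and /16 network sizes, and the function names are illustrative assumptions; the post gives only "a magical threshold". The code prints nothing and runs nothing against a firewall: it only renders the `iptables` rules that would be applied.

```python
from collections import Counter
import ipaddress

# Hypothetical thresholds; the post does not give real numbers.
IP_THRESHOLD = 3       # distinct offending IPs in a /24 before blocking it
SUBNET_THRESHOLD = 2   # blocked /24s inside a /16 before blocking the /16


def plan_blocks(offending_ips):
    """Return the networks to block, wider networks first (IPv4 only)."""
    unique_ips = {ipaddress.ip_address(ip) for ip in offending_ips}
    # Count distinct offenders per /24; strict=False masks the host bits.
    per_24 = Counter(
        ipaddress.ip_network(f"{ip}/24", strict=False) for ip in unique_ips
    )
    blocked_24 = [net for net, count in per_24.items() if count >= IP_THRESHOLD]
    # Escalate: count how many blocked /24s fall inside each /16.
    per_16 = Counter(net.supernet(new_prefix=16) for net in blocked_24)
    blocked_16 = {net for net, count in per_16.items() if count >= SUBNET_THRESHOLD}
    # Drop /24s that a blocked /16 already covers.
    remaining_24 = [
        net for net in blocked_24
        if net.supernet(new_prefix=16) not in blocked_16
    ]
    return sorted(blocked_16) + sorted(remaining_24)


def iptables_commands(networks):
    """Render (but do not execute) one DROP rule per network."""
    return [f"iptables -A INPUT -s {net} -j DROP" for net in networks]
```

Keeping the planning step separate from the rule rendering makes it possible to review the list before anything touches the firewall, which matters when a single rule can lock out a whole network.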

      • Mechanical Turk FTW. Apparently we don't really need strong AI so long as we have cheap labor in the 3rd world.
      • Re: (Score:2, Informative)

        by SirKveldulv (1073650)
        Yes you can. Costs from $2-7 per 1000.
      • Cheaper? Maybe for the initial cost of developing such a bot for a temporary amount of time, but the bot doesn't cost anything after that as far as I know.

    • Re:hmm... (Score:5, Funny)

      by melikamp (631205) on Tuesday October 05, 2010 @10:09AM (#33793900) Homepage Journal

      It's happening already, I think, with turn-key solutions floating around featuring 20-35% accuracy. I don't have 100%, more like 80% or so, and I am a human.

      OT, but I found a way to make RECAPTCHA entertaining. With two words given, I always just type one of the words, and put "fuck" for the other. The accuracy falls below 50%, but the giggles make it all worthwhile.

      • OT, but I found a way to make RECAPTCHA entertaining. With two words given, I always just type one of the words, and put "fuck" for the other. The accuracy falls below 50%, but the giggles make it all worthwhile.

        Below 50%? I probably average ~90% ... the key is in figuring out which word you have to get correct. There’s always the button to get a different captcha if you can’t tell on the one it gave you...

        • by melikamp (631205)
          You are right, most of the time they look sufficiently different: the challenge is longish and more scrambled, while the optional is shorter and looks like a shitty scan. Sometimes, though, they do look pretty damn identical. Guys, let's all write "fuck" in RECAPTCHA, that way we may actually make a difference.
      • Re: (Score:1, Insightful)

        by Anonymous Coward

        ...and I am a human.

        Can you prove that?

        • by lxs (131946)

          I have come to the conclusion that I am a bot. Half the time I can't read those captcha thingies.

          • by Bigjeff5 (1143585)

            Same here, I spent 15 minutes trying to get one to work the other day, but the letters were so messed up and the words so nonsensical that I couldn't manage it. So I tried the audio option. Makes sense right? Just listen to the words and it'll be easy! Except the audio was so fucked I couldn't understand it.

            I managed to get in eventually, but I'm avoiding that website from now on.

    • by gstoddart (321705)

      I wonder how long until we have no way of distinguishing a bot from a person.

      Well, there's always the Turing Test [wikipedia.org], but that could make signing into web sites a real nuisance. :-P

    • Re:hmm... (Score:5, Interesting)

      by tlhIngan (30335) <slashdot AT worf DOT net> on Tuesday October 05, 2010 @11:36AM (#33794840)

      I wonder how long until we have no way of distinguishing a bot from a person. existing CAPTCHAs don't work all that well, and I can't see future ones working much better for very long. The Cylons are among us! Any one of us could be one!

      Well, CAPTCHAs worked because they relied on vision tests - a skill that humans still do better than computers, but computer vision is already quite advanced. Then the countermeasures came where CAPTCHAs started getting so distorted that it was impossible to determine the code (I remember a forum I signed up for - took more than 15 tries and a cookie reset).

      However, there are still difficult-for-computer-but-easy-on-humans tasks that can be done. I'm surprised no one's yet hooked a way into the Amazon Mechanical Turk or the like. Perhaps a simple one can be where you show a panoramic view along a busy street. Then you ask the question "What is the name of the store at number 763?" Or "What is the street number of ZZZ Supermarkets along this street?". "There is a large group of friends gathered near XXX store. How many people are in the group?"

      Or simpler ones - if your forum or other thing is about a specific topic, ask a question about that topic. Or even self-referential ones. "What of the following will an art thief steal? A) Mona Lisa, B) Big screen HDTV, C) Cellphone, D) Money".

      Might as well advance the state of things like image recognition and natural language queries while we're at it.

      • God I wish I had mod points. Beautiful solution. Thank you.
      • by coolvenk (1128477)

        .

        However, there are still difficult-for-computer-but-easy-on-humans tasks that can be done. I'm surprised no one's yet hooked a way into the Amazon Mechanical Turk or the like. Perhaps a simple one can be where you show a panoramic view along a busy street. Then you ask the question "What is the name of the store at number 763?" Or "What is the street number of ZZZ Supermarkets along this street?". "There is a large group of friends gathered near XXX store. How many people are in the group?"

        Or simpler ones - if your forum or other thing is about a specific topic, ask a question about that topic. Or even self-referential ones. "What of the following will an art thief steal? A) Mona Lisa, B) Big screen HDTV, C) Cellphone, D) Money".

        Might as well advance the state of things like image recognition and natural language queries while we're at it.

        Because with the alternatives you propose, a human first has to figure out the correct answer to compare against the user's response in a CAPTCHA challenge. If they had an algorithm to figure it out, the attacker would use it too. And millions of CAPTCHAs are served every day, so they have to be automated.

      • I recall how Planetarion [online game] used simple trivia questions in their CAPTCHAs. The arithmetic category was no problem, but a few of the simple trivia questions tripped me up, especially because they were Euro-centric (the game *is* based in the UK). I shouldn't have to Google for a CAPTCHA answer.

    • by residieu (577863)
      Soon you'll be able to distinguish a bot from a person because only a bot will be able to read the CAPTCHA.
    • At the point that it becomes impossible to distinguish them, you will no longer need to. Why discriminate against a bot, if it's able to participate in discussions (to an on-topic degree as well as humans), has its mind influenced by ads, etc?
  • My experience with captcha is they are too focused on being the perfect system, to the point where it goes from a simple annoyance to almost impossible to access whatever it's protecting.

    • by Cro Magnon (467622) on Tuesday October 05, 2010 @10:18AM (#33794002) Homepage Journal

      At some point, CAPTCHAs will reach the point where ONLY a bot can get past them.

      • by clone53421 (1310749) on Tuesday October 05, 2010 @10:21AM (#33794032) Journal

        Then they’re designed wrong.

        You should at least skim over the paper, that’s actually a significant portion of what it’s focused on... finding something that humans are good at and bots are not. As better bots have been written, that may have changed significantly... most present CAPTCHA systems are relatively broken.

        • by Bigjeff5 (1143585)

          The GP's point was that there are captchas out there that are very difficult for even human readers to understand. However, pattern recognition software is getting better all the time, while human pattern recognition is generally fixed (it's phenomenal, but not improving). Eventually pattern recognition software will overtake human pattern recognition ability, and then the only ones able to get past a captcha will be bots.

          • by nomel (244635)

            Well, then you move on to a harder pattern, such as "what mood was the writer in when they wrote this" or "does the puppy in this picture look sad?" or, "is the person pictured in a dangerous situation".

            If we're at that point...then I would assume we would also have the ability to detect spam in a contextual sense!

    • by binkzz (779594)

      My experience with captcha is they are too focused on being the perfect system, to the point where it goes from a simple annoyance to almost impossible to access whatever it's protecting.

      Then it's getting further away from being perfect. A perfect captcha would be unnoticed.

  • by Anonymous Coward

    I have a friend that used to bot WoW for a couple years until Blizzard got the law on their side^H^H^H^H^H^H^H^H^H^H^H^H^H in their pocket. Turns out he used to redirect bot checking CAPTCHAs to an IRC channel where the paid minions would solve them.

    CAPTCHA has been a moot point to me since I witnessed this process occur in real time.

    • by buck-yar (164658) on Tuesday October 05, 2010 @10:10AM (#33793914)

      I heard porn sites were requiring a captcha to view an image, but it was really a redirect from another captcha. So porn surfers were solving captchas for bots.

      • I've seen that too and I've always wondered if that isn't the real reason we are getting near impossible captchas these days. Some admin probably sees lots of bots getting past the captcha filter and instead of realising it's humans doing the work decides to make the captcha more and more difficult.

        Some of the captchas go so far beyond a turing test this seems like the most plausible explanation. The current captchas can surely be toned down a bit in difficulty and still be impossible for state of the art a

  • Do captcha in a different way. Show an image of someone famous, like Obama, then ask who that person is. The answer key could have "Obama," "Barack," "Barack Obama" and every other iteration.

    • Re: (Score:3, Insightful)

      by clone53421 (1310749)

      There are only so many such images available for use, and the image library could fairly easily be exhausted and all of the images correctly identified at which point a bot could be used with near-100% accuracy.

      • by Rik Sweeney (471717) on Tuesday October 05, 2010 @10:25AM (#33794076) Homepage

        There are only so many such images available for use

        Not if they use images of Lady Gaga

        • by KarrdeSW (996917)

          There are only so many such images available for use

          Not if they use images of Lady Gaga

          Except the idea only works if the answer isn't always Lady Gaga

          • You're right, they can't all be pictures of the same person, but it seems like multiple pictures of the same person, mixed in with pictures of other people, could help or at least not hurt.

            If the pictures of the same person look very different (Gaga's fashion choices would certainly be an example of that), that would help such a process

        • she is the near complete opposite of a cartoon character in that respect (say, Bart's red shirt and blue shorts) - almost every day's outfit is *different*.

          (I'm assuming the joke was about her divergent fashion selections.)

    • Why not... show an image of someone famous, then ask who that person is.

      Collecting the pictures for this would be pretty expensive. You've got to figure out licensing, tagging (including acceptable synonyms in several target languages), down-sampling, storage, accessibility, etc. The attacker only has to figure some (imperfect) tagging, and they can use well-researched ideas (facial recognition) to help with this. Moreover, the larger and more valuable target you are, the more images you must find. Wo

    • by natehoy (1608657)

      It might work, except that someone who is famous to one person is unknown to another. Were you to put up a picture of Barack Obama or Joe Biden, I could identify either one easily. The same could not be said of all world leaders, however. I read pretty regularly about events involving David Cameron, Christian Wulff, and Nicolas Sarkozy, but I'm not sure if I could accurately identify a photo of any of them given no other context.

      Lady Gaga? Show me a picture of her without any context, and I'd have to st

      • That is a strong point about why using a famous person should not be used but what about something simpler. I propose something like this:
        5 images of random people are selected from a data base where the images are tagged about the person's appearance (i.e. hair color, sex, facial hair, eye color, etc).
        A random question is asked about those five images (i.e.- how many have facial hair? How many have blue eyes? How many are women?)
        If answer matches with the tags from the 5 random images you have a succes

        • by natehoy (1608657)

          Better, but still problematic for another reason.

          Captcha requires lots of possibly incorrect responses. An answer with a minimum value of 0 and a maximum value of 4 (for example) means there are 5 possible responses. 0,1,2,3,4.

          That gives a bot a 20% chance of being correct, which is unacceptably easy.

          You've also made the captcha solution language-specific. And if you use colors, color-blindness may be an issue for you now as well.

          Don't get me wrong, I can see some applications of picture-based captcha, b

          • Good points, so let's address them. Your calculation is a bit off: for a simple question you have 6 possible answers - 0,1,2,3,4,5. So the bot has a 1/6 chance of correctly guessing, which is still unacceptably easy. So add a second or third question, bringing the odds down to 1/36 and 1/216 respectively. Or add more images to raise the base number from 6 to 11 or maybe 21. Suddenly you go from 1/6 odds to 1/9261 (20 images, 3 questions). The color issue would be problematic and the only wa
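The arithmetic in this exchange can be checked with a short sketch. The function name is made up for illustration, and it assumes the bot guesses uniformly at random and that the questions are independent. Neither is guaranteed in practice: if some counts are far more common than others, a bot that always picks the most common answer would do better than these figures suggest.

```python
from fractions import Fraction


def blind_guess_odds(num_images, num_questions):
    """Chance that uniform random guessing gets every question right.

    Each "how many of the N images ...?" question has N + 1 possible
    answers (0 through N). Assumes independent questions and equally
    likely counts, which a real image set would not guarantee.
    """
    answers_per_question = num_images + 1
    return Fraction(1, answers_per_question ** num_questions)


if __name__ == "__main__":
    for images, questions in [(5, 1), (5, 2), (5, 3), (20, 3)]:
        odds = blind_guess_odds(images, questions)
        print(f"{images} images, {questions} question(s): {odds}")
```

Under those assumptions, 5 images give 1/6 per question, and 20 images with 3 questions give 1/21³ = 1/9261, matching the numbers quoted in the thread.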

        • by rjstanford (69735)

          The trouble is that you've made it hard enough (by definition) that a human is needed to lovingly hand-craft each one as well. After all, if the computer could put them together from an image database, it could solve them the same way.

          tl;ds

          Too long; doesn't scale.

    • Re: (Score:3, Insightful)

      Reverse image searches like TinEye [tineye.com] blow this idea out of the water before it's even begun.

  • Interesting study; however, it needed a more diverse range of sample testers, all of whom were early-twenties volunteer university graduates. I only bring this up because I see very different responses to CAPTCHAs. The response and attitude towards CAPTCHAs from young university people hanging around the IT labs where this was most likely advertised will be far different from the average online citizen's. I'm not sure how accurate this is, but out in the non-IT section of society CAPTCHAs are loathed and hated
  • http://lib.mipt.ru/?spage=reg_user [lib.mipt.ru] From the Moscow Institute of Physics and Technology. Described as a "little school-level problem" :-) Be prepared to dust off your knowledge of Kirchhoff's laws (http://en.wikipedia.org/wiki/Kirchhoff%27s_circuit_laws) and Ohm's law, and to solve a system of equations that boils down to a 6x6 matrix.
  • Seriously, what use are captchas anymore when they pay actual humans to do the dirty work? I got like hundreds of fake users with IPs from India and China in my forums, who sign up just to put an SEO-tailored message and URL in their signature.
  • I dread Craptcha (Score:3, Informative)

    by GarryFre (886347) on Tuesday October 05, 2010 @11:40AM (#33794910) Homepage
    Have you ever run into a Captcha that claims your response is wrong when it's obvious that it is NOT wrong, and then tried the audio stuff? The audio version is so retarded it's disgusting. It usually features two guys with grossly distorted voices uttering what sounds like 14 words of gibberish in some short conversation at the breakneck speed of an auctioneer or Bugs Bunny on helium. Not a single word can be understood, and then it asks for the two words in the sentences. The worst I had ever seen of this kind of foolishness was Dev Shack. It sounds like a great site for programming resources but I can never join because I can't get past their defective Captcha. I can't even tell them it's broken because the Captcha prevents any such messages from getting through. This is what I call "Craptcha" and this is no Fraudian slop. I used to run into a few like this, but not lately, but when I do, I still get that sick sinking feeling.
    • by GarryFre (886347)
      I even took screen shots. On second thought I could do a whois on the url and email their listed email address.
  • Once the captcha is defeated, a human being sends a simple question to the account to validate it.

    "Was Jennifer Aniston in "Friends""
    "Is Kentucky a country?"
    "Is the Euro a kind of duck?"

    • by andrewd18 (989408)
      Except those can be solved brute-force with a simple "yes" or "no"... you're guaranteed to be right half the time. You'd need questions with more ambiguous or context-sensitive answers like:

      If train A leaves Chicago traveling 100 MPH and train B leaves New York traveling 150 MPH and the distance between the two cities is 600 miles, how far from New York will they be when the two trains meet?

      And you thought word problems would never be useful!

      • Who was one of the female stars of friends?

        What was the Dow yesterday?

        Please respond and say that you are a banana.

        I started this on a local personals site about 7 months ago and I'm seeing it everywhere now. I think it was invented in multiple places. It makes personals spam almost useless regardless of how real it seems.

        • by neminem (561346)
          I'm a moderator on a decently active forum. At the time I got the gig, there were dozens of spambot-created threads a *day*. We talked about adding a captcha to the signup, but we couldn't really find any that weren't easy to crack, without also being painful for *humans*. Then someone suggested we could just ask a trivial question about the associated game (for instance, "What do accordion thieves steal?" (answer: "accordions")), and if spambots started getting through, we could just change the question. I
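A rough sketch of the changeable signup question described above. The class name, the normalization rules, and the set of accepted answers are assumptions for illustration; the post does not say how the forum actually compared answers.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


class SignupQuestion:
    """One admin-chosen question plus the answers accepted for it."""

    def __init__(self, prompt, accepted_answers):
        self.prompt = prompt
        self.accepted = {normalize(answer) for answer in accepted_answers}

    def check(self, response):
        """Return True if the normalized response is an accepted answer."""
        return normalize(response) in self.accepted


if __name__ == "__main__":
    # If spambots learn this one, an admin swaps in a new question.
    question = SignupQuestion(
        "What do accordion thieves steal?",
        ["accordions", "accordion"],
    )
    print(question.check("  Accordions! "))
    print(question.check("money"))
```

Normalizing both sides keeps the check forgiving for humans (case, stray punctuation, extra spaces) without widening the set of answers a bot could stumble onto.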
      • The New York train has 3/5th of the total speed, so they'll be 3/5ths of the way, i.e. 360 miles.
        Never knew those problems were that easy ... that'd still be beyond most people, though, I'm afraid.
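The 3/5 shortcut can be checked with a few lines. This sketch assumes both trains leave at the same time and head toward each other at constant speed, which the puzzle leaves implicit; the function and parameter names are made up for the example.

```python
def meeting_distance(total_miles, speed_from_start, speed_from_other_end):
    """Distance from the first train's start to the meeting point.

    Assumes simultaneous departure and constant speeds. The first train
    covers the same share of the distance as its share of the combined
    (closing) speed.
    """
    closing_speed = speed_from_start + speed_from_other_end
    return total_miles * speed_from_start / closing_speed


if __name__ == "__main__":
    # New York train at 150 MPH, Chicago train at 100 MPH, 600 miles apart.
    print(meeting_distance(600, 150, 100))
```

With the numbers from the thread, the New York train covers 150/250 = 3/5 of 600 miles, so the sketch agrees with the 360-mile answer given above.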

  • http://xkcd.com/233/ [xkcd.com] Seriously, when I heard of the algorithm that could solve captchas 30% of the time, I was like: "Download link?"
