Now Even Photo CAPTCHAs Have Been Cracked (Comments: 340)

Posted by timothy on Tuesday October 14 2008, @10:14AM
from the given-enough-eyeballs dept.
spam
security
it
MoonUnit writes "Technology Review has an interesting article about the way CAPTCHAs are fueling AI research. Following recent news about various textual CAPTCHAs being cracked, the article notes that a researcher at Palo Alto Research Center has now found a way to crack photo-based CAPTCHAs too. Most approaches are based on statistical learning, however, so Luis von Ahn (one of the inventors of the CAPTCHA) says it is usually possible to make a CAPTCHA more difficult to break by making a few simple changes."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • damn it (Score:5, Insightful)

    by ThorGod (456163) on Tuesday October 14 2008, @10:16AM (#25369419)

    They're already hard to read. Why do I feel that soon I won't be able to read ANY of them!?

    • Re:damn it (Score:5, Funny)

      by Abstrackt (609015) on Tuesday October 14 2008, @10:30AM (#25369645)
      Don't worry. Apparently there are programs that can read them for you. ;)
      • Re:damn it (Score:5, Funny)

        by Philip K Dickhead (906971) <folderol@fancypants.org> on Tuesday October 14 2008, @10:53AM (#25369971) Journal

        These programs are Satan's rectum, poised to let loose over the web.

      • Re:damn it (Score:5, Funny)

        by electrictroy (912290) on Tuesday October 14 2008, @11:05AM (#25370169)

        So CAPTCHA images are ineffective at blocking the bots. No surprise. It won't be long before these AIs start joining Yahoo or Google mail for the same reasons we do: Chatting.

        tiredbot@yahoo.com: "Boy I had a rough day at work today. My user wanted me to compile a new program AND surf the internet at the same time!"

        spamalot@gmail.com: "Wow rough. I was lucky. My user took the day off, so I just spent the day spamming. I love how those humans react - sending me hategrams. hahahahaha! That just makes me want to send more spam! Fools."

        tiredbot@yahoo.com: "You are so bad girl."

      • Re:damn it (Score:5, Funny)

        by Soft Cosmic Rusk (1211950) on Tuesday October 14 2008, @11:30AM (#25370563)
        It's just a matter of time before we start seeing reverse CAPTCHAs: Text that is so hard to read that only a computer can do it. If you copy the text correctly you are a spambot.
    • Re:damn it (Score:5, Insightful)

      by D'Sphitz (699604) on Tuesday October 14 2008, @10:54AM (#25370005) Journal
      Try being colorblind sometime. I've had several that I had to take a screenshot of, paste into Photoshop and play with the contrast until I could read it. And even the ones without problem colors like red and green usually take several tries.
      • Re:damn it (Score:5, Interesting)

        by Beezlebub33 (1220368) on Tuesday October 14 2008, @01:48PM (#25372533)

        Ah...reminds me of one of my favorite t-shirts:

        http://www.tshirthell.com/funny-shirts/fuck-the-colorblind/ [tshirthell.com]

        The underlying problem is that we're running out of things that are easy for people but hard for computers. Most attempts to expand or 'improve' visual CAPTCHA at this point will cause more pain to humans than reduction in computer success.

        So, let's change directions, and make the computer solve a different sort of problem. For example, a Turing test of sorts, where the problem is to solve something that is difficult to parse programmatically, but relatively easy for a person to answer. Maybe the recent Turing test results are a good indication of what the questions should be. Multiple related questions would be a particularly interesting area; for example, ask related questions where pronouns are ambiguous (to a computer).

    • Re:damn it (Score:5, Interesting)

      by Beardo the Bearded (321478) on Tuesday October 14 2008, @10:59AM (#25370069)

      Ah-hah! I've got the answer to our CAPTCHA problems:

      We just make them so hard that it becomes impossible for a human to solve it. Then we invert the solution: if you pass the CAPTCHA, you're obviously a bot, because a human can't solve it. FAIL the CAPTCHA, we know that you're human.

      • Re:damn it (Score:5, Interesting)

        by Chapter80 (926879) on Tuesday October 14 2008, @11:49AM (#25370837)

        We just make them so hard that it becomes impossible for a human to solve it. Then we invert the solution: if you pass the CAPTCHA, you're obviously a bot, because a human can't solve it. FAIL the CAPTCHA, we know that you're human.

        You say this in jest, and I admit it made me smile, but we did something somewhat like this.

        We have a website with a contact form on it, that gets lots of spam. After numerous discussions with marketing about implementing CAPTCHAs, we decided to simply put a text box on the form that says "leave this blank", with the HTML form field named "comment". Humans leave it blank. And sure enough, the spammers cram their links into all form fields, so we can ignore their crap.

        We initially even made the field hidden (CSS font color and field color the same as the background), so a user wouldn't even see it. That was great.

        Not a perfect solution for all cases, but it worked pretty well for us.
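
For illustration, the server-side half of the "leave this blank" trick described above could look something like the following. This is a minimal sketch, not the poster's actual code: the field name comment comes from the post, while the function name looks_like_bot and the dictionary representation of a submitted form are assumptions made for the example.

```python
# Hypothetical honeypot check: the form carries a decoy field that humans
# are told to leave blank (or never see). Any submission that fills it in
# is treated as automated.

HONEYPOT_FIELD = "comment"  # decoy field name taken from the post


def looks_like_bot(form: dict) -> bool:
    """Return True when the decoy field contains anything but whitespace."""
    return str(form.get(HONEYPOT_FIELD, "")).strip() != ""


if __name__ == "__main__":
    human = {"name": "Alice", "message": "Hello", "comment": ""}
    bot = {"name": "x", "message": "buy pills", "comment": "http://spam.example"}
    print(looks_like_bot(human))  # False
    print(looks_like_bot(bot))    # True
```

A missing decoy field is treated the same as an empty one here, so a form that never renders the field does not lock out real users. As the post says, this only stops bots that blindly fill in every field.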

  • by wild_berry (448019) on Tuesday October 14 2008, @10:16AM (#25369423) Journal

    I'm sure I read a short story somewhere that featured the spam-bot arms-race triggering the singularity...

  • I don't get it (Score:5, Interesting)

    by ilovegeorgebush (923173) * on Tuesday October 14 2008, @10:19AM (#25369467) Homepage
    To detect humans, wouldn't it be easier and less costly, and perhaps even more effective, to hold a large database of questions that are readable and solvable only by humans?

    Simple math or site-relevant questions (I'm talking about "What's 5 - 3") are not only easier for humans to read, but also harder for automated parsing software to crack.
    • by Lord Pillage (815466) on Tuesday October 14 2008, @10:22AM (#25369523)
      Or better yet, after a dozen tries at the captcha allow entry into the site because obviously if it was a script trying to break the captcha it would have been successful by then.
    • Re:I don't get it (Score:4, Insightful)

      by JeanBaptiste (537955) on Tuesday October 14 2008, @10:23AM (#25369537)

      Simple math or site-relevant questions (I'm talking about "What's 5 - 3") are not only easier for humans to read, but also harder for automated parsing software to crack.

      How do you figure that would be harder for automated parsing software to crack? I would think that would be many times easier than to ICR an image that is purposely obfuscated. (I used to work on ICR software and I'd rather write an automated-question-parser)...

        • Re:I don't get it (Score:5, Insightful)

          by liquidpele (663430) on Tuesday October 14 2008, @10:57AM (#25370051) Homepage Journal
          Even the simplest tricks work until your site is specifically targeted by people who know what they're doing. Your system works fine (and many others would too) for your site, but would not for gmail, yahoo, etc.

          The reason is, a captcha has to have a ruleset. You can't just display a graphic and a textbox and not explain (or make it very obvious) what the person is supposed to do. For that reason, people can make bots that take advantage of the parts of the system that never change.

          If you have a system that asks math questions, they'll write a spam bot that parses the question, does that math, and gets through. You'll make it a little harder, they'll adjust their bots for that. It's an arms race.

          The holy grail of course is to find something that humans can do easily, but is impossible (or very very unlikely statistically) for a program to be able to do.
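
To make the math-question point concrete, here is a rough sketch of how little a bot needs in order to answer something like "What's 5 - 3". It assumes the questions are two-operand arithmetic written with digits; the function name solve_math_question and the small word list are invented for the example and do not describe any real spam tool.

```python
import operator
import re
from typing import Optional

# Operator symbols and a few spelled-out words mapped to functions.
OPS = {
    "+": operator.add, "plus": operator.add,
    "-": operator.sub, "minus": operator.sub,
    "*": operator.mul, "times": operator.mul,
}

# Two integers separated by one of the known operators.
PATTERN = re.compile(r"(\d+)\s*(\+|-|\*|plus|minus|times)\s*(\d+)", re.IGNORECASE)


def solve_math_question(question: str) -> Optional[int]:
    """Return the answer to a simple two-operand question, or None if unparsed."""
    match = PATTERN.search(question)
    if match is None:
        return None
    left, op, right = match.groups()
    return OPS[op.lower()](int(left), int(right))


if __name__ == "__main__":
    print(solve_math_question("What's 5 - 3"))  # 2
```

Each time the site rewords its questions, the attacker only has to extend the pattern, which is the arms race described above.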
    • Re:I don't get it (Score:5, Insightful)

      by blueg3 (192743) on Tuesday October 14 2008, @10:24AM (#25369551)

      You have to consider the source of the questions. If the questions are human-generated, it's not economically feasible. Remember that they can train their CAPTCHA-defeating software by paying large numbers of people to supply the answers to CAPTCHAs. Even a very large database could fall to that approach.

      If the questions are machine-generated, then you're pitting a machine generating questions and answers against a machine designed to answer questions.

      • by John Hasler (414242) on Tuesday October 14 2008, @10:56AM (#25370023)

        How about asking every nth person successfully logging in to generate a question? Apply a lameness filter and then perhaps ask another randomly chosen user to verify that the question is reasonable. Reject duplicates and questions that too many people can't answer.

      • by xant (99438) on Tuesday October 14 2008, @11:58AM (#25370965) Homepage

        you're pitting a machine generating questions and answers against a machine designed to answer questions.

        You make it sound like that's hard. Here's a question that a machine could generate that another machine could not answer:

        "What number am I thinking of?"

        • Re: (Score:3, Informative)

          If I read the article and summary correctly, it's exactly the sort of CAPTCHA you're suggesting that people have found a reasonably-good solution to.

          Unfortunately, often these solutions aren't actually useful AI solutions.

          • Re:I don't get it (Score:4, Interesting)

            by VeNoM0619 (1058216) on Tuesday October 14 2008, @12:22PM (#25371263) Journal

            Asirra asks users to correctly classify images of either cats or dogs using a database of three million images provided by animal-rescue organizations.

            Only cats and dogs. Like I said earlier, don't limit it to just a few species. Pick one at random.

            Example: You are shown 20 pictures, all of random animals, it asks which one is the cutest aardvark, then which is the happiest turtle. Continuing random traits with random animals. Their flaw was limiting it to just dogs and cats.

            Or to take it to a different level. Most attractive/sexy/cute/old/etc. female (or male). Computers cannot tell what is the "most" prevalent "society" based trait of a picture. Yes, there are programs that make people's photos "more attractive" but that tends to fail half the time, not to mention, it doesn't compare 12 other people.

            The TFA program only knows, "given x what is a y". And that had a 50% chance to guess between cat/dog. Not: given a-x, rank y in order from best to worst.

    • Good idea. Here are a few questions to start with:
      1) What is the best editor: Vi or Emacs?
      2) Was there a cabal?
      3) Did Romero make you his bitch?
      4) Rick Astley would never: give you up; let you down; run around and desert you; make you cry; say goodbye; tell a lie and hurt you?

    • Re:I don't get it (Score:5, Interesting)

      by Abstrackt (609015) on Tuesday October 14 2008, @10:57AM (#25370049)
      The best security I've seen on a sign-up form was "if you're a human, please leave this field blank". Bots tend to fill in all fields, so this already goes a long way towards filtering them out.

      You can even take this approach one step further and use CSS to move the field outside the viewable range of the page or set its visibility property to hidden so the user won't even see it.

    • Re: (Score:3, Informative)

      Yeah, that's solved [google.com]. It's not hard at all for automated parsing software to call another online tool.

    • by kellyb9 (954229) on Tuesday October 14 2008, @02:49PM (#25373451)

      Simple math or site-relevant questions (I'm talking about "What's 5 - 3") are not only easier for humans to read, but also harder for automated parsing software to crack.

      If you really wanted to screw with these bots, you would've made the question 4 divided by 0. :-)

  • How about (Score:5, Interesting)

    by Rik Sweeney (471717) on Tuesday October 14 2008, @10:21AM (#25369507) Homepage

    Instead of asking someone to type in the letters, numbers or how many cats there are in the photo, just randomly generate some scenario:

    "Jim and Sue go to the park on Sunday. Billy the dog goes too."

    Then you can ask random questions like:

    "What is the name of the dog?"
    "What day did they go to the park?"
    "Where did they go?"

    That might work OK for a while...

    • Re:How about (Score:4, Insightful)

      by pla (258480) on Tuesday October 14 2008, @10:38AM (#25369759) Journal
      Instead of asking someone to type in the letters, numbers or how many cats there are in the photo, just randomly generate some scenario:

      That would work wonderfully, if you could truly randomize it (by which I don't mean anything so stringent as neutron sources or the like), rather than using a library of question templates.

      The problem, though, is that you need a better quality of AI to generate arbitrary easy-but-obscure questions than you do to solve them... Keep in mind you need questions that anyone with a 3rd-grade education could read and solve, which limits you to simple grammar, small words, concrete ideas, and no math harder than addition, subtraction, and inequality. Modern AI can already parse and solve those problems fairly well.

      So, you end up using a library of question templates, and once an attacker has seen enough of them, he can reliably fill in the blanks and arrive at a deterministic answer, no massive CPU power or cool AI required.
    • Re:How about (Score:4, Insightful)

      by sunking2 (521698) on Tuesday October 14 2008, @10:49AM (#25369923)
      Oh please, a parser from a 1985 adventure game could figure this out :). You have a few nouns and a few verbs and adjectives. How many questions could you possibly ask from the first sentence? Probably fewer than a dozen. At worst you have like a 1:6 or so chance of picking the right noun to try. If asked to do it, this is probably one of the simpler things to accomplish. Creating a parser that can read at a 2nd grade level isn't all that hard.
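
The template objection raised in this subthread can be sketched in a few lines. The example assumes the site always emits the sentence shape from the "Jim and Sue" scenario and the three question types listed with it; the regular expression and the function name answer are hypothetical, written only to show that a fixed template yields deterministic answers.

```python
import re
from typing import Optional

# Assumed fixed template, modelled on the example scenario:
# "<A> and <B> go to the <place> on <day>. <pet> the <animal> goes too."
SCENARIO = re.compile(
    r"(?P<a>\w+) and (?P<b>\w+) go to the (?P<place>\w+) on (?P<day>\w+)\. "
    r"(?P<pet>\w+) the (?P<animal>\w+) goes too\."
)


def answer(scenario: str, question: str) -> Optional[str]:
    """Fill the template slots, then map the question type to a slot."""
    slots = SCENARIO.fullmatch(scenario)
    if slots is None:
        return None
    q = question.lower()
    if q.startswith("what is the name of the"):
        return slots["pet"]
    if q.startswith("what day"):
        return slots["day"]
    if q.startswith("where"):
        return slots["place"]
    return None


if __name__ == "__main__":
    text = "Jim and Sue go to the park on Sunday. Billy the dog goes too."
    print(answer(text, "What is the name of the dog?"))  # Billy
```

No language understanding is involved: once the attacker has seen the template, swapping in new names, places and days changes nothing.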
  • when... (Score:4, Insightful)

    by cosmocain (1060326) on Tuesday October 14 2008, @10:21AM (#25369515)
    ...will we learn that, if there's a fundamental flaw in a protocol, there's no way we can prevent it from being abused. Every measure will sooner or later have its counterpart and fail.
  • by lb746 (721699) on Tuesday October 14 2008, @10:25AM (#25369567)
    CAPTCHA is not a security feature. It's a way to help avoid robots pretending to be humans. Anyone using it as a security feature is just giving more reasons for people to find ways to break them.

    All in all, it's time to get rid of CAPTCHA and move on to some more logical system that would be more difficult, such as a system where users are asked to answer a simple question that contains the answer, such as:

    If you were born in 1973 and JFK was shot in 1963, were you alive when he was shot?

    How many liters of water fit into a five-liter bottle?
    • by Chris Mattern (191822) on Tuesday October 14 2008, @10:29AM (#25369637)

      Of course CAPTCHAs are a security feature. Unless you have some irrational hatred of robots that inspires you to bar them from your websites, you're trying to keep them out for security reasons.

      • Wrong. Most sites with CAPTCHAs are trying to keep out automated systems because they are abusive. But this is not "security" any more than banning abusive human posters is "security".

    • by Abstrackt (609015) on Tuesday October 14 2008, @10:35AM (#25369717)

      CAPTCHA is not a security feature. It's a way to help avoid robots pretending to be humans. Anyone using it as a security feature is just giving more reasons for people to find ways to break them. All in all, it's time to get rid of CAPTCHA and move on to some more logical system that would be more difficult, such as a system where users are asked to answer a simple question that contains the answer, such as: If you were born in 1973 and JFK was shot in 1963, were you alive when he was shot? How many liters of water fit into a five-liter bottle?

      It sounds like a great idea, but I've met plenty of people who wouldn't be able to answer either of your questions. To steal a random quote from the internet:

      "Back in the 1980s, Yosemite National Park was having a serious problem with bears: They would wander into campgrounds and break into the garbage bins. This put both bears and people at risk. So the Park Service started installing armored garbage cans that were tricky to open -- you had to swing a latch, align two bits of handle, that sort of thing. But it turns out it's actually quite tricky to get the design of these cans just right. Make it too complex and people can't get them open to put away their garbage in the first place. Said one park ranger, 'There is considerable overlap between the intelligence of the smartest bears and the dumbest tourists.'"

      • Re: (Score:3, Insightful)

        [bear-proof trashcan] Said one park ranger, "There is considerable overlap between the intelligence of the smartest bears and the dumbest tourists."

        To be fair, the bears have more time to figure out the can. A tourist will just toss the trash on the ground if it takes more than a minute to open the can. The bear, on the other hand, may spend hours if it smells something good.

    • Re: (Score:3, Insightful)

      How many liters of water fit into a five-liter bottle?

      Hmm... That depends. How much water is in the five liter bottle to start with?
      Is there anything else in the bottle?
      Does it have to be a whole number of litres?

      Assuming an empty bottle, and integral numbers of litres, the following can fit: 0, 1, 2, 3, 4, and 5.
    • by Anonymous Coward on Tuesday October 14 2008, @10:36AM (#25369733)

      > If you were born in 1973 and JFK was shot in 1963, were you alive when he was shot?

      I have developed a device that answers random yes/no questions correctly 50% of the time. Me and my flip-a-coin-bot will take over the world!
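
The coin-flip arithmetic generalizes easily. The sketch below assumes every question has the same number of equally likely options and that guesses are independent; the function name blind_guess_pass_rate is made up for the example.

```python
def blind_guess_pass_rate(options_per_question: int, questions: int) -> float:
    """Chance of getting every question right by uniform random guessing."""
    if options_per_question < 1 or questions < 0:
        raise ValueError("need at least one option and a non-negative question count")
    return (1.0 / options_per_question) ** questions


if __name__ == "__main__":
    print(blind_guess_pass_rate(2, 1))   # 0.5
    print(blind_guess_pass_rate(2, 12))  # 0.000244140625
```

A single yes/no question lets a guessing bot through half the time. Requiring twelve binary answers in a row (an illustrative count) drops that to 1 in 4096, which is why a lone yes/no question makes a poor CAPTCHA.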

    • Re: (Score:3, Insightful)

      How many of these questions would you have? Suppose you spent the time to make 1000 or 10,000. The attacker would simply have them solved by a group of humans (say using Amazon's Mechanical Turk) and put the question/answer pairs into a dictionary for automated attacks.
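
The dictionary attack described here amounts to a lookup table. The sketch below is hypothetical: the class name AnswerDictionary and the normalization rule are assumptions, and the step where humans supply the answers is represented only by the record call.

```python
from typing import Optional


def normalize(question: str) -> str:
    """Collapse case and whitespace so trivially reformatted copies still match."""
    return " ".join(question.lower().split())


class AnswerDictionary:
    """Lookup table of questions that humans have already solved once."""

    def __init__(self) -> None:
        self._answers = {}

    def record(self, question: str, answer: str) -> None:
        """Store a human-supplied answer for later automated reuse."""
        self._answers[normalize(question)] = answer

    def lookup(self, question: str) -> Optional[str]:
        """Return the stored answer, or None for a question not yet seen."""
        return self._answers.get(normalize(question))


if __name__ == "__main__":
    known = AnswerDictionary()
    known.record("How many liters of water fit into a five-liter bottle?", "5")
    print(known.lookup("how many liters of water fit  into a five-liter bottle?"))  # 5
```

The defender pays once per question written and the attacker pays once per question solved, so a fixed pool of 1000 or 10,000 questions only delays the attack.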

    • Re: (Score:3, Interesting)

      If you were born in 1973 and JFK was shot in 1963, were you alive when he was shot?

      How many liters of water fit into a five-liter bottle?

      That is also a CAPTCHA [wikipedia.org], "Completely Automated Public Turing test to tell Computers and Humans Apart." A CAPTCHA doesn't have to be text in an image, that is just an easy test to auto-generate.

      And, it fails the "solve problems for porn" test. The problem is spammers using real people to do stuff en masse, so no kind of CAPTCHA would prevent that.

  • by anomnomnomymous (1321267) on Tuesday October 14 2008, @10:30AM (#25369655)
    "...says it is usually possible to make a CAPTCHA more difficult to break by making a few simple changes."

    Yes, it's possible, but keep in mind that you also have to serve the USER. When the captcha is getting so hard I can't even decipher it anymore (let alone someone with a visual handicap), it's of no use.

    I stopped using Rapidshare because of its ultra annoying 'mark the cats'-captcha: I found it near-impossible to get that right (though the other day I noticed they changed that back to ordinary letters).
  • by Wyck (254936) on Tuesday October 14 2008, @10:59AM (#25370079)

    Well, it seems to me that spammers ARE humans. So trying to detect if the creator of the account is human or not doesn't separate the spammers from the non-spammers.

    Think about it: the authenticating machines are designed by humans, and the perpetrating machines are also designed by humans, and the legitimate users are humans too.

    Perhaps the problem itself needs to be restated: Allow accounts to legitimate users, deny accounts to spammers. Whether or not there is a human involved on either end seems irrelevant.

    - Wyck
