Evolution of the 'Captcha'

Posted by CmdrTaco on Mon Jun 11, 2007 07:36 AM
from the why-can't-i-even-read-them-half-the-time dept.
FireballX301 writes "The New York Times is running an article about the small word puzzles various sites use in order to defeat automated script registration while still letting humans through. It seems many people can't actually solve them anymore, so new alternatives (image recognition) are being created. This, of course, seems breakable as well — is there a feasible alternative to the captcha, or are we stuck jumping through more and more hoops to register at places?"
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I am torn (Score:5, Funny)

    by jollyreaper (513215) on Monday June 11 2007, @07:40AM (#19463983)
    As a Christian fundamentalist, I cannot in good conscience believe that captchas have evolved, yet at the same time, since I can never figure out what to type to make them work, I cannot believe any intelligence was involved in their design.
    • by dattaway (3088) on Monday June 11 2007, @07:46AM (#19464025) Homepage
      Here in Kansas, captcha evolution has been subject to legal review. Kansas City's Road Runner is employing packet shaping to eliminate the evolution of captchas. You might not see the captcha, but others believe it exists.
      • Re: (Score:3, Funny)

        Now I know who writes the Captchas - it can only be the writing of his noodly appendage (Church of the Flying Spaghetti Monster).
    • Re: (Score:3, Interesting)

      I thought I could avoid using captchas by simply requesting that the user type in their IP address, which I showed at the bottom of the screen. I know that a bot can easily get the IP address too... I was thinking that my request was vague enough that the bot wouldn't understand the question. My guess is that the bot didn't understand the question and reported the error to its writer. The writer must have explored my website, found the source of the error and then added a subroutine to deal with my question.

      Th
      • by jollyreaper (513215) on Monday June 11 2007, @10:39AM (#19466071)

        I think really we should be switching to riddles instead of captchas. "What walks on four legs in the morning, two in the afternoon and four in the evening?"

        That will sort the men from the bots. ;)
        That would be three legs in the evening and you would be describing my father. He's hungover in the morning, just about has his shit together in the afternoon but is already into the next bottle by evening.
  • Knowledge tests... (Score:3, Interesting)

    by Anonymous Coward on Monday June 11 2007, @07:41AM (#19463991)
    The other day I saw a system that posed the question:
    'Germany is a country in Africa?'

    Your duty to prove you were human was to change it to the proper continent and the question mark to a period. Seems pretty fool proof, especially if you combine it with things like "and make 'country' all capitals."
    • by CrazyTalk (662055) on Monday June 11 2007, @07:52AM (#19464075)
      Ummm, I don't think this would work in the US, where (considering our educational system) some people might answer "yes". In fact, some celebrity (I forget which) recently thought that Japan was a country in Africa, which is why Africa has the best sushi.
    • by bobmarleypeople (1077639) on Monday June 11 2007, @08:05AM (#19464195) Homepage Journal
      I've seen several sites using questions similar to yours except they were more obvious. An example was:

      Which is a food?
      A) pink
      B) car
      C) Britney Spears
      D) Hamburger

      There is of course the possible registration by a disturbed and horny male who would say "Britney Spears" but you get the idea.
    • I've used something similar -- requiring a question that can only be answered by people with a genuine interest in the forum/site they are registering for. I have gone from 7-12 spam registrations a day, down to zero [spam regs] since doing so, while people who are legitimately registering still get through.
      • Re: (Score:3, Informative)

        You know, as a security sort of person, I tend to agree in principle. I do, however, find it fascinating how principle and reality don't quite line up all that often. A case in point, one of the blogs I read fairly regularly uses captchas. He doesn't really obscure it too much, and it's always the same 3 character string, related to the name of the site. Any spammer who actually posted more than once could easily figure it out. So far, none have. He wrote about his experiences with this here. [shamusyoung.com] So mayb
        • Re: (Score:3, Informative)

          You can get away with that if you're a little site. But if you're Google, or Slashdot, or Facebook, then it'll last about two days.
  • In my mind, anything that can be put out by an automated system for the purpose of determining whether the communication on the other end is from an automated system can, with enough ingenuity, be answered by an automated system. IOW, all 'captchas' and similar methods are ultimately defeatable. It's an arms race, just like DRM: clever people will always figure out how to defeat what protections you put in place no matter how clever your protections are.
    • Re:Alternative? (Score:5, Insightful)

      by twistedsymphony (956982) on Monday June 11 2007, @08:16AM (#19464305) Homepage
      What ever happened to email validation?

      You give the script your email address, it sends you an email and you follow a validation link within the email. Implementing this on my website where I had a captcha before got rid of 100% of the spam.
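
      A minimal sketch of that validation link in Python, under stated assumptions: the post does not say how the link was generated, so the HMAC-based token, the `SECRET_KEY` value and the `example.com` URL are all illustrative.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlencode

# Hypothetical server-side secret; a real site would load this from its configuration.
SECRET_KEY = secrets.token_bytes(32)


def make_validation_token(email: str) -> str:
    """Return a hex token bound to this email address."""
    message = email.lower().encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def validation_link(email: str, base_url: str = "https://example.com/validate") -> str:
    """Build the link that would be mailed to the user (base_url is a placeholder)."""
    query = urlencode({"email": email, "token": make_validation_token(email)})
    return f"{base_url}?{query}"


def is_valid_token(email: str, token: str) -> bool:
    """Check the token from the clicked link using a constant-time comparison."""
    return hmac.compare_digest(make_validation_token(email), token)
```

      The account would only be activated once `is_valid_token` returns True for the address and token carried by the clicked link.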

      There are also other little dirty tricks you can do to ensure it's a human on the other end. One of my favorites is to check the referrer URL when accepting a comment... if it's not being referred from my entry form then it just happily throws the request away. Even if it's not spam it's probably something malicious anyway.

      Another thing I used to use that worked really well in conjunction with registration is "approving" any account in which the first post doesn't contain any links or any words on a "spam list". If the first post of the newly registered account contains any links or spam words at all, it's held for moderation and must be approved manually. The vast majority of the legit people leaving comments for the first time won't be including any links or talking about viagra on a tech site. No links or spam words means they've been validated as "not spam", and if they've included links it only takes a human a few seconds to decide whether the account should be canceled as spam or approved as a non-spam account. This one obviously takes some manpower, so it only really works on smaller sites. It might be easy for a spam bot to counteract this, but the way it validates is not apparent, not to mention this is already after an email has been validated.
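
      The first-post rule above could be sketched roughly like this. The spam list and the link pattern are made-up examples, since the post does not say what was actually on the list.

```python
import re

# Illustrative spam list; the original comment does not say which words were used.
SPAM_WORDS = {"viagra", "casino"}

# Treat anything that looks like the start of a URL as a link.
LINK_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)


def first_post_needs_review(text: str) -> bool:
    """Return True if a new account's first post should be held for a moderator."""
    if LINK_PATTERN.search(text):
        return True
    words = set(re.findall(r"[a-z0-9@]+", text.lower()))
    return bool(words & SPAM_WORDS)
```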
      • Re: (Score:3, Interesting)

        Get rid of the captcha by implementing the one verification scheme more annoying than a captcha! Good job!

        Email validation requires people to give you something -- their email address -- that they may consider more valuable than the ability to post on your forum. You'll lose all those people, who are probably rather more numerous than those who would be turned away by an annoying captcha.

        In addition, email response is far more automatable than captchas. I am currently experimenting with an automated confirm-l
          • Re: (Score:3, Insightful)

            The way I look at it, if someone can't trust me with their email address then I can't trust them not to spam me.

            Get over yourself.

            If you're building a community forum where your visitors are likely to be repeat customers then IMO a more formal registration is appropriate.

            How many people do you really think come to your website thinking, "Today I am going to join a community!"? Joining a community is not something people carefully plan out doing, it's something that happens if they try it out for a while an
      • by Poromenos1 (830658) on Monday June 11 2007, @10:10AM (#19465723) Homepage
        I've found that not even this is necessary, I run a site with about 1000 visitors per day and the spam messages fell to zero when I included a field that said "Type in the box to prove you're human:".
        • Re:Alternative? (Score:4, Interesting)

          by cyphergirl (186872) on Monday June 11 2007, @11:46AM (#19466981) Homepage Journal
          My husband and I run a forum for homebuilt aircraft and we've already got bots doing this. We're using captchas at registration, an email activation link AND we have to have a moderator personally approve every registration...... and we still have some spammers who get through. I'm really beginning to think that there is an army of them out there earning .01 per hour to actually read our site and create profiles that match our user base. Some of the spammers have gone as far as to create signature blocks stating which type of kit they are building and the tail number they've reserved from the FAA. The account gets approved and then we've got hundreds of V1@grA posts to clean up in the morning.

          I read an advertisement recently -- apparently someone is collecting the URLs of web forum signup pages and then selling them to the botnets. I was thinking that maybe we could come up with a way of randomizing the signup page URL so that it would only work when the link is actually clicked on, but never got around to it. And let's be honest -- they'd figure that out too. *sigh*
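
          One way the randomized signup URL could look, purely as a sketch: the in-memory token set and the `/signup/` path are assumptions for illustration, and a harvested URL stops working once it has been used.

```python
import secrets

# Tokens issued but not yet used. A real forum would keep these in its database
# and expire them after some time; a plain set is an assumption for this sketch.
_pending_tokens = set()


def issue_signup_path() -> str:
    """Create a fresh signup path each time the 'register' link is rendered."""
    token = secrets.token_urlsafe(16)
    _pending_tokens.add(token)
    return f"/signup/{token}"


def redeem_signup_path(path: str) -> bool:
    """Accept a signup path once; unknown or already-used paths are rejected."""
    token = path.rsplit("/", 1)[-1]
    if token in _pending_tokens:
        _pending_tokens.remove(token)
        return True
    return False
```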
      • Re:Alternative? (Score:5, Insightful)

        by moranar (632206) on Monday June 11 2007, @07:58AM (#19464115) Homepage Journal
        Doesn't work well: a bot will be right 25% of the time, just by answering at random. And more pictures mean difficult layout, or small picture size. Plus, it becomes an undue hassle on real users.
          • Re: (Score:3, Informative)

            Now, with only 4 images, you have 1+1+4+4+6+6 = 22 different possible outcomes, while having the problem remain trivially easy for a human.
            Each image either shows or doesn't show a cat, so that's two possibilities per image. With 4 images that makes 2^4 = 16 possibilities. I don't know where you got "1+1+4+4+6+6" from, but it doesn't make any sense to me.

            (Or maybe I misinterpreted).
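
            The 2^4 count can be checked by brute-force enumeration. This is only a small illustration; the variable names are made up.

```python
from itertools import product

NUM_IMAGES = 4

# Each image either shows a cat (True) or does not (False).
outcomes = list(product([False, True], repeat=NUM_IMAGES))

total = len(outcomes)       # number of distinct answer patterns, 2 ** NUM_IMAGES
guess_success = 1 / total   # chance that one uniformly random guess is correct
```

            So a bot that guesses uniformly at random gets a given puzzle right one time in 16.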
          • Re: (Score:3, Insightful)

            As the previous poster pointed out, your maths is wrong, and it's 16 possibilities. This means the spam bot just has to try 16 times instead of 1. It can easily do that if it wants to.

            Meanwhile, you have shut out all users who do not speak English well and can't figure out your instructions.
      • Nonsense. There are plenty of things humans are good at that computers are rubbish at. How about displaying four photographs with the question "which image contains a bottle?"

        Your search space wouldn't be large enough -- you can only have a limited number of photographs, since they have to be manually generated, and once the correct answers have been identified the captcha-breaking algorithm would reduce to "which image is closest to something in this set", a fairly trivial image-matching problem. This

  • Great idea (Score:3, Insightful)

    by grimdawg (954902) on Monday June 11 2007, @07:41AM (#19463997)
    What word did you have to type to prove you weren't a bot? A good sample might give us an insight into which words are used: why? I had to type 'interest' - which seems to have no real distinguishing feature.

    Are they chosen for any good reason, or are they completely arbitrary? Are there letters that bots have trouble with? Fonts? Who knows?

    The only thing that's sure is that every protection will eventually be broken.

    What's more, maybe if you can't solve a simple word puzzle, I don't want you registering at my site...
    • Re:Great idea (Score:5, Insightful)

      by Turn-X Alphonse (789240) on Monday June 11 2007, @07:44AM (#19464007) Journal
      So people with eyesight problems aren't welcome on your site then?

      I have perfect vision and I struggle to tell if some S/5/Zs are one of the letters. The fonts and distortion are getting worse and worse, to the point where it usually takes 2 or 3 attempts before I can get one right, purely because the letters are so distorted these days.
      • Re:Great idea (Score:5, Insightful)

        by 0123456 (636235) on Monday June 11 2007, @08:05AM (#19464189)
        Indeed: these things are getting to be an appalling nuisance. If I see a site that uses them I increasingly just say 'fuck it' and leave; particularly the sites that keep asking for another one every few pages.

        Meanwhile, having an automated system feed them to Chinese people on $0.50 an hour can't be too hard, and they'll have at least as good a chance of getting the correct result as I do.
        • Re: (Score:3, Interesting)

          Heh, I remember once having to enter some cryptic captcha string into a text field at rapidshare or some nameless file hosting service. I think the problem with it was there was no discrimination between O and zero, or something to that extent. Anyway, the captcha sucked so much I misread it three times, at which point the site replied with "You are a bot!" and shut me out of the system. Funny way of showing appreciation and respect to customers.

          By the way - since I started typing on this subject - I run a coup

  • by sveinb (305718) on Monday June 11 2007, @07:46AM (#19464015)
    Ask the user to perform a task that only a computer is likely to succeed at, like factorizing a 6-digit number. If the user gives the right answer, and this is the cunning part: Then it's not a human!

    MAN, I feel clever some times.
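
    For scale, plain trial division already handles any 6-digit number with at most about a thousand candidate divisors. The function below is a generic sketch, not something from the post.

```python
def factorize(n: int) -> list:
    """Return the prime factors of n (with repetition) by trial division."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        # Strip out every copy of this divisor before moving on.
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever remains is a prime larger than the square root of the original n.
        factors.append(n)
    return factors
```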
    • Ask the user to perform a task that only a computer is likely to succeed at, like factorizing a 6-digit number. If the user gives the right answer, and this is the cunning part: Then it's not a human!
      Now you're discriminating against autistic savants [wikipedia.org] like Dustin Hoffman's character in Rain Man, in possible violation of disability discrimination acts in the United States, the United Kingdom, or other countries. See you in court.
  • Captcha too hard (Score:5, Insightful)

    by aepervius (535155) on Monday June 11 2007, @07:50AM (#19464049)
    OK, I am a bit shortsighted, but still, some of the captchas are so garbled with brightly colored random pixels/forms, while the font color of what was to be read was light gray/pink/blue on a white background (and naturally distorted), that frankly I swore loudly while trying for the 5th time to enter the correct random combo of lower case, upper case and digits.

    I am not sure if a picture is better, but it is definitely a step forward if I don't have to spend 5 tries on it.
  • by escay (923320) on Monday June 11 2007, @07:51AM (#19464057) Journal
    I find some of the most cryptic captchas on the Ticketmaster site. Granted, the site deserves stringent bot control given the risk of scalpers, but some of their patterns border on the ridiculous. TFA mentions someone who achieved 25% success in deciphering those Ticketmaster ones and I am thinking, "how does he do that?!"
  • by Anonymous Coward on Monday June 11 2007, @07:53AM (#19464077)
    I always get annoyed by captchas... it's like a forced human intelligence test.
    We know that humans are more intelligent than scripts, so I always thought it should be easier to test the lack of intelligence in scripts than proving intelligence in humans.

    For example just use a simple honeypot in a html form. Put a dummy input field in a form. You can hide the field with CSS/noscript tag or just mark it: "This field should be left intentionally blank" or something of that nature to make it more human friendly.

    Seeing that all form fields are generally blank, the spambot/script will fill your dummy field. On the server side, check whether the field has data; if it does, ignore the submission. It would be a VERY intelligent script that could COMPREHEND the purpose of any particular html input field.

    my anonymous 2c
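
    The server-side half of that honeypot check is tiny. A sketch in Python, assuming the submitted form arrives as a dict and that the dummy field is called `website` (a made-up name):

```python
# Hypothetical name for the hidden decoy field; a real form would pick its own.
HONEYPOT_FIELD = "website"


def is_probable_bot(form_data: dict) -> bool:
    """Return True when the decoy field that humans leave blank contains data."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```

    A submission for which `is_probable_bot` returns True would simply be dropped.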
    • by jimstapleton (999106) on Monday June 11 2007, @07:57AM (#19464113) Journal
      have a random or semi random set of field names, with an associated "key" field. Use the key field to retrieve the field names of interest. Also have a "name" and "password" field set up so they are invisible to a normal user.

      Block any IP submitting a non-blank "name" or "password" field.
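
      A rough sketch of the keyed field names plus decoy fields. Deriving the names with an HMAC over the key is one possible choice that the post does not specify, and every identifier here is illustrative.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = secrets.token_bytes(32)      # hypothetical per-site secret
REAL_FIELDS = ("username", "secret_phrase")  # logical fields; names are illustrative
DECOY_FIELDS = ("name", "password")          # rendered invisibly; humans leave them blank


def field_name(key: str, logical: str) -> str:
    """Derive the randomized HTML field name for a logical field from the form key."""
    message = f"{key}:{logical}".encode("utf-8")
    digest = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    return "f_" + digest[:12]


def new_form() -> dict:
    """Pick a fresh key and the field names to render for this page load."""
    key = secrets.token_hex(8)
    return {"key": key, "fields": {f: field_name(key, f) for f in REAL_FIELDS}}


def parse_submission(posted: dict):
    """Return the real field values, or None if a decoy was filled in."""
    if any(posted.get(decoy, "").strip() for decoy in DECOY_FIELDS):
        return None  # caller may choose to block the submitting IP
    key = posted.get("key", "")
    return {f: posted.get(field_name(key, f), "") for f in REAL_FIELDS}
```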
    • The problem is that the solutions are being coded for individual sites not one size fits all. A custom solution would have no problem with that system at all.
    • by CodeBuster (516420) on Monday June 11 2007, @01:28PM (#19468153)
      It would be a VERY intelligent script that could COMPREHEND the purpose of any particular html input field.

      Not really, considering that most of these scripts are targeted at large sites (yahoo, hotmail, etc) OR common site frameworks (PhpNuke, Drupal, Blogger, etc) where common hidden field input patterns would very quickly be tested and coded around by the script writers. The whole point of CAPTCHA in the first place was that it presented a random and dynamic test which was easy enough for users to solve (at least in theory) while hard enough to foil simple analysis by script. This might work on a small custom website where it is not worth the trouble of the script writers to code a version specifically for the hidden input pattern of your site, but this hidden field stuff was tried and failed on big sites even before CAPTCHA was in common use.
  • by rtobyr (846578) <toby.richards@net> on Monday June 11 2007, @07:57AM (#19464111) Homepage
    One day, everybody will have a digital ID. You know, the kind used to digitally sign e-mail. If you had to digitally sign your request to create an account with a certificate issued from a trusted CA, then using a bot creates the potential of the user having his digital certificate revoked.
  • by G4from128k (686170) on Monday June 11 2007, @08:06AM (#19464199)
    Between ever-better computer image recognition algorithms and cheap offshore labor, captchas are doomed. Moreover, captchas don't even solve the actual problem because the goal isn't to distinguish human from nonhuman, but to distinguish spammer from nonspammer. This means we need some mechanism to identify a registrant and be aware of their behavior.

    Why don't sites band together, share data on abusive registrants, and require each new registrant to provide "references" in the form of their logins to 3-5 other sites? A person with a normal online life could easily demonstrate a pattern of nonspammy behavior. People with no prior history might be placed on probation (their posts are reviewed and may not contain any link-like data). If a registrant posts spam they temporarily (or permanently) lose their accounts on that site and all connected sites.

    At some point in time, the only thing that will work is a system that tracks the identity behind the account, assigns a reputation and ostracizes miscreants.
  • Shamus Young (the creator of the "DM of the Rings") recently introduced a captcha on his site to deal with comment spam. In his post about using a captcha [shamusyoung.com] on his site, he notes that:

    ... I used to get many hundreds of spam a day. Traffic here has jumped up since then, and I wouldn't be at all surprised to find I'm getting a couple of thousand a day by this point. But all of them bounce off the CAPTCHA, and I never even see them. I only see a spam make it through about once every other week, and I'm betting the ones that do make it through are entered manually... In any case, these are really impressive results for a CAPTCHA with only one short phrase that never changes.

    Emphasis mine. He's running a fairly popular site, and using a captcha based off of a single, unchanging, three-character phrase. Just the presence of the captcha was enough to effectively eliminate his spam problem. The indication seems to be that just the presence of a captcha is enough to keep spam off of even a moderately popular site.

  • So rather than put the burden of proof on humans to prove they're not a machine, put the burden of proof on the machines to prove they're a human?

    Take your average HTML form:

    Rather than have 1 textbox for a field value, have 10. UserName1, UserName2, UserName3, etc.

    Use javascript to randomly assign one of them as visible. The rest are hidden from the user.

    On the server, watch to see which textbox is filled. Presumably, with decent enough javascript skills, and stupid enough bots, your humans will fill out what they see, which is the correct combination. The bots won't.

    Granted, this method can be defeated if the bot checks for field level visibility after the page finishes loading, but even then, with decent enough javascript, you can continue to provide unobtrusive checks to ensure that your user is real -- e.g., unless the bot is running a macro through a web browser itself, your onblur events probably won't be tripped. And so on.

    This puts a burden on the developers to come up with clever ways of defeating the bots, but in reality, that's where the battle is -- html application devs. vs spambot devs. Users shouldn't have to be dragged into the middle.
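
    The server-side check for the ten-textbox idea might look like the sketch below. It assumes the server (not only the page's javascript) knows which box was made visible, for example by choosing the index itself and remembering it in the session; the post leaves that detail open.

```python
import secrets

NUM_BOXES = 10  # UserName1 .. UserName10, as in the example above


def choose_visible_index() -> int:
    """Pick which of the textboxes will be shown to the user (1-based)."""
    return secrets.randbelow(NUM_BOXES) + 1


def extract_username(posted: dict, visible_index: int):
    """Return the username if only the visible box was filled in, else None."""
    filled = [
        i for i in range(1, NUM_BOXES + 1)
        if posted.get(f"UserName{i}", "").strip()
    ]
    if filled == [visible_index]:
        return posted[f"UserName{visible_index}"].strip()
    return None
```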
    • Especially with provisions of Section 508 [wikipedia.org] and the ADA [wikipedia.org] (and foreign counterparts) that ban discrimination against blind people, who use computers through screen readers that render text as speech or braille.

      Some sites are including an audio option.
      examples are here [captcha.net] (under Guidelines > Accessibility) and here [accessibilityblog.com]

    • With the likes of BugMeNot.com, which people can use to distribute usernames and passwords for websites, there is little incentive to collectively continuously register.
      And bots operated by web sites that require registration can spider bugmenot and ban all accounts that are listed there.
      • Re: (Score:3, Insightful)

        No wonder the OCR software can't read them... I had to reload about 4 times before I could identify both words, and even then, I can't help wondering why they added the extra strike-through to make it even harder.