
News for nerds, stuff that matters

HTML Encoded Captchas

Posted by kdawson on Mon Jan 01, 2007 08:03 AM
from the type-this dept.
rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.

Related Stories

[+] How to Prevent Form Spam Without Captchas 272 comments
UnderAttack writes "Spam submitted to web contact forms and forums continues to be a huge problem. The standard way out is the use of captchas. However, captchas can be hard to read even for humans. And if implemented wrong, they will be read by the bots. The SANS Internet Storm Center covers a nice set of alternatives to captchas. For example, the use of style sheets to hide certain form fields from humans, but make them 'attractive' to bots. The idea of these methods is to increase the work a spammer has to do to spam the form without inconveniencing regular users."
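A minimal sketch of the hidden-field ("honeypot") idea from the summary above, assuming the form carries a decoy field that a style sheet hides from human visitors. The field name `website` and the helper `looks_like_bot` are made up for illustration; they are not taken from the SANS write-up.

```python
def looks_like_bot(form: dict, honeypot_field: str = "website") -> bool:
    """Return True when the CSS-hidden decoy field was filled in.

    Humans never see the field, so they leave it empty; a bot that
    fills in every input it finds gives itself away.
    """
    return bool(form.get(honeypot_field, "").strip())


# Hypothetical submissions: the decoy field is empty for the human.
human = {"name": "Ann", "comment": "Nice post", "website": ""}
bot = {"name": "x", "comment": "buy pills", "website": "http://spam.example"}

print(looks_like_bot(human), looks_like_bot(bot))
```

A bot that parses the style sheet can skip the decoy, so this only raises the spammer's cost, which is all the summary claims for it.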
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I fail to see how this'll help (Score:5, Interesting)

    by Rosco P. Coltrane (209368) on Monday January 01 2007, @08:08AM (#17421490)
    At the end of the day, this captcha is displayed on the screen as a colorful harder-to-read mumbo-jumbo, just like jpeg captchas, so all a bot has to do is use a html renderer to turn it into a regular image that can be processed. So the added complication is linking one of the existing captcha decoders and the gecko engine for example, maybe a half day's work. Not exactly uncrackable...
  • Render, PrintScr, OCR? (Score:3, Interesting)

    by Frogular (961545) on Monday January 01 2007, @08:09AM (#17421492)
    Can't the bot simply render and OCR it?

    A better solution might be the authentication system old 386 games had where you have to do some simple but human intelligence requiring task. "Find the word in the upper right of manual pg 4" -> "Enter the 3rd word from the following paragraph"
  • watermarking (Score:3, Interesting)

    by dattaway (3088) on Monday January 01 2007, @08:15AM (#17421506)
    (http://dattaway.us/)
    How about watermarking the captcha with the site's address and a short message?
  • by Timesprout (579035) on Monday January 01 2007, @08:15AM (#17421510)
    Anyone?
  • Bad form (Score:5, Insightful)

    by Zaph0dB (971927) on Monday January 01 2007, @08:24AM (#17421530)
    I think using a captcha like this one (html-table rendered) is bad web-manners. The rendering of such a table, pixel by pixel, is a huge toll on browsers. Even on my (relatively) new and (relatively) powerful machine, it took Firefox a noticeable amount of time to render the image, and caused my hard drive to crunch a little. I don't even want to imagine less powerful machines or, random-fluctuation-of-time-and-space forbid, mobile devices. All in all, I think this method severely limits the users accessing this site.
  • Spy vs spy (Score:1, Interesting)

    by Anonymous Coward on Monday January 01 2007, @08:33AM (#17421588)
    This scheme will work until it is widely enough used that it is worth the spammers' while to write a crack. As the author suggests, the ultimate solution is probably to have so many of these schemes that the spammers can't keep up.

    I have a question. How much of a problem are these spammed responses to blogs? I go to several blogs that don't have captchas and haven't noticed anything that could be called spam. Is this a response to a non-problem?
  • 218kb (Score:1)

    by Joebert (946227) on Monday January 01 2007, @08:35AM (#17421598)
    Screw trying to solve it, it would be easier to use that 218kb chunk of junk that's no doubt going to need a bunch of dynamic processing against them, thus forcing them to wish they never used it in the first place.
  • workaround... (Score:5, Informative)

    by zozzi (576178) on Monday January 01 2007, @08:35AM (#17421600)
    Spammers already have a workaround for captchas:

    1. Show the image in an alternate pornographic/warez/whatever website

    2. Ask the user to type it in to access the site

    3. Use the user's input to access the original protected site

    4. There is no step 4.

    • Re:workaround... by rjamestaylor (Score:2) Monday January 01 2007, @08:52AM
    • Re:workaround... by iangoldby (Score:1) Monday January 01 2007, @09:14AM
    • Re:workaround... by iamdrscience (Score:2) Monday January 01 2007, @09:31AM
      • Re:workaround... (Score:5, Funny)

        by Phillup (317168) on Monday January 01 2007, @10:31AM (#17422146)
        When it comes to porn, I'm no slouch and I can count the number of times I've seen sites that give you free access after entering a captcha on one hand.

        One hand eh?

        Guess we don't really need to ask how you know this...
      • Re:workaround... by Dystopian Rebel (Score:2) Monday January 01 2007, @11:24AM
      • Re:workaround... by BCoates (Score:2) Tuesday January 02 2007, @03:26AM
  • A captcha is still a captcha (Score:5, Interesting)

    by Cee (22717) on Monday January 01 2007, @08:37AM (#17421610)
    One of the main objections to a captcha is that an attacker could steal the image file and simply use it on their site (XXX sites...) to get it "cracked".
    A HTML generated captcha would prevent that, since there is no image file to copy.
    However, what prevents the attacker from simply copying the relevant HTML source and putting it on his or her site, just like the image? Sure, you can make it quite complicated by adding CSS layers and whatnot, but in the end that would merely be an extra annoyance.

    And stopping the attacker from using OCR on the captcha won't really work either. It's not that hard to render HTML code to an image, which you can feed to the OCR software.

    In short, this hack is just another step in the arms race, that just buys us some time.
  • by Tim C (15259) on Monday January 01 2007, @08:54AM (#17421670)
    Really? Firefox doesn't seem to have any problems downloading and processing it, and as I wasn't aware that Firefox or Gecko used voodoo magic, I'm going to assume that the same would be true of any purpose-written code...

    It's a nice idea, but it's little more than a speed-bump at best. (And not a particularly high one, at that)
  • Screen Captcha! (Score:3, Interesting)

    by mrmeval (662166) <mrmeval@nOspAM.gmail.com> on Monday January 01 2007, @09:04AM (#17421702)
    It's easy no?

    The file size is what intrigues me. Just make a 'hidden' captcha that a bot would download. Now figure out how to make a jpeg decompressor uncompress that to 2 gigs or better.

    It's like the old "I'll compress 2gigs of the letter A with zip and upload it to that BBS and let the virus checker gag" gag.

    Or maybe a gif file. I wonder how solid black or white compress......

  • Lunacy (Score:4, Interesting)

    by Stormx2 (1003260) on Monday January 01 2007, @09:05AM (#17421712)
    Lunacy! I've made apps which can do this sort of thing before, and this one is totally unoptimized! Take a look at this:

    With the limited amount of colours used, it would make much more sense to
    a) give the table an id, then:
    table.tabid td { width:1px; height:1px; }
    b) give some classes for each colour used
    td.colid { background-color: blah; }

    I'm sure that would halve the source code size... How can you trust a HTML solution that hasn't even been properly thought through?
  • Processing (Score:2, Interesting)

    by jones_supa (887896) <jza@saunalahti.fi> on Monday January 01 2007, @09:06AM (#17421716)

    The Captcha is no longer an image and therefore not a resource they can download and process.

    Err...but the HTML captcha is a resource they can download and process.

    • Re:Processing by Phillup (Score:2) Monday January 01 2007, @10:34AM
  • When bad ideas go live (Score:1, Insightful)

    by billcopc (196330) <vrillco@yahoo.com> on Monday January 01 2007, @09:30AM (#17421792)
    (http://fnarg.com/)
    Having a 200kb block of text, no matter how well it compresses, will add anywhere from 10 to 40 seconds to download on a dial-up line, and that was for a ridiculously small CAPTCHA. A larger, more human-readable size might use up 500kb or more. Even on a high-speed link that's a noticeable pause. The fact that it only shows up on the sign-up page doesn't make it excusable; in fact it makes it counter-productive. If I find some cool site, eagerly hit the sign-up link and end up staring at a half-rendered page for more than 15-20 seconds, I'll just leave and find some other site that loads faster, because I really don't care what's going on behind the scenes... I have no compassion for an elaborate security device if it bungles my experience.

    This is what happens when bad ideas are brought to life. This will only waste the site owner's bandwidth, maybe slow down the attacker slightly while the algorithm is modified.. we're talking AT MOST a couple days work. You could achieve the same result by adding a 2-second delay to the CAPTCHA cgi, the same idea as adding a delay to failed logins... if you can't properly defeat the attackers, at least slow them down.

    We've reached a point where, with security/copy protection, if it is something that can be done by a human sitting at a computer, the human can be removed from the equation. The greatest shortcoming of any system like CAPTCHA, or even asking "human intelligence" questions like "What do monkeys eat" or other things that computers don't innately "know", is that a human has to computerize those actions in the first place. You have to teach YOUR computer what the answer to the monkey question is, and there are only so many answers you will teach it until you run out of ideas (or exhaust the body of humankind's knowledge). Eventually the attacker will know all the answers to your challenges and you've just wasted a whole lot of time.

    A better strategy here is the psychological approach. How do you get rid of a tireless attacker ? What motivates an attacker ? They WANT something of value to them. That something can be email addresses, zombie hosts, or in the case of blog spam they just want eyeballs. There are two ways to demotivate them: get rid of what's luring them, or make your prize harder to get than everyone else's. The first solution might mean crippling your site, even making it totally worthless (think site owners that give up, communities that are abandoned after relentless attacks). The second solution only buys you time, because the more vulnerable sites will ramp up their security, sooner or later, and then you're back at square one.

    Actually there is a solution 3: find the attackers and attack THEM. Hey it's not the higher road, but it's damn effective.
  • Captcha's are annoying (Score:5, Insightful)

    by tacocat (527354) <tallison1@ t w m i . r r.com> on Monday January 01 2007, @09:31AM (#17421798)

    While this has little to do with the original post I have a really annoying experience with captchas

    I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time. Another portion of the time I take too long trying to answer the code question and type in the right characters. I typically get screwed on the number Zero and the letter 'O' and lowercase 'L' and the number 1.

    It's becoming, for me, an entry barrier to signing up and gaining access to websites. It would be much easier to simply use email authentication. What do you do with the people who are color blind? I spent some years dealing with display design and this was a legitimate concern that we addressed at the time for a specialized group of people. In the common population there are a lot more occurrences of people who are color blind.

    Are captchas really worth the effort compared to other more human friendly processes? Is anyone working on what we will be doing next? Considering that there are decades of machine vision technology to pull from, I think it will be fairly trivial for the bots to become better at reading captchas than humans.

    It might be effective to take the email authentication process and apply everything that mail servers do to authenticate the user. What I mean by this is apply all the mail server rules like FQDN requirements for HELO, fully resolvable email domains, valid email addresses, non-open relays. Much of this would eliminate either the bots or the ISP's who are too stupid to properly configure a mail server. Similarly it might be sufficient to code the HTML/HTTP to expect a properly responding client and not some hacked up bot that can't do most of it right.

  • Broken (Score:5, Interesting)

    by Kurayamino-X (557754) <Kurayamino@graff ... t minus math_god> on Monday January 01 2007, @09:36AM (#17421824)
    All text based captchas are broken, it doesn't matter how they're rendered, they're still a pre-defined set of characters that a bot can pick out eventually. Now, the "Click three kittens" captcha, that was fucking genius, no bot on the planet will be able to tell the difference between a kitten and a ham sandwich. Why isn't it being used? People seem to think obscuring text and making it harder for humans to read is a better idea than using something a computer will not be able to identify.
    • Re:Broken by springbox (Score:2) Monday January 01 2007, @11:05AM
      • Re:Broken by izam_oron (Score:1) Monday January 01 2007, @11:56AM
        • Re:Broken by eugene ts wong (Score:1) Monday January 01 2007, @01:10PM
          • Re:Broken by izam_oron (Score:1) Thursday January 04 2007, @06:01PM
            • Re:Broken by eugene ts wong (Score:1) Thursday January 04 2007, @10:26PM
      • Re:Broken by eugene ts wong (Score:1) Monday January 01 2007, @01:16PM
    • Re:Broken by toddestan (Score:2) Monday January 01 2007, @02:07PM
    • Make the question a wav file by cheekyboy (Score:2) Monday January 01 2007, @03:08PM
    • Re:Broken by spitzak (Score:2) Monday January 01 2007, @04:12PM
    • Re:Broken by nuzak (Score:2) Monday January 01 2007, @06:06PM
  • This is going to be effective only as long as it is not popular and not worth somebody's time to sit down and write a script to convert it into a genuine image.

    How difficult is it to translate this matrix into a normal image? Not very difficult I am afraid.
  • 218k of junk (Score:3, Informative)

    by suv4x4 (956391) on Monday January 01 2007, @10:30AM (#17422132)
    This GPL-ed project can be reproduced by a junior coder in an hour so the fact it's GPL-ed I guess isn't of so much help.

    Also on the subject of it being 218k, each pixel looks like:

    ... tr... <td style='height:1px;width:1px;background-color:#fcfbff'></td> ... /tr...

    which is badly redundant, the very first thing is you can make all "td"-s in the table be 1px/1px with a simple: table.captcha td {width:1px; height:1px} rule, then background-color can be shortened to just "background" and still be valid.

    Furthermore you don't need a table with rows and columns: if you float the pixels to the left, then you only need a container of the right width and columns/rows will naturally form. To keep it down we can style a shorter tag for our purposes, like <b>

    So at this stage we arrive at the much simpler:

    <b style="background:#abcdef"></b>

    But this can be simplified even further by indexing the colors used as around 40-50 CSS classes (given the image has a lot more than 40-50 pixels and 40-50 colors are enough for it, it's still a net gain), for example: .cA {background:#abcdef} .cB {background:#ffaabb}, at which point we get not only more obfuscation for the captcha crackers to solve, but much lighter code:

    <b class="cA"></b>

    and again the original:

    ... tr... <td style='height:1px;width:1px;background-color:#fcfbff'></td> ... /tr...

    And this is before we start putting JavaScript in the picture...
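The size argument in the comment above can be checked with a rough sketch. It builds the same pixel grid twice: once with a full inline style on every `<td>`, as quoted from the HEC page, and once with palette-indexed classes on floated `<b>` elements. The function names, the `cap` class and the test grid are illustrative assumptions, not code from the HEC project.

```python
def inline_markup(pixels):
    """One <td> per pixel, each carrying the full inline style."""
    rows = []
    for row in pixels:
        cells = "".join(
            f"<td style='height:1px;width:1px;background-color:{c}'></td>"
            for c in row
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


def class_markup(pixels):
    """Palette-indexed variant: one CSS class per colour, floated <b> pixels."""
    palette = {}
    for row in pixels:
        for c in row:
            palette.setdefault(c, f"c{len(palette)}")  # c0, c1, ...
    width = len(pixels[0])
    # The container width makes the floated pixels wrap into rows.
    css = f".cap{{width:{width}px}}.cap b{{float:left;width:1px;height:1px}}"
    css += "".join(f".{name}{{background:{colour}}}" for colour, name in palette.items())
    body = "".join(f'<b class="{palette[c]}"></b>' for row in pixels for c in row)
    return f'<style>{css}</style><div class="cap">{body}</div>'


# Hypothetical 60x20 image using three colours.
colours = ["#fcfbff", "#abcdef", "#ffaabb"]
pixels = [[colours[(x + y) % 3] for x in range(60)] for y in range(20)]

print(len(inline_markup(pixels)), len(class_markup(pixels)))
```

Both versions describe the same picture, so the saving is purely in markup; a bot that reads colours from classes instead of inline styles needs only a small extra lookup.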
  • Congratulations... FOOL! (Score:2, Interesting)

    by Tom (822) on Monday January 01 2007, @10:49AM (#17422256)
    (http://web.lemuria.org/)
    Great, so blocking images in E-Mail will no longer get those image-spams thrown out, because now a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.

    Congratulations. How much did they pay you?

    Oh, as for the "official" purpose. I give it a life expectancy of 3 weeks before the spammers have found a way around it. If they bother at all.
  • by sbben (983577) on Monday January 01 2007, @11:34AM (#17422534)
    What the HEC?
  • No need to download the image (Score:5, Interesting)

    There's no need to download the image. Look at the source. Somewhere it says: <input type="hidden" name="hash" value="ad6ade8a0b6e2f748b80a390ff45cf31">

    Now, just go to MD5Lookup.Com [md5lookup.com] and convert that little "hidden" MD5Sum back to the original text:

    ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB

    Maybe the author should add some salt. :-)
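A sketch of why the unsalted hash in the hidden field is a problem and what a keyed alternative could look like. It assumes, for speed, that answers are three uppercase letters; the real HEC answers are longer, and the helper names and the server secret are hypothetical. HMAC-SHA256 is swapped in here as one common way to "add salt"; it is not what the HEC author used.

```python
import hashlib
import hmac
import itertools
import string

ALPHABET = string.ascii_uppercase  # assumption: 3-letter uppercase answers


def unsalted_token(answer: str) -> str:
    """What the page effectively does: a bare MD5 of the answer."""
    return hashlib.md5(answer.encode()).hexdigest()


def build_lookup(length: int = 3) -> dict:
    """Precompute digest -> answer for every possible answer,
    which is essentially what an MD5 lookup site offers."""
    table = {}
    for chars in itertools.product(ALPHABET, repeat=length):
        answer = "".join(chars)
        table[unsalted_token(answer)] = answer
    return table


def keyed_token(answer: str, secret: bytes) -> str:
    """Keyed digest: useless to precompute without the server's secret."""
    return hmac.new(secret, answer.encode(), hashlib.sha256).hexdigest()


def check_answer(submitted: str, token: str, secret: bytes) -> bool:
    """Server-side comparison in constant time."""
    return hmac.compare_digest(keyed_token(submitted, secret), token)


lookup = build_lookup()
print(lookup[unsalted_token("QXZ")])           # the hidden field gives the answer away
print(keyed_token("QXZ", b"server-secret") in lookup)
```

A keyed token alone does not stop someone replaying a token and answer they already solved once, so a real deployment would also need the token to expire or be single-use.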
  • by Skylinux (942824) on Monday January 01 2007, @12:26PM (#17422892)
    (http://www.workorderts.com/)
    I just wrote a sample CAPTCHA system as well but kept in black and white for various reasons. I also use whole words to make text input simpler for humans. Here is a competing article explaining a different approach.
    http://www.network-technologies.org/Projects/Virtual_Brain_Online/article/user_validation_image_verification_code_captcha/
  • Captchas (Score:1)

    by anasciiman (528060) on Monday January 01 2007, @12:35PM (#17422960)
    I loathe CAPTCHAs in any form. People who are blind or, like me, totally colourblind have no functional means to figure out just what in the hell we're supposed to "see" in them. Some places have noted this and added audio versions but the majority of sites using CAPTCHAs do not bother. For those webmasters thinking of using CAPTCHAs - there are a lot of us out here who are visually disabled and, um, we have money to spend with your competition if you can't at least meet us halfway. </rant>
    • Re:Captchas by /dev/trash (Score:2) Monday January 01 2007, @02:20PM
    • Re:Captchas by linuxfanatic1024 (Score:1) Wednesday January 03 2007, @02:23PM
  • Visually impaired? (Score:1)

    by cojsl (694820) on Monday January 01 2007, @12:38PM (#17422976)
    (http://www.i-t-w.com/)
    A quick skim of TFA didn't indicate whether these are any more accessible for the visually impaired. The current "audio captcha" options help, but standard captchas are a real barrier to the visually impaired.
  • by TehBeer (860440) on Monday January 01 2007, @12:38PM (#17422980)
    BitBloating the pixels to individual 1X1 TD tags will hardly make a difference. All that is needed is to reconstruct the bitmap then use OCR on it as usual.

    What a sad solution.
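The "reconstruct the bitmap" step described above can be sketched with nothing but the standard library. This assumes the markup looks like the `<td style='...background-color:#rrggbb'>` cells quoted elsewhere in the thread; the class name, the regular expression and the two-by-two sample are illustrative. The OCR step itself is left out.

```python
import re
from html.parser import HTMLParser

# Matches "background-color:#rrggbb" as well as the shorter "background:#rrggbb".
COLOUR = re.compile(r"background(?:-color)?\s*:\s*(#[0-9a-fA-F]{6})")


class TableToPixels(HTMLParser):
    """Collect one colour per <td>, grouped by <tr>, into a 2-D grid."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            style = dict(attrs).get("style") or ""
            match = COLOUR.search(style)
            self.rows[-1].append(match.group(1).lower() if match else None)


def html_to_pixels(markup: str):
    parser = TableToPixels()
    parser.feed(markup)
    return parser.rows


# Tiny hypothetical 2x2 "captcha".
sample = (
    "<table>"
    "<tr><td style='height:1px;width:1px;background-color:#fcfbff'></td>"
    "<td style='height:1px;width:1px;background-color:#000000'></td></tr>"
    "<tr><td style='height:1px;width:1px;background-color:#000000'></td>"
    "<td style='height:1px;width:1px;background-color:#FCFBFF'></td></tr>"
    "</table>"
)
print(html_to_pixels(sample))
```

Once the grid exists, writing it out as an ordinary image file and handing it to an existing captcha decoder is the same job as before.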
  • by sugarmotor (621907) on Monday January 01 2007, @02:12PM (#17423706)
    (http://stephan.sugarmotor.org/)
    The page compresses with gzip from 188,398 bytes down to 13,326 bytes. In plain text it displays ca. 5,000 bytes.

    So with HTML compression the size of this encoding isn't really a problem.

    But as mentioned at http://en.wikipedia.org/wiki/Captcha [wikipedia.org] the real hurdle is that the opponent can use low-paid data entry workers: http://it.slashdot.org/article.pl?sid=06/09/06/1217240 [slashdot.org] "Will Solve Captcha for Money?"

    Stephan
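The compression point above is easy to try. This sketch generates a synthetic pixel table in the same inline-style form and gzips it; the image size, palette and pattern are made up, so the numbers it prints will not match the 188,398 and 13,326 bytes measured on the real page.

```python
import gzip


def pixel_table(width, height, palette):
    """Build a <table> with one inline-styled <td> per pixel."""
    rows = []
    for y in range(height):
        cells = "".join(
            "<td style='height:1px;width:1px;background-color:%s'></td>"
            % palette[(x // 3 + y) % len(palette)]
            for x in range(width)
        )
        rows.append("<tr>%s</tr>" % cells)
    return ("<table>%s</table>" % "".join(rows)).encode("ascii")


raw = pixel_table(100, 30, ["#fcfbff", "#000000", "#abcdef", "#ffaabb"])
packed = gzip.compress(raw)

# Almost every byte of each cell is identical boilerplate, so it packs well.
print(len(raw), len(packed))
```

This only helps when the server actually sends the page with gzip content encoding and the client accepts it; the browser still has to lay out every cell after decompression.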
  • by MoogMan (442253) on Monday January 01 2007, @03:59PM (#17424722)
    "Captchas" and similar technology are just DRM. Thankfully, the audience trying to crack the former are far more stupid than the audience that crack DRM.
  • by Sheepeep (994464) on Monday January 01 2007, @04:11PM (#17424844)
    Although not technically an "image", it's still an image-based solution. If I wrote a CAPTCHA using SVG, it'd still be an image, even if it's a markup language. If I wrote it in Flash, it'd still be an image-based solution, even if it's Flash. I also don't see what would be difficult about automatically singling out an enormous, single-lined (in source) table full of CSS declarations without any 'data'. In fact, it's probably easier to spot as a script than an image...Probably with a similar time to decode it.
  • If you absolutely must use something like this, you can easily confuse spambots (and with far less code!) by interspersing some elements containing the CAPTCHA text itself and making them contiguous on the screen using absolute positioning. Such a thing is an accessibility nightmare, but no worse than the technique in the article.
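One reading of the suggestion above, as a rough sketch: emit the captcha characters in shuffled source order and put them back in visual order with absolute positioning. The function names, the 12px character width and the sample code are assumptions, and the text is assumed to be plain alphanumerics that need no HTML escaping.

```python
import random
import re


def scrambled_markup(text, char_width=12, seed=None):
    """Emit one absolutely positioned <span> per character, in random source order."""
    rng = random.Random(seed)
    order = list(range(len(text)))
    rng.shuffle(order)
    spans = "".join(
        '<span style="position:absolute;left:%dpx">%s</span>' % (i * char_width, text[i])
        for i in order
    )
    return '<div style="position:relative;height:1.5em">%s</div>' % spans


def naive_read(markup):
    """What a scraper gets by concatenating the text in source order."""
    return "".join(re.findall(r">([^<])</span>", markup))


def layout_aware_read(markup):
    """A slightly smarter bot sorts by the left offset and recovers the text."""
    pairs = re.findall(r'left:(\d+)px">([^<])</span>', markup)
    return "".join(ch for _, ch in sorted(pairs, key=lambda p: int(p[0])))


markup = scrambled_markup("K7QXM2", seed=1)
print(naive_read(markup), layout_aware_read(markup))
```

As `layout_aware_read` shows, the trick falls to a few lines of parsing, which fits the thread's general verdict that such schemes are a speed bump and nothing more.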
  • by IBitOBear (410965) on Tuesday January 02 2007, @04:17AM (#17429952)
    (http://www.pobox.com/~rwhite)
    ...they don't know the difference between a "DOS Attack" and a simple slashdotting... 8-)
  • by OhHellWithIt (756826) on Tuesday January 02 2007, @08:45AM (#17430962)
    (Last Journal: Wednesday August 08, @12:54PM)
    The page has been taken down. The author says it was subject to a DoS attack. I guess that's what /. readers are, eh?
  • by Thundersnatch (671481) on Tuesday January 02 2007, @09:03AM (#17431066)
    What you wrote had no positive value, and was moderated accordingly. We are all dumber for having read it.