Slashdot
HTML Encoded Captchas
Posted by
kdawson
on Mon Jan 01, 2007 08:03 AM
from the type-this dept.
rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots:
HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.
Related Stories
How to Prevent Form Spam Without Captchas 272 comments
UnderAttack writes "Spam submitted to web contact forms and forums continues to be a huge problem. The standard way out is the use of captchas. However, captchas can be hard to read even for humans. And if implemented wrong, they will be read by the bots. The SANS Internet Storm Center covers a nice set of alternatives to captchas. For example, the use of style sheets to hide certain form fields from humans, but make them 'attractive' to bots. The idea of these methods is to increase the work a spammer has to do to spam the form without inconveniencing regular users."
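The hidden-field idea in the summary above can be sketched on the server side roughly as follows. This is a minimal Python illustration, not code from the SANS write-up; the field name `website` and the function `looks_like_bot` are hypothetical.

```python
# Hypothetical honeypot check: the form contains one extra field that a
# stylesheet hides from humans (for example with display:none). Humans leave
# it empty; naive bots that fill in every field do not.

HONEYPOT_FIELD = "website"  # assumed name, chosen to look attractive to bots


def looks_like_bot(form: dict) -> bool:
    """Return True when the hidden honeypot field was filled in."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


human = {"name": "Alice", "comment": "Nice post", "website": ""}
bot = {"name": "x", "comment": "buy pills", "website": "http://spam.example"}
print(looks_like_bot(human), looks_like_bot(bot))
```

A bot that parses the stylesheet can still skip the field, so this only raises the cost for the spammer, as the summary says.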
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
I failed to see how this'll help (Score:5, Interesting)
Re:I failed to see how this'll help (Score:5, Insightful)
(http://www.omgili.com/ | Last Journal: Saturday July 08 2006, @12:39PM)
It's not that simple. Since the Captcha is no longer an image that you can download, the bot will first have to locate the position of the Captcha. The owner of the site can modify the layout of the page and the Captcha, making it unique. By rendering the image into HTML you effectively change the encoding of the image to a new and unique one, making it highly difficult to create a generic bot that will learn to decode all the HTML variations out there.
The problem today is with automated software that downloads the Captcha images from a pre-defined location (URL) and cracks them. HECs make it much harder to locate this resource.
Oh and everything is Crackable;)
Re:I failed to see how this'll help (Score:4, Interesting)
Add in the huge size of the html and the annoyance factor of captchas in general, and this is amazingly stupid.
Re:I failed to see how this'll help (Score:4, Insightful)
Yes, I see that they recommend adding in random divs and crap. If it's still a table, it's still very very easy to parse, even without a parser. If they intend for you to replace the table with 'random elements'
Render, PrintScr, OCR? (Score:3, Interesting)
A better solution might be the authentication system old 386 games had where you have to do some simple but human intelligence requiring task. "Find the word in the upper right of manual pg 4" -> "Enter the 3rd word from the following paragraph"
watermarking (Score:3, Interesting)
(http://dattaway.us/)
What are the gotchas with these captchas (Score:2)
Re:What are the gotchas with these captchas (Score:5, Insightful)
Bad form (Score:5, Insightful)
Re:Bad form (Score:4, Informative)
(http://pietersz.co.uk/ | Last Journal: Wednesday May 04 2005, @05:22AM)
Spy vs spy (Score:1, Interesting)
I have a question. How much of a problem are these spammed responses to blogs. I go to several blogs that don't have captchas and haven't noticed anything that could be called spam. Is this a response to a non-problem?
218kb (Score:1)
workaround... (Score:5, Informative)
1. Show the image in an alternate pornographic/warez/whatever website
2. Ask the user to type it in to access the site
3. Use the user's input to access the original protected site
4. There is no step 4.
Re:workaround... (Score:5, Funny)
One hand eh?
Guess we don't really need to ask how you know this...
A captcha is still a captcha (Score:5, Interesting)
A HTML generated captcha would prevent that, since there is no image file to copy.
However, what prevents the attacker from simply copying the relevant HTML source and putting it on his or her site, just like the image? Sure, you can make it quite complicated by adding CSS layers and whatnot, but in the end that would merely be an extra annoyance.
And stopping the attacker from using OCR on the captcha won't really work either. It's not that hard to render HTML code to an image, which you can feed to the OCR software.
In short, this hack is just another step in the arms race that buys us some time.
Not a resource they can download and process? (Score:2)
It's a nice idea, but it's little more than a speed-bump at best. (And not a particularly high one, at that)
Screen Captcha! (Score:3, Interesting)
The file size is what intrigues me. Just make a 'hidden' captcha that a bot would download. Now figure out how to make a jpeg decompressor uncompress that to 2 gigs or better.
It's like the old "I'll compress 2gigs of the letter A with zip and upload it to that BBS and let the virus checker gag" gag.
Or maybe a gif file. I wonder how well solid black or white compresses...
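The compression trick in this comment, and the defence a downloader would need against it, can be sketched in Python. This uses zlib (the DEFLATE scheme behind zip) rather than jpeg or gif, and the 10 MiB size and the helper `bounded_decompress` are assumptions for illustration only.

```python
import zlib

# Highly repetitive input compresses extremely well.
original_size = 10 * 1024 * 1024              # 10 MiB of the letter "A"
payload = zlib.compress(b"A" * original_size, 9)
ratio = original_size / len(payload)
print(len(payload), round(ratio))


def bounded_decompress(data: bytes, limit: int) -> bytes:
    """Decompress `data`, refusing to produce more than `limit` bytes."""
    d = zlib.decompressobj()
    # Ask for one byte more than allowed so an oversized stream is detectable
    # without ever inflating the whole thing into memory.
    out = d.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed data exceeds limit")
    return out
```

A bot that caps its output this way never inflates the full payload, so the trap only catches careless downloaders.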
Lunacy (Score:4, Interesting)
With the limited amount of colours used, it would make much more sense to
a) give the table an id, then:
table.tabid td { width:1px; height:1px; }
b) give some classes for each colour used
td.colid { background-color: blah; }
I'm sure that would halve the source code size... How can you trust an HTML solution that hasn't even been properly thought through?
Processing (Score:2, Interesting)
The Captcha is no longer an image and therefore not a resource they can download and process.
Err...but the HTML captcha is a resource they can download and process.
When bad ideas go live (Score:1, Insightful)
(http://fnarg.com/)
This is what happens when bad ideas are brought to life. This will only waste the site owner's bandwidth and maybe slow down the attacker slightly while the algorithm is modified. We're talking AT MOST a couple of days' work. You could achieve the same result by adding a 2-second delay to the CAPTCHA cgi, the same idea as adding a delay to failed logins... if you can't properly defeat the attackers, at least slow them down.
We've reached a point where, with security/copy protection, if it is something that can be done by a human sitting at a computer, the human can be removed from the equation. The greatest shortcoming of any system like CAPTCHA, or even asking "human intelligence" questions like "What do monkeys eat" or other things that computers don't innately "know", is that a human has to computerize those actions in the first place. You have to teach YOUR computer what the answer to the monkey question is, and there are only so many answers you will teach it until you run out of ideas (or exhaust the body of humankind's knowledge). Eventually the attacker will know all the answers to your challenges and you've just wasted a whole lot of time.
A better strategy here is the psychological approach. How do you get rid of a tireless attacker ? What motivates an attacker ? They WANT something of value to them. That something can be email addresses, zombie hosts, or in the case of blog spam they just want eyeballs. There are two ways to demotivate them: get rid of what's luring them, or make your prize harder to get than everyone else's. The first solution might mean crippling your site, even making it totally worthless (think site owners that give up, communities that are abandoned after relentless attacks). The second solution only buys you time, because the more vulnerable sites will ramp up their security, sooner or later, and then you're back at square one.
Actually there is a solution 3: find the attackers and attack THEM. Hey it's not the higher road, but it's damn effective.
Captchas are annoying (Score:5, Insightful)
While this has little to do with the original post, I have a really annoying experience with captchas.
I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time. Another portion of the time I take too long trying to answer the code question and type in the right characters. I typically get screwed on the number zero and the letter 'O', and on lowercase 'L' and the number 1.
It's becoming, for me, an entry barrier to signing up and gaining access to websites. It would be much easier to simply use email authentication. What do you do with the people who are color blind? I spent some years dealing with display design and this was a legitimate concern that we addressed at the time for a specialized group of people. In the common population there are a lot more occurrences of people who are color blind.
Are captchas really worth the effort compared to other, more human-friendly processes? Is anyone working on what we will be doing next? Considering that there are decades of machine vision technology to pull from, I think it will be fairly trivial for the bots to become better at reading captchas than humans.
It might be effective to take the email authentication process and apply everything that mail servers do to authenticate the user. What I mean by this is apply all the mail server rules, like FQDN requirements for HELO, fully resolvable email domains, valid email addresses, and non-open relays. Much of this would eliminate either the bots or the ISPs who are too stupid to properly configure a mail server. Similarly, it might be sufficient to code the HTML/HTTP to expect a properly responding client and not some hacked-up bot that can't do most of it right.
Broken (Score:5, Interesting)
not very effective (Score:1)
(http://www.thoughtfulchaos.com/)
How difficult is it to translate this matrix into a normal image? Not very difficult I am afraid.
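To make the point concrete, here is a rough Python sketch that turns a table of coloured cells back into a pixel grid using only the standard library. The markup it expects (one `<td>` per pixel with an inline `background-color`) is an assumption about how such a captcha might be encoded, not the project's actual output; `TablePixels` and `html_to_ascii` are hypothetical names.

```python
from html.parser import HTMLParser
import re

# Matches "background:#rrggbb" as well as "background-color:#rrggbb".
COLOR_RE = re.compile(r"background(?:-color)?\s*:\s*(#[0-9a-fA-F]{6})")


class TablePixels(HTMLParser):
    """Collect one colour per <td>, one list of colours per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            style = dict(attrs).get("style") or ""
            match = COLOR_RE.search(style)
            self.rows[-1].append(match.group(1).lower() if match else None)


def html_to_ascii(html, ink="#000000"):
    """Render the grid as text: '#' for ink-coloured cells, '.' otherwise."""
    parser = TablePixels()
    parser.feed(html)
    return ["".join("#" if px == ink else "." for px in row) for row in parser.rows]


sample = (
    "<table>"
    "<tr><td style='background-color:#000000'></td>"
    "<td style='background-color:#ffffff'></td></tr>"
    "<tr><td style='background-color:#FFFFFF'></td>"
    "<td style='background-color:#000000'></td></tr>"
    "</table>"
)
print("\n".join(html_to_ascii(sample)))
```

Once the grid is recovered it can be written out as an ordinary bitmap and handed to whatever OCR the attacker already uses.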
218k of junk (Score:3, Informative)
Also on the subject of it being 218k, each pixel looks like:
which is badly redundant, the very first thing is you can make all "td"-s in the table be 1px/1px with a simple: table.captcha td {width:1px; height:1px} rule, then background-color can be shortened to just "background" and still be valid.
Furthermore, you don't need a table with rows and columns. If you float the pixels to the left, then you only need a container of the right width and the columns/rows will naturally form. To keep the size down we can style a shorter tag for our purposes, like <b>.
So at this stage we arrive at the much simpler:
<b style="background:#abcdef"></b>
But this can be simplified even further by indexing the colors used as around 40-50 CSS classes (given the image has a lot more than 40-50 pixels and 40-50 colors are enough for it, it's still a net gain), for example:
<b class="cA"></b>
and again the original:
And this is before we start putting JavaScript in the picture...
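The size argument above can be checked with a rough Python sketch. The verbose per-pixel markup used here is a hypothetical stand-in, since the original is not reproduced in the comment, and the 100x30 image size and 40-colour palette are likewise assumptions.

```python
WIDTH, HEIGHT = 100, 30                      # assumed image size
# 40 distinct made-up colours standing in for the captcha's palette.
PALETTE = ["#%02x%02x%02x" % (i * 6, i * 3, 255 - i * 6) for i in range(40)]
pixels = [
    PALETTE[(x * 7 + y * 13) % len(PALETTE)]
    for y in range(HEIGHT)
    for x in range(WIDTH)
]

# Hypothetical verbose form: every cell carries its full inline style.
verbose = "".join(
    '<td style="width:1px;height:1px;background-color:%s"></td>' % c
    for c in pixels
)

# Compact form along the lines of the comment: one short class per colour,
# declared once in a stylesheet, then a bare <b> per pixel.
class_of = {c: "c%d" % i for i, c in enumerate(PALETTE)}
css = "".join(".%s{background:%s}" % (name, c) for c, name in class_of.items())
compact = css + "".join('<b class="%s"></b>' % class_of[c] for c in pixels)

print(len(verbose), len(compact))
```

Under these assumptions the class-indexed version comes out well under half the size of the inline-styled one, before any HTTP compression is applied.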
Congratulations... FOOL! (Score:2, Interesting)
(http://web.lemuria.org/)
Congratulations. How much did they pay you?
Oh, as for the "official" purpose. I give it a life expectancy of 3 weeks before the spammers have found a way around it. If they bother at all.
This will leave spammers saying.... (Score:1)
No need to download the image (Score:5, Interesting)
(http://wilmer.gaast.net/)
Now, just go to MD5Lookup.Com [md5lookup.com] and convert that little "hidden" MD5Sum back to the original text:
ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB
Maybe the author should add some salt.
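One way to "add some salt" is sketched below in Python: replace the bare MD5 with a keyed hash (HMAC) over the answer plus a per-challenge nonce, so a public lookup table is useless without the server's key. The names `SERVER_KEY`, `make_token`, and `check_answer` are hypothetical, and treating answers as case-insensitive is an assumption.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # assumed: generated once, kept on the server


def make_token(answer: str, nonce: str) -> str:
    """Keyed hash of nonce + answer; safe to embed in the page."""
    msg = (nonce + ":" + answer.upper()).encode("utf-8")
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def check_answer(submitted: str, nonce: str, token: str) -> bool:
    """Recompute the token from the user's input and compare in constant time."""
    return hmac.compare_digest(make_token(submitted, nonce), token)


nonce = secrets.token_hex(8)           # fresh random value per challenge
token = make_token("&NMTB", nonce)     # the example answer from the comment
print(check_answer("&nmtb", nonce, token), check_answer("WRONG", nonce, token))
```

This sketch does not stop a solved nonce and token pair from being replayed; a real deployment would also need to mark each nonce as used.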
I just wrote one.... (Score:1)
(http://www.workorderts.com/)
http://www.network-technologies.org/Project
Captchas (Score:1)
Visually impaired? (Score:1)
(http://www.i-t-w.com/)
BitBloating the pixels to individual TD tags (Score:1)
What a sad solution.
Size is not so bad (Score:1)
(http://stephan.sugarmotor.org/)
So with HTML compression the size of this encoding isn't really a problem.
But as mentioned at http://en.wikipedia.org/wiki/Captcha [wikipedia.org] the real hurdle is that the opponent can use low-paid data entry workers: http://it.slashdot.org/article.pl?sid=06/09/06/12
Stephan
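The compression claim in this comment is easy to try in Python with gzip, the usual HTTP content encoding. The per-pixel markup and the four-colour repeating pattern below are assumptions for illustration; a real captcha with noisy pixels would compress less well than this.

```python
import gzip

colours = ["#000000", "#ffffff", "#336699", "#cc0000"]   # made-up palette
cells = "".join(
    '<td style="width:1px;height:1px;background-color:%s"></td>'
    % colours[(i * i) % len(colours)]
    for i in range(3000)
)
html = ("<table><tr>" + cells + "</tr></table>").encode("utf-8")

compressed = gzip.compress(html)
print(len(html), len(compressed))
```

Because every cell repeats the same tag and style text, most of the page is redundancy that gzip removes.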
Captchas synonymous to DRM? (Score:2)
Seems Almost Misleading (Score:1)
Cleaner way (Score:2)
(http://www.hiregeeks.com/)
How smart can they be if... (Score:2)
(http://www.pobox.com/~rwhite)
Nothing here to see (Score:1)
(Last Journal: Wednesday August 08, @12:54PM)
Re:But was it interesting to the /. audience? (Score:2)