Hell Yeah! reminds us of a 2-week-old development that somehow escaped notice here. A team of Russian hackers has found a way to decipher a Yahoo CAPTCHA, thought to be one of the most difficult, with 35% accuracy. The Russian group's notice, posted by one "John Wane," is dated January 16. This site hosts a rapidshare link to what looks to be demonstration software for Windows, and quotes the Russian researchers: "It's not necessary to achieve high degree of accuracy when designing automated recognition software. The accuracy of 15% is enough when attacker is able to run 100,000 tries per day, taking into the consideration the price of not automated recognition — one cent per one CAPTCHA."
I think the parent is serious. The idea is that your robot goes and grabs the images that needs to be decoded. Then on another website, it is presented and you can see free porn if you type in the word. I've heard of this but never read about it. Sounds like a good idea. Anyone know what this is called or some references ?
http://news.bbc.co.uk/2/hi/technology/7067962.stm [bbc.co.uk] Here is a link to a BBC article about something like that. It's a Windows program that rewards typing in captchas by showing a woman that takes off progressively more and more clothes.
that's why it costs 1 cent per 1 captcha, the overall cost of webhosting the porn for exchange boils down to 1 cent per solved captcha. obviously, if you're hosting on root-kited windows boxes in the us (the highest rate of infection is in the us) the cost is still about 1 cent per one captcha because the cost of paying hackers to keep a bot net sizable enough comes to about the same cost.
especially with sp3 coming out now, the cost of bot nets is higher, since sp3 offers a 'easy' bot net removal path, since staying off-line long enough to get all sp2's flaws patched is crucial in preventing reinfection. believe me, having a root-kit installed is easy even for a veteran computer guy to miss.
i have dvd's i burned almost 3 years ago that reinfect any windows machine with a root-kit, and are un-readable in linux, apparently the root-kit was using some hooks in nero burning rom to 'randomly' pick a burn project and put the root-kit installer on there so when windows tried to auto run it would install the root-kit, then show the 'window' that normally shows up on auto-run would show up. the rootkit took an 'extra' session, that was transparent, eg: it would only show using burning software to read the track data, for the burned cd or dvd. no additional files showed up in windows, but the extra session made it unreadable to linux.
also, the root-kit only runs in a 'blank' screen saver, which it protects and makes sure loads when the system is idle, so it never sends data when the user might be there to notice. and i think it sends the data as like, internet explorer, to bypass firewall rules. since none of the firewalls i tried could block it. i actually only found the original root kit when a second root-kit moved the first root-kit's files to the recycle bin. other than that none of the root kit scanners that were recommended to me could even detect this thing. only the 'symptoms' and the fact i could 'remove them' by staying off-line and not using my old discs were proof that i had a root kit.
symptoms included, auto-run becoming disabled, screen saver always resetting to 15 minutes (only when both root-kits were on there), and the 'desktop' showing up 2-3 times a day when in full-screen games (also only with both root kits), and finding root-kit files in recycle bin(only found on networked systems with the root kit, and didn't return on reinstall of both root-kit, likely was a 1 time 'bug' that was fixed later on)
so yeah, I didn't notice it for 3 years. Not that i usually have to deal with virus, but in the past I had only ever had to deal with 3 virus and in my 15 years online. and the third one was really a root-kit. I've also been using open-source software for 11 years, so that probably helped, of course, one of the virus was one that affected my open source software, the other 2 were windows based.
it's still easy to miss windows root-kit's nowadays, especially when hackers have root-kits that aren't published, and they use scripts to make the exe's have unique signatures (using compiler tricks) for known root-kits.
that's why it costs 1 cent per 1 captcha, the overall cost of webhosting the porn for exchange boils down to 1 cent per solved captcha.
Er, where did you get that number? At Nearly Free Speech [nearlyfreespeech.net], it only costs $1 / GB (of transfer), and that's how much it would cost nearly anywhere else (or even less!), if you use significant amount of bandwidth.
I don't know exactly how large porn images are, never having looked at them, but if you guess a round number of 0.1 MB per picture, it's only about $0.0001, or 0.01 cent per captcha. I suppose it's better than nothing, but it's not yet very cost-prohibitive.
The 3D captcha [spamfizzle.com] seems to be a good solution here (that's a link from wikipedia article [wikipedia.org])
You pick several 3d models, like people, chairs or flowers. Name all their parts, like "chair leg", "human head" etc. The CAPTCHA is generated by placing a several 3D models randomly rotated on a scene and rendering them with easily readable letters "A", "B" placed on the named parts. The captcha questions are: "what is the letter on human head", "what is the letter on chair leg", etc..
People can answer pretty easily. The 3D models are always randomly placed and rotated on a scene, so bots have a problem.
by Anonymous Coward
on Tuesday January 29 2008, @07:49PM (#22229446)
A few months ago Yahoo introduced a CAPTCHA to prevent bots entering their chatrooms. Within a few days every room on yahoo was filled with bots once more, and still are to this day.
Given the current situation of the chat rooms on yahoo, it comes as no suprise at all that the other parts of the Yahoo system are inadequately protected from bots either.
Heh, yeah. . ..I used to hook up my computer using Rybka to yahoo chess. I played against other bots, other players(always a glorious win), and tolerated the unending spam from other bots that would just want you to go to some porn website. Eventually, they instituted a CAPTCHA. ..Oh noes, my bot was broken. Turns out I could just manually enter the CAPTCHA and grab the session ID info before the applet loaded and forward that manually to the bot. Once I'm "logged in" with the bot, it's no big deal. Point
What other tough AI problems can we foist onto spammers? People who buy V1agra through email ads could be the single largest source of computer science research "grants."
That reminds me of the age check for Leisure Suit Larry back in the day... Who knew that the desire of a horny teen to see pixellated boobs would lead to history research?
To register, answer these questions and click the button on the right What colour are buses in London? What is three times three? [Red] [Green] [Blue]
Yes, those are undoubtedly hard questions for a computer. How, exactly, do you plan to generate billions of these questions? For a CAPTCHA to work, it must still be hard even if the generation algorithm is public knowledge.
Not really. After a couple of (thousand) runs through, the attacker would have a reasonably accurate database of the questions. They can then analyze the text to find the nearest match to one of the questions in its database.
Not really. After a couple of (thousand) runs through, the attacker would have a reasonably accurate database of the questions. They can then analyze the text to find the nearest match to one of the questions in its database.
That's true. I've found, however, that introducing custom spam blocking methods, such as this, no matter how easy to break, often does a better job at stopping spam bots than more robust publicly available methods. For a target as big as Yahoo, this probably won't work, but I've found on PHPbb for instance, instead of using any of the publicly available captchas, which are easily defeated by bots, creating a simple question of this sort does wonders for bot-blocking. Even if it's just one question. If your site isn't big enough to be specifically targeted by bot farmers, sometimes a simple solution is better than a more complex one that everybody else is using.
That has been my experience, too. I admin a small bb and was having horrible problems with spam sign ups. CAPTCHAs didn't slow the spammers down at all. I went to a simple question that will be easily known by all of my target audience but probably won't be known by someone half way around the world entering CAPTCHAs for a penny a piece and allowed any spelling that is even close. I haven't had any spammers sign up for a couple years now.
That obviously won't work for a major target like YAHOO though.
I have a little site, only really intended to share stuff with family and friends, served with custom scripts. I couldn't believe it when it was targetted by spammers. I could even see the test posts they made, checking to see if html was allowed etc., before unleashing the the bot to post dozens of links a day.
(if anyone uses this and makes a million, at least cut me in 10% for the idea) I gather the last frontier for computers is image recognition. I'm not sure of the state of image processing, but if you could randomly color simple pictures (one flower, one pen, one cup (NO PUN INTENDED)) into about twenty different shades, and get about a hundred different photos, and just start rotating two or three a week in. So the user sees a small photo with radio boxes below:
Just put some hard to read perl code in there and ask the user to say what it does. If the answer is correct it's a bot, if the answer is wrong it's probably a human;)
The letters are too far away from each other - makes it easy to separate them for proccessing. In fact, the only challenging aspect for OCRs in your captcha is the letter rotation/skewing.
However, I don't think anyone will bother to write a captcha OCR for your site, unless it's Yahoo sized.
The character outlines are nicely distinct, which means that even basic OCR software should be able to break the CAPTCHA. Since it's so easy to break, you want to hide it from any bots that come by: remove all references to "captcha" from the page source, and you might want to move the HTML for the image away from the HTML for the entry box.
Although it seems counter-intuitive, character recognition (even with your filtering) is a relatively easy problem for a computer to solve. The hard problem is segmentation. It is relatively easy for a human to segment characters when they are somehow joined together, by artifacts or occlusion, it can be very hard to do with current methods.
Hence all good modern captchas have moved away from character recognition captchas (such as yours) to segmentation based captchas. You only need to read the wikipedia article on CAPTCHAs to see some examples: http://en.wikipedia.org/wiki/Captcha [wikipedia.org].
I've found Yahoo's CAPTCHA to be really annoying. I probably get it wrong about 20% of the time because the picture is so distorted (and I've been surprised that I got it right a lot of the time). I even considered writing them an email complaining about it, but then I realized they probably don't give a crap.
We hate CAPTCHA. Most thing they do to make it difficult for computers to decode, make it a lot more difficult for humans to decode. Most of them are not usable by text browsers (dah), and the blind. Some have audio that is hard for people to hear, and sill easy for computer to decode. Last, CAPTCHA's are so over used that people just do them without thinking. For all you know that Porn/ware site is using you to do CAPTCHA for them. Not that it is needed. This is just one more nail in the CAPTCHA coffin.
33% of Yahoo capitchas isn't really impressive - you still get a large quantity of negative hits, and unless you have an array of IP addresses (most people don't), there will still be a large quantity of addresses registered from a given IP. Also, a large quantity of negatives would cast doubt on any positive matches from the same IP.
Also, Yahoo captchas aren't that "hard" - they are black text from known font pools on a white background that get slightly warped and have black lines drawn on some characters. This is hardly strong since it doesn't hit all letters within the word (which is done by reCAPTCHA) or use a large font-pool variety.
Even the Slashdot Captcha is harder - it hits the whole image and uses different fonts within the word.
33% of 100,000 attempts per day is 33,000 posts per day.
That also has 67,000 failed captchas per day - something you generally notice. If your captcha system detects rapid-fire captcha attempts (requests, failed, etc), you can auto-block the IP address that is making that many requests.
You'd probably want to do that anyway, since 1.15 requests per second for captchas is on par with flooding.
This might account for the recent increase in spam chat messages I've been seeing there. My guess is that the spam filtering is not as effective on chat as email. Indeed, chat may not pass through any kind of filtering at all afaik. That will probably change soon, but in the meantime I suppose the people who cracked the captcha will make a tidy profit.
I agree, that is better than I normally do as well. Maybe someone could make this a firefox plugin so that mere mortals can actually access webpages that use CAPTCHAs.
It is sad because with corrective lenses, my vision is 20/20, and I'm highly technical. I should not have any problems with CAPTCHAs; However, my grandmother is another story. She has poor vision, can't figure out how to do a carriage return on her computer, has difficulty understanding the concept of scrollbars, and I'm sure would not be able to deal with even the easiest CAPTCHAs in use today. This is not usability. Granted, given the choice between SPAM or CAPTCHAs, I'll chose the lesser of the two evils...
This is why you need a queryable, updateable public spam database like Akismet [akismet.com] where, with a little effort in telling it the odd time it gets it wrong, you can eliminate 99% of spam. This might not help for a registration script, but you could use it on the content ultimately used by the registered user to determine whether the signup was likely a bot or a human.
Did anyone notice that the image recognition code is imported from a binary DLL? I was under the impression that the Russian hackers would provide the source for the recognition code as well. But then, the people who released this are only interested in generating as much spam. Why should you trust them? You would be foolish enough to _not_ execute your test program that imports this dll in a vmware instance instead of your actual machine. Anybody done a comprehensive strace to determine sockets/descriptors opened by using this dll?
Yahoo!'s captcha has been hacked, perhaps not as well, in the past. I've seen open http proxies pounding away at Yahoo to the tune of 100,000 per hour and more. Hotmail's is broken, so are others. The real shame is that the Storm Worm controllers are being protected by a national government and law enforecement system.
So what's the answer?
I'm sure I don't know. I do know that the wild west theory of accepting any kind of behaviour isn't acceptable. I know that some minimum standard of what's allowed and what isn't is going to have to take place. Where these limits are placed is a thing for a global conversation, and there will be differances of opinion.
Is cracking a captcha acceptalbe? Is phishing and identity theft acceptable? Is fraud and uncontrolled spam acceptable? What limits, and on what actions?
I'm just not that smart. But I think we can agree on a few things. Let's start to find out what those things are... and acting in concert with other network operators to enforce those standards. Fail to meet them, and your network routing gets dropped...
Are you bashing MS just to bash them. Honestly, their so called 'stupid system' is the best thing I've seen out there. Please enlighten me wise one, and link me to a better alternative.
p.s. How do you know that Gmail accounts haven't been hacked into? Do you have data validating this?
It's not a challenge to bash MS, that comes way to easy, but to add some useful content to/. , might be a challenge for yourself, wise one.
I thought those things were already broken (Score:5, Funny)
Re:I thought those things were already broken (Score:5, Insightful)
Parent
Re: (Score:3, Insightful)
Re:I thought those things were already broken (Score:4, Informative)
Here is a link to a BBC article about something like that. It's a Windows program that rewards typing in captchas by showing a woman that takes off progressively more and more clothes.
Parent
Re: (Score:3, Informative)
Re:I thought those things were already broken (Score:5, Interesting)
especially with sp3 coming out now, the cost of bot nets is higher, since sp3 offers a 'easy' bot net removal path, since staying off-line long enough to get all sp2's flaws patched is crucial in preventing reinfection. believe me, having a root-kit installed is easy even for a veteran computer guy to miss.
i have dvd's i burned almost 3 years ago that reinfect any windows machine with a root-kit, and are un-readable in linux, apparently the root-kit was using some hooks in nero burning rom to 'randomly' pick a burn project and put the root-kit installer on there so when windows tried to auto run it would install the root-kit, then show the 'window' that normally shows up on auto-run would show up. the rootkit took an 'extra' session, that was transparent, eg: it would only show using burning software to read the track data, for the burned cd or dvd. no additional files showed up in windows, but the extra session made it unreadable to linux.
also, the root-kit only runs in a 'blank' screen saver, which it protects and makes sure loads when the system is idle, so it never sends data when the user might be there to notice. and i think it sends the data as like, internet explorer, to bypass firewall rules. since none of the firewalls i tried could block it. i actually only found the original root kit when a second root-kit moved the first root-kit's files to the recycle bin. other than that none of the root kit scanners that were recommended to me could even detect this thing. only the 'symptoms' and the fact i could 'remove them' by staying off-line and not using my old discs were proof that i had a root kit.
symptoms included, auto-run becoming disabled, screen saver always resetting to 15 minutes (only when both root-kits were on there), and the 'desktop' showing up 2-3 times a day when in full-screen games (also only with both root kits), and finding root-kit files in recycle bin(only found on networked systems with the root kit, and didn't return on reinstall of both root-kit, likely was a 1 time 'bug' that was fixed later on)
so yeah, I didn't notice it for 3 years. Not that i usually have to deal with virus, but in the past I had only ever had to deal with 3 virus and in my 15 years online. and the third one was really a root-kit. I've also been using open-source software for 11 years, so that probably helped, of course, one of the virus was one that affected my open source software, the other 2 were windows based.
it's still easy to miss windows root-kit's nowadays, especially when hackers have root-kits that aren't published, and they use scripts to make the exe's have unique signatures (using compiler tricks) for known root-kits.
Parent
Re:I thought those things were already broken (Score:4, Informative)
I don't know exactly how large porn images are, never having looked at them, but if you guess a round number of 0.1 MB per picture, it's only about $0.0001, or 0.01 cent per captcha. I suppose it's better than nothing, but it's not yet very cost-prohibitive.
Parent
Re: (Score:3, Funny)
Posting on
Re:I thought those things were already broken (Score:4, Funny)
Parent
Hey (Score:5, Funny)
Re:Hey (Score:5, Interesting)
You pick several 3d models, like people, chairs or flowers. Name all their parts, like "chair leg", "human head" etc. The CAPTCHA is generated by placing a several 3D models randomly rotated on a scene and rendering them with easily readable letters "A", "B" placed on the named parts. The captcha questions are: "what is the letter on human head", "what is the letter on chair leg", etc..
People can answer pretty easily. The 3D models are always randomly placed and rotated on a scene, so bots have a problem.
Parent
Not really news (Score:5, Insightful)
Given the current situation of the chat rooms on yahoo, it comes as no suprise at all that the other parts of the Yahoo system are inadequately protected from bots either.
Re: (Score:3, Interesting)
Gentlemen, start your spambots (Score:2)
Re:Gentlemen, start your spambots (Score:4, Insightful)
To register, answer these questions and click the button on the right
What colour are buses in London?
What is three times three?
[Red] [Green] [Blue]
Parent
Re:Gentlemen, start your spambots (Score:5, Funny)
Parent
Bellybutton (Score:2)
Re: (Score:2)
Um, don't ask how I know that. >.>
Re:Gentlemen, start your spambots (Score:5, Insightful)
What colour are buses in London?
What is three times three?
[Red] [Green] [Blue]
Yes, those are undoubtedly hard questions for a computer. How, exactly, do you plan to generate billions of these questions? For a CAPTCHA to work, it must still be hard even if the generation algorithm is public knowledge.
Parent
Re:Gentlemen, start your spambots (Score:4, Insightful)
Parent
Re: (Score:3, Insightful)
Re:Gentlemen, start your spambots (Score:5, Insightful)
That's true. I've found, however, that introducing custom spam blocking methods, such as this, no matter how easy to break, often does a better job at stopping spam bots than more robust publicly available methods. For a target as big as Yahoo, this probably won't work, but I've found on PHPbb for instance, instead of using any of the publicly available captchas, which are easily defeated by bots, creating a simple question of this sort does wonders for bot-blocking. Even if it's just one question. If your site isn't big enough to be specifically targeted by bot farmers, sometimes a simple solution is better than a more complex one that everybody else is using.
Parent
Re: (Score:3, Interesting)
Re: (Score:3, Informative)
Random Coloration Photos (Score:3, Interesting)
I gather the last frontier for computers is image recognition. I'm not sure of the state of image processing, but if you could randomly color simple pictures (one flower, one pen, one cup (NO PUN INTENDED)) into about twenty different shades, and get about a hundred different photos, and just start rotating two or three a week in. So the user sees a small photo with radio boxes below:
The cup is ()red ()blue ()green ()purple ()oran
Re:Random Coloration Photos (Score:4, Insightful)
Parent
Re: (Score:3, Insightful)
What about i18n? (Score:3, Informative)
Re: (Score:3, Funny)
kthxby
Re:Gentlemen, start your spambots (Score:5, Funny)
Parent
captcha security (Score:2, Interesting)
Please take a look [primadd.net] - are the effects actually helping the recognition process?
--
social bookmarking widget for your site [primadd.net]
Re: (Score:2, Informative)
Re:captcha security (Score:4, Informative)
Parent
Re:captcha security (Score:5, Informative)
Hence all good modern captchas have moved away from character recognition captchas (such as yours) to segmentation based captchas. You only need to read the wikipedia article on CAPTCHAs to see some examples: http://en.wikipedia.org/wiki/Captcha [wikipedia.org].
Parent
That's really impressive. (Score:5, Insightful)
Lets all say it togeter. (Score:2, Insightful)
Only Yahoo? (Score:5, Informative)
Also, Yahoo captchas aren't that "hard" - they are black text from known font pools on a white background that get slightly warped and have black lines drawn on some characters. This is hardly strong since it doesn't hit all letters within the word (which is done by reCAPTCHA) or use a large font-pool variety.
Even the Slashdot Captcha is harder - it hits the whole image and uses different fonts within the word.
Re: (Score:2)
Re: (Score:2)
You'd probably want to do that anyway, since 1.15 requests per second for captchas is on par with flooding.
Malware (Score:2)
Re:Malware (Score:4, Funny)
Parent
Re: (Score:2, Funny)
Jumping without a chord would be no fun at all.
Increase In Chat Spam (Score:2)
35%??? (Score:4, Informative)
Re:35%??? (Score:4, Insightful)
It is sad because with corrective lenses, my vision is 20/20, and I'm highly technical. I should not have any problems with CAPTCHAs; However, my grandmother is another story. She has poor vision, can't figure out how to do a carriage return on her computer, has difficulty understanding the concept of scrollbars, and I'm sure would not be able to deal with even the easiest CAPTCHAs in use today. This is not usability. Granted, given the choice between SPAM or CAPTCHAs, I'll chose the lesser of the two evils...
Parent
Akismet (Score:2)
Warning on playing with the demo (Score:5, Insightful)
Gee, Ya THINK (Score:4, Insightful)
So what's the answer?
I'm sure I don't know. I do know that the wild west theory of accepting any kind of behaviour isn't acceptable. I know that some minimum standard of what's allowed and what isn't is going to have to take place. Where these limits are placed is a thing for a global conversation, and there will be differances of opinion.
Is cracking a captcha acceptalbe? Is phishing and identity theft acceptable? Is fraud and uncontrolled spam acceptable? What limits, and on what actions?
I'm just not that smart. But I think we can agree on a few things. Let's start to find out what those things are... and acting in concert with other network operators to enforce those standards. Fail to meet them, and your network routing gets dropped...
Other interesting work on CAPTCHAs (Score:3, Interesting)
You know those annoying flash advertisement games (shoot the monkey for a free iPod)? Well, they could potentially be adapted for CAPTCHAs as well: http://cups.cs.cmu.edu/soups/2006/posters/misra-poster_abstract.pdf [cmu.edu]
Re: (Score:2, Interesting)
p.s. How do you know that Gmail accounts haven't been hacked into? Do you have data validating this?
It's not a challenge to bash MS, that comes way to easy, but to add some useful content to