I Don't Believe in Imaginary Property writes "Websense is reporting that Gmail's CAPTCHA has been broken, and that bots are beginning to sign up with a one in five success rate. More interestingly, they have a lot of technical details about how the botnet members coordinate with two different computers during the process. They believe that the second host is either trying to learn to crack the CAPTCHA or that it's a quality check of some sort. Curiously, the bots pretend to read the help information while breaking the CAPTCHA, probably to prevent Google from giving them a timeout message."
and I cannot help but wonder if this will increase our usually abysmal rate for reading handwriting. (and no, I don't design it myself so no ripping on me, just work with it)
Unfortunately, it's HumanPower(TM). About 3/4 of the way down TFA, they show a web page with instructions (in Russian) for the people who get paid to read the CAPTCHAs.
I doubt it.
TFA says this is a service SELLING captcha breaking. If it was human powered, I'd expect it to do much better than the 20% they cite.
Ummmm... I'm not so sure about that. OK, Google's CAPTCHAs are pretty easy for humans to read, but I've often had to try literally 6 different CAPTCHAs on some sites. Yes, really.
Don't listen to the trolls, you are not alone at all.
It really depends on the CAPTCHA being used, but the real problem is that a good percentage of the time, on the hard CAPTCHAs, you just cannot make a definitive choice on a single letter.
That means you've got a 50/50 shot of being right on it. If it's 2 letters, which is rarer, you've now got a 1/4 chance of being right.
I have seen some CAPTCHAs that are so ridiculous in their attempts at obfuscating the letters that reading them is next to impossible. Maybe that is the whole point, too. A strong CAPTCHA may be one that a human fails half the time.
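To spell out the arithmetic, here is a minimal sketch. It assumes each ambiguous letter is an independent coin flip between a fixed number of equally likely readings; the function name is made up for illustration.

```python
from fractions import Fraction


def chance_all_guesses_right(ambiguous_letters: int, readings_per_letter: int = 2) -> Fraction:
    """Probability of guessing every ambiguous letter correctly.

    Assumes independent, uniform guesses among the plausible readings of
    each letter, which is a simplification of how people actually guess.
    """
    return Fraction(1, readings_per_letter) ** ambiguous_letters


# One ambiguous letter gives 1/2, two give 1/4, three give 1/8.
for letters in range(1, 4):
    print(letters, chance_all_guesses_right(letters))
```

With three plausible readings per letter instead of two, two ambiguous letters already drop the odds to 1/9.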
TFA says this is a service SELLING captcha breaking
I'm not sure you're right. Why would the page include instructions such as
In no case do not enter random characters!
We pay only correctly recognized pictures!
That sounds more like instructions for people doing the CAPTCHA breaking, no?
Unfortunately, I can only go by the English translation; somebody who can read Russian would be useful.
I'd expect it to do much better than the 20% they cite.
I can think of various reasons. For example, there might not be somebody at the other end doing the breaking at the exact moment when the bot tries to connect. In that case you'd get ~100% for only part of the day and 0% the rest of the time. 24 * 20% is about 5 hours each day. A part time job?
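The back-of-the-envelope number can be checked with a tiny sketch. The assumption (mine, not TFA's) is that bots hit the service uniformly around the clock and that the solvers succeed at a fixed rate while someone is on shift; the function name is hypothetical.

```python
def hours_staffed_per_day(overall_success_rate: float, success_rate_while_staffed: float = 1.0) -> float:
    """Hours per day a human must be on shift to explain the overall rate.

    Assumes requests arrive uniformly over 24 hours and every request
    outside the shift fails.
    """
    return 24 * overall_success_rate / success_rate_while_staffed


# Perfect solvers: 20% overall success needs a bit under 5 hours of coverage.
print(round(hours_staffed_per_day(0.20), 1))
# Solvers who only get 80% right would need about 6 hours instead.
print(round(hours_staffed_per_day(0.20, 0.80), 1))
```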
It's also true that _average_ people only break CAPTCHAs successfully about 80% of the time. Here's a relevant experiment [jgc.org]
Then there are possible issues with firewalls, etc. Some bots are hosted on a zombified PC which could have any kind of restrictions, and it might have trouble dialing one of the servers, or maybe the server can't respond properly due to inbound filtering.
Google and many universities already have programs that recruit people to do things computers can't do well. One that Google already uses is image tagging: show images and ask people to write down words for what's in them. So they could simply do this with two or three images for which they recently obtained good label sets. They could even throw in a fourth image without known labels and use the sign-up process to gather new image labels.
There are all sorts of hard problems like this. Another single-player game is to show an image with a lot of things in it, then give a word describing one aspect of the image and ask the user to click on the part of the image that conveys that meaning.
Then, if you have many concurrent sign-ups, there are lots of two-player games, both symmetric and asymmetric. One is a short chat session in the vein of the game "Password", in which one person makes a series of statements about an object ("it is liquid", "it is white", "it is tasty", "you find it in the refrigerator of many homes", "it comes from cows"...) and the other person has to reply with "milk". Then both players are validated.
The last is a very useful AI product, by the way, especially if the first player is forced to use a controlled grammar where he just fills in some of the nouns or verbs but does not construct the sentence forms. This gathers a set of true assertions about an object that allow computers to learn semantics and meaning.
Turing machine? Long magnetic tape with simple instruction set and finite alphabet? Don't we essentially have those for all intents and purposes? Turing did more theoretical work with computers than just AI.
You're missing one of the greatest strengths of the invitation system: it makes trivial the task of tracking who invited whom.
If you've got a bunch of known bot accounts which have a common progenitor, you just have to take a step up the tree and look at the progenitor's siblings. Are those also all bot accounts? Keep going. Any bot account or group of accounts could eventually be traced back to a single invitation.
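Roughly what "take a step up the tree" could look like, as a sketch only. The account names, the `invited_by` mapping and the 80% threshold are all invented for illustration; nothing here reflects how Gmail actually stored invitations.

```python
from collections import defaultdict


def build_children(invited_by):
    """Invert an account -> inviter mapping into inviter -> [accounts]."""
    children = defaultdict(list)
    for account, inviter in invited_by.items():
        if inviter is not None:
            children[inviter].append(account)
    return children


def descendants(account, children):
    """Every account reachable by following invitations down from `account`."""
    found, stack = [], list(children.get(account, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found


def suspicious_root(start, invited_by, known_bots, threshold=0.8):
    """Climb from a known bot while the inviter's whole subtree is mostly bots."""
    children = build_children(invited_by)
    root = start
    parent = invited_by.get(root)
    while parent is not None:
        subtree = descendants(parent, children)
        bot_share = sum(account in known_bots for account in subtree) / len(subtree)
        if bot_share < threshold:
            break  # the inviter also invited plenty of apparently real people
        root = parent
        parent = invited_by.get(root)
    return root


# Hypothetical invitation records: alice invited mallory and bob, and so on.
invited_by = {
    "alice": None, "mallory": "alice", "bob": "alice",
    "bot1": "mallory", "bot2": "mallory", "bot3": "bot1", "carol": "bob",
}
known_bots = {"bot1", "bot2", "bot3"}
print(suspicious_root("bot3", invited_by, known_bots))
```

In this made-up data the climb stops at mallory, because everything below her is a bot while alice's subtree is only half bots.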
Unless you spam the invitations to random people as well.
Then you have problems with just deleting the "root node" account and all of its children. Easier to get rid of a bunch of accounts, but still problematic.
by Anonymous Coward
on Tuesday February 26 2008, @10:26PM (#22568544)
This is a tangent, but I'm curious: this site blurs out a lot of text, presumably for privacy. How secure is that? It seems like it would be fairly easy (given knowledge of the font, which you have from other parts of the screenshot) to figure out what the underlying text is. I wish people would just black out things they don't want you to know.
It's funny, actually: the SIFT algorithm (which detects scale-invariant keypoints in an image and is used for panorama stitching, computer vision, etc.) uses a Gaussian blur as part of the detection process. It uses multiple blur levels to better find invariant keypoints. While having the unblurred image certainly helps, it's not necessary.
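On the original question of blurred text: a toy sketch of why blurring is weak when the font is known. You render every candidate string yourself, blur it the same way, and keep the closest match. The 1-D "font" in `GLYPHS`, the box blur and the three-digit secret are all invented for illustration; a real attack would work on 2-D images and would have to guess the blur parameters.

```python
from itertools import product

# Hypothetical 1-D "font": each glyph is a short row of pixel intensities.
GLYPHS = {
    "0": [1, 1, 1, 0],
    "1": [0, 1, 0, 0],
    "7": [1, 1, 0, 0],
    "8": [1, 0, 1, 0],
}


def render(text):
    """Concatenate the glyph rows for each character."""
    pixels = []
    for ch in text:
        pixels.extend(GLYPHS[ch])
    return pixels


def box_blur(pixels, radius=2):
    """Average each pixel with its neighbours (windows are clipped at the edges)."""
    blurred = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - radius): i + radius + 1]
        blurred.append(sum(window) / len(window))
    return blurred


def distance(a, b):
    """Sum of squared differences between two equally long pixel rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(blurred_observation, candidates, radius=2):
    """Blur every candidate the same way and return the closest one."""
    return min(
        candidates,
        key=lambda text: distance(box_blur(render(text), radius), blurred_observation),
    )


# Pretend "781" is the blurred-out account number in a screenshot.
observed = box_blur(render("781"))
candidates = ["".join(digits) for digits in product(GLYPHS, repeat=3)]
print(best_match(observed, candidates))
```

A black box over the text gives the attacker nothing to compare against, which is the point of the complaint above.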
Except truly intelligent bots would realize that reading the help makes them easily distinguishable from humans. Bots that wanted to look human should also have the REFERER field show them as coming from a pr0n or blog site.
Instead, Google should use something akin to Mensa tests. This would deter the bots and make the customers feel really good about themselves. And this feeling, my friend, can't be bought cheaply.
That raises an interesting idea... why not use the CAPTCHAs to perform some useful work? Example: display a scanned line of text from a project that needs a large volume of text OCR'd for free/cheap. Compare the texts from several submitters, and assume groups with a high match rate are reading it correctly.
This accomplishes three goals:
- fairly effective CAPTCHAs
- accomplishes something
- causes OCR quality to improve (via the hard work of the botnet coders)
Not saying the above example is ideal, just trying to illustrate the idea. Take advantage of available resources (be they real people or botnets) and harvest them to accomplish something practical.
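A minimal sketch of the "compare the texts from several submitters" step. The vote and agreement thresholds are arbitrary numbers picked for illustration, and the function name is made up.

```python
from collections import Counter


def consensus_transcription(submissions, min_votes=3, min_share=0.6):
    """Return the agreed-upon reading of a scanned line, or None if unsettled.

    Submissions are compared after trimming whitespace and lowercasing;
    both thresholds are illustrative, not tuned values.
    """
    normalized = [s.strip().lower() for s in submissions if s.strip()]
    if len(normalized) < min_votes:
        return None  # not enough independent readers yet
    text, votes = Counter(normalized).most_common(1)[0]
    return text if votes / len(normalized) >= min_share else None


print(consensus_transcription(["Quick brown", "quick brown ", "quick brown", "qu1ck brovvn"]))
print(consensus_transcription(["fox", "f0x", "box"]))
```

The first call settles on "quick brown" (three of four readers agree); the second returns None because no reading reaches the agreement threshold.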
One word that is shown to you is always known. The second one is unknown. In your case, you entered the known word correctly.
As an anti-bot measure, reCAPTCHA starts showing pictures with BOTH words known if you (or anyone with your IP) incorrectly guess two words in one hour, AFAIR.
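The known-word/unknown-word scheme as described above can be sketched like this. It is based only on the description in this thread, not on reCAPTCHA's actual code; the function and parameter names are invented, and the per-IP escalation is not modelled.

```python
def grade_challenge(known_answer, typed_known, typed_unknown, unknown_votes):
    """Pass or fail on the control word only; bank the other word as a vote.

    `unknown_votes` maps candidate readings of the unknown word to counts
    and is updated in place when the control word was typed correctly.
    """
    passed = typed_known.strip().lower() == known_answer.strip().lower()
    if passed:
        guess = typed_unknown.strip().lower()
        unknown_votes[guess] = unknown_votes.get(guess, 0) + 1
    return passed


votes = {}
print(grade_challenge("morning", "Morning", "upon", votes), votes)
print(grade_challenge("morning", "mourning", "upon", votes), votes)
```

A wrong control word fails the user and their reading of the unknown word is thrown away, so only people who demonstrably read one word correctly get to vote on the other.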
This makes one wonder: Is it possible that it is cost effective for spammers to employ low-cost human labor and that they pipe all these captcha challenges to this set of humans whose sole job is to stare at computer screens with pending captcha challenges and answer them?
(I would imagine that this job would have high turnover:) )
One technique that has been used in the past is that porn websites will have their registration page just be a proxy for a registration page on a site they want to spam. People register, and they get their CAPTCHAs done for free.
Well, it wasn't on a porn site, but I've done proxying of captchas (Proof of Concept) for:
PayPal
GMail
eBay
It's not hard - use CURL, have it handle cookies. Populate database, give to users (requires decent traffic). My system even used a regex on the registration success page to fail users who failed the captcha.
Given my system took about half an hour to write, and people are going to lengths like the ones in the article to beat them, it's pretty much a given that people are out there doing it now. FWIW, I was working on ways to watermark a captcha to make the source obvious.
Seriously! It is high time they moved to something that was difficult to break. IIRC there was an image comparison technique where you are supposed to match two images of similar objects or animals. I think that if the environment, color, zoom and other factors are different, then there is no way this can be broken. Although you cannot generate such images, if you have a photo gallery of 10k pics that is continuously growing, I think that should be good enough till we have humanoid robots that can look at the pictures and correctly match them.
The idea is to present a 3x3 grid of images and have the user select the 3 kittens from the 9 fuzzy animals. That's something computers are still quite bad at... Though you probably need to change the probability of getting it by random luck to be worse than 1/84, in practice.
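Where the 1/84 comes from, as a quick sketch: a bot clicking three of the nine images at random has one chance in C(9, 3). The bigger grid and the two-round variant below are just illustrations of how the odds could be pushed down, not anything KittenAuth is known to do.

```python
from fractions import Fraction
from math import comb


def random_pass_chance(grid_cells: int, kittens: int, rounds: int = 1) -> Fraction:
    """Chance of picking exactly the right cells by luck, `rounds` times in a row."""
    return Fraction(1, comb(grid_cells, kittens)) ** rounds


print(random_pass_chance(9, 3))            # the 3x3 grid with 3 kittens
print(random_pass_chance(16, 4))           # a hypothetical 4x4 grid with 4 kittens
print(random_pass_chance(9, 3, rounds=2))  # two 3x3 rounds back to back
```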
KittenAuth always makes me think of the Futurama episode where the crew had to deliver a package to the uninhabited planet full of robots (sure it's inhabited, like a warehouse is inhabited by boxes).
To prevent capture they dressed as robots, and were stopped at the city gates by two gate robots who administered a PuppyAuth-based anti-Turing test:
Robot Guard #1: Be you robot or human?
Leela: Robot, we be.
Fry: Yep, just two robots out roboting it up.
Robot Guard #2: Administer the test.
Robot Guard #1: Which of these would you prefer? A. a puppy; B. a flower from your sweetie; or C. a properly formatted data file? Choose!
Fry: Is the puppy mechanical in any way?
Robot Guard #1: No. It is the bad kind of puppy.
Leela: Then we'll go with that data file.
Robot Guard #1: Correct. The flower would have also been acceptable.
Robot Guard #2: You may pass.
I've got the perfect answer. How about a PORNTCHA? Use hi-res porn images as the CAPTCHA images, and use hard-to-automate anatomical questions like "are the blonde's boobs bigger than the brunette's?" or "Are these two lesbians?" Any wrong answer brings up another PORNTCHA challenge. Any correct answer ends the porn session and proceeds to the signup. The porn users probably won't "feel the need" to answer a lot of questions correctly, and the service users have a way to get past.
It's kinda like a honey pot, only with tasty, tasty honeys.
The CAPTCHA hasn't been "cracked". These people are just using humans to enter the CAPTCHA text, which is the whole point of the CAPTCHA anyway!
Remember: CAPTCHA is an acronym (or backronym, depending on who you believe) for "Completely Automated Public Turing test to tell Computers and Humans Apart".
The CAPTCHA would be considered cracked if there was a computer algorithm somewhere decoding it autonomously.
They are an awful abomination for website usability, they are becoming increasingly common, and they just don't do what they are supposed to do any more.
So it seems that these companies have two options: either make the letters and numbers more unreadable and more frustrating to users, or scrap them completely and come up with a new anti-bot scheme.
My favorite so far is KittenAuth (http://www.thepcspy.com/kittenauth). It's easy to use, and would be a hell of a lot harder to crack than letters and numbers. Most importantly it's cute! So adorable
If the bots are stalling for time, it's quite likely someone's home-grown version of a Mechanical Turk-style distributed "human" task service, similar to the one by Amazon.
The image is put on a queue and a good number of, say, overseas employees get the image and need to fill in the solution as plain text. In the meantime the bot is "reading the manual".
When the bot gets the answer in time, it submits the form and there we go, account.
So if someone has broken the CAPTCHA, spam bots can send spam from the fake Google accounts. Google can rate-limit outgoing email. They can also watch for accounts that send identical or similar emails. They already profile accounts for AdSense. By profiling accounts to filter spam, they can warn and then close down spammy accounts, or simply close down the ones that look very spammy. Additionally, they can filter IPs and use cookies to identify infected spamnet computers.
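For the rate-limiting part, a minimal sliding-window sketch. The class name, the limit of 3 messages per 60 seconds and the account name are all made up; how Google actually throttles outgoing mail isn't stated anywhere in this thread.

```python
from collections import defaultdict, deque


class OutgoingRateLimiter:
    """Allow at most `max_messages` per account within any `window_seconds` span."""

    def __init__(self, max_messages, window_seconds):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._sent = defaultdict(deque)  # account -> timestamps of recent sends

    def allow(self, account, now):
        """Record and permit a send at time `now` (in seconds), or refuse it."""
        sent = self._sent[account]
        # Forget sends that have aged out of the window.
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()
        if len(sent) >= self.max_messages:
            return False
        sent.append(now)
        return True


limiter = OutgoingRateLimiter(max_messages=3, window_seconds=60)
results = [limiter.allow("new-account", t) for t in (0, 1, 2, 3, 61)]
print(results)
```

The fourth send is refused because three messages already went out in the last minute; by second 61 the oldest ones have aged out and sending works again. Repeated refusals would be one cheap signal for the "looks very spammy" profiling mentioned above.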
If the web browser guys could agree on a standard to inform people that their computers look like they're infected, the major email and associated portal providers could start inserting signed messages in web pages that will inform the users that their computers are infected based on this kind of information.
I wonder if it's worth it to Microsoft and Google and Yahoo and AOL to team up to fight these increasingly powerful and sophisticated bot nets.
Not all Admins are you. Some of us actually know how to keep a Windows machine secure. Ignorance of the facts isn't an excuse.
Yet it is the case that sufficiently large numbers of Windows users are unable to keep their machines secure for a botnet to accomplish this task. The fact that Windows can be made secure does not even remotely mean that this will be done in practice.
Any machine Linux or Windows will be exploited and gang raped if it's not regularly updated and kept clean with the permissions system.
I would like to hear how this is actually being done in the wild on Linux/*BSD/MacOS/etc. The fact is that it isn't.
I would like to hear how this is actually being done in the wild on Linux/*BSD/MacOS/etc
A botnet developer who hopes to amass a significantly sized network would have no interest in the sub-5% of desktop (read: poorly managed, no matter the OS) computers that your niche market segment occupies.
For SYN floods, what do you think would be more effective: a Windows desktop machine on a Comcast line, or a colocated Linux server?
Lurk around Undernet for a while. A large majority of botnet sales that I have seen have been comprised mostly of cracked Linux webservers. Why write a worm to harvest Windows machines when you can google for as much power as you need?
Why are there so many people compromising web hosting accounts and servers where the admin is running some dinky hosting control panel that allows them to know nothing about the operating system? I think you'll find that all modern operating systems are just as insecure as each other in that the things permitted of a program are far in excess of what is required by the program for its operation. Why does notepad need access to the internet, why does a php application need to be able to run arbitrary commands, etc.
> A linux desktop O/S is just as insecure technically.
Secure from what? Internal or external threats? In the internal case it exhibits better protection from escalation of privilege (than Windows; see the Sony rootkit for an example). In the external case it affords simpler accounting of the processes lying around.
>The linux (and Apple) desktops are just more secure by the same reason a hut in a small remote village is more secure than an apartment in a big city ghetto - a one room apartment with many locks, metal doors and chains, but where the occupants let in muggers just because they said they were from Ebay.
No, it is more secure for some applications because less of the network-facing executable code needs to run at as high a privilege level.
>They're both not secure.
That depends entirely on the threat model you are protecting against. If you want it really secure from the network, take it off the network. If you want it secure from users, put it in a locked room, require multi-person, multi-factor authentication to access it, and require dual operator controls so no individual can pull something off unobserved. This is how PKI centers work. If you want a secure online server, you need accounting of the trusted code. The extent to which Windows and Linux compare is quite different for those cases.
>The trick is to NOT have a _one_room_ apartment or hut. You need an "airlock" (sandbox) for your browser (not just rooms for each person).
Or you might document and analyze your threat model first, before protecting against those threats.
i work with OCR/ICR technology (Score:5, Interesting)
Re:i work with OCR/ICR technology (Score:5, Informative)
Re:i work with OCR/ICR technology (Score:5, Insightful)
Re:i work with OCR/ICR technology (Score:5, Insightful)
Re:i work with OCR/ICR technology (Score:5, Informative)
Re:i work with OCR/ICR technology (Score:5, Informative)
Re:i work with OCR/ICR technology (Score:5, Funny)
Re:i work with OCR/ICR technology (Score:5, Funny)
Re:i work with OCR/ICR technology (Score:5, Funny)
Easy solution already known (Score:5, Interesting)
I liked the invitations only system better (Score:5, Insightful)
One step closer... (Score:5, Funny)
I'm tired of my imaginary friends running off and leaving me alone... I want one with configuration options.
Re:One step closer... (Score:5, Informative)
Re:One step closer... (Score:5, Funny)
Bots COULD invite themselves, that's not the point (Score:5, Insightful)
It would help for rooting out bot accounts.
Re:Bots COULD invite themselves, that's not the po (Score:5, Insightful)
Blurred text == secure?? (Score:4, Interesting)
Re:Blurred text == secure?? (Score:5, Interesting)
Bots RTFM! (Score:5, Funny)
Re:Bots RTFM! (Score:5, Funny)
Re:Bots RTFM! (Score:5, Funny)
Re:Bots RTFM! (Score:5, Insightful)
CAPTCHA is for weak minds (Score:5, Funny)
Until one day... (Score:4, Funny)
Cue overlords posts in 3...2...1...
Re:CAPTCHA is for weak minds (Score:5, Interesting)
Re:CAPTCHA is for weak minds (Score:5, Informative)
Re:CAPTCHA is for weak minds (Score:4, Informative)
Re:CAPTCHA is for weak minds (Score:5, Informative)
Humans? (Score:5, Interesting)
Re:Humans? (Score:5, Interesting)
Re:Humans? (Score:5, Interesting)
Re:Quite likely (Score:4, Insightful)
Well... (Score:5, Funny)
Stop using CAPTCHA! (Score:5, Insightful)
Re:Stop using CAPTCHA! (Score:5, Insightful)
Just use kittens [arstechnica.com] instead...
Futurama to the rescue! (Score:5, Funny)
Re:Stop using CAPTCHA! (Score:5, Funny)
Re:Stop using CAPTCHA! (Score:5, Funny)
No, it's sad that a bunch of anime nerds think their captcha system guards a forum that any spammers would find worth caring about.
To be fair.. (Score:5, Informative)
CAPTCHAs should die (Score:5, Interesting)
Re:CAPTCHAs should die (Score:4, Funny)
Wow.
-Peter
Mechanical Turk (Score:5, Interesting)
spam filtering (Score:5, Interesting)
http://xkcd.com/233/ (Score:4, Informative)
Damn! 1 in 5!? (Score:4, Funny)
That's better than I can do reading those damn things!!!
Re:Get off the security high horse. (Score:5, Insightful)
Re:Get off the security high horse. (Score:5, Insightful)
Re:Get off the security high horse. (Score:5, Insightful)
Re:Get off the security high horse. (Score:4, Insightful)
Re:Time to ban Microsoft products (Score:5, Interesting)