Machine Learning Susses Out Social-Network Fraud
CowboyRobot writes "Machine learning techniques can be used to detect fraud and spies on social networks based on certain features, such as the number of followers and the number of devices used to access the network. Certain characteristics of social-network accounts have a high correlation with fraud and can be used to differentiate between real and fake accounts, a researcher presenting at the SOURCE Boston Conference said this week. Using machine learning techniques, Vicente Diaz, a senior security analyst with security software firm Kaspersky Lab, found that seven characteristics of Twitter profiles could identify fraudulent accounts 91% of the time. The number of devices from which a user accesses the service, the ratio of followers to people following an account, the average number of tweets to each person, and the number of tweets to an unknown receiver are all features that correlate strongly to fraudulent accounts, he says."
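A minimal sketch of how the four features named in the summary could be computed for a single account. The `Account` record, its field names, and the `extract_features` helper are hypothetical, and the exact definitions are assumptions; the summary does not say how each feature is measured, does not list the other three characteristics, and does not describe the classifier itself.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account record; field names are illustrative only."""
    followers: int
    following: int
    devices: set = field(default_factory=set)          # clients seen for this account
    mentions: list = field(default_factory=list)       # one recipient handle per directed tweet
    known_contacts: set = field(default_factory=set)   # handles the account already has a tie to


def extract_features(acct: Account) -> dict:
    """Compute the four features named in the summary (definitions are assumptions)."""
    recipients = set(acct.mentions)
    unknown = [m for m in acct.mentions if m not in acct.known_contacts]
    return {
        "device_count": len(acct.devices),
        # max(..., 1) guards against division by zero for accounts that follow nobody.
        "follower_ratio": acct.followers / max(acct.following, 1),
        "avg_tweets_per_recipient": len(acct.mentions) / max(len(recipients), 1),
        "tweets_to_unknown": len(unknown),
    }


# Made-up account: few followers, follows thousands, one device, mostly tweets at strangers.
example = Account(
    followers=12,
    following=2400,
    devices={"web"},
    mentions=["alice", "bob", "carol", "dave"],
    known_contacts={"alice"},
)
features = extract_features(example)
```

A dictionary like `features` is the kind of input a classifier would be trained on; which model was used in the talk is not stated in the summary.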
Case in point (Score:3)
My guess is this is annoying for facebook and advertising firms who are paying money for sanctioned spamming, and they want to make sure they're not adve
Re: (Score:3)
"Who the hell is adding random people as friends they've never heard of before, then can't tell spam from actual communication?"
He's called Everybody, I think.
Re:Case in point (Score:4, Insightful)
It's not necessarily friends directly posting crap on your page. A lot of fraud/spam on Facebook comes from these pages set up specifically to attract followers so the page can be sold for huge advertising bucks. They'll post exploitative pictures of injured animals, maimed soldiers, etc. with captions like "1 SHARE = 1 RESPECT". No matter how often I've warned my friends against forwarding this stuff, they'll do exactly what they are told because they don't want to be accused of not caring about puppies or war heroes or orphans or Jesus or whatever.
The end result is, no matter how hard I try to avoid it and how careful I am to restrict my account only to friends and colleagues I personally know, I still get spam from these phony accounts plastered all over my news feed.
Re: (Score:2)
No matter how often I've warned my friends against forwarding this stuff, they'll do exactly what they are told because they don't want to be accused of not caring about puppies or war heroes or orphans or Jesus or whatever.
The end result is, no matter how hard I try to avoid it and how careful I am to restrict my account only to friends and colleagues I personally know, I still get spam from these phony accounts plastered all over my news feed.
Well that explains why I get so little SPAM. My friends aren't any dumber, but they don't care about puppies or war heroes or orphans or Jesus or whatever.
Puppies are food. War Heroes are instruments of death. Orphanages are rackets (seriously, try adopting a child some time), and Jesus stole my hubcaps.
Re: (Score:2)
Who the hell is adding random people as friends they've never heard of before, then can't tell spam from actual communication?
Remember Mafia Wars? Or Farmville? Where the number of friends you had was directly linked to how "powerful" you were in the game? It would be those people.
faggot bastard strikes again (Score:3)
Fix this crap, slashdot!
The spammers just get better (Score:2)
This is pretty much useless. If people start using software filters to detect social-network frauds and spammers, the frauds and spammers will simply reverse-engineer the filter algorithm and adjust their "number of devices from which a user accesses the service, the ratio of followers to people following an account, the average number of tweets to each person, and the number of tweets to an unknown receiver" to whatever values don't trigger the fraud filter.
The spammers evolve just as fast as the filters.
Re: (Score:2)
This is pretty much useless.
True. It detects incompetent spammers. Remember when warnings about spam and phishing included the suggestion to look for bad spelling? Remember when warnings about mail bombs included looking for excessive wrapping tape? It's like that.
What you can do with a 91% successful classifier is ignore the item for search purposes.
So I would be a fraud... (Score:2)
In conclusion, as I do not access Facebook even from my watch, do not comment on every single thing I do in my day, and do not have "thousands of followers", I can only be a fraud.
Re:So I would be a fraud... (Score:4, Informative)
"So I would be a fraud if I had a facebook account."
Precisely. There are several things wrong with trying to actually use this in the real world.
(1) 91% is not nearly good enough. Period.
(2) Even if it were 99.9% accurate, it would still not be good enough, because it runs into the base rate fallacy [wikipedia.org].
(3) Similar to, but distinct from, the base rate fallacy: a statistical correlation across datasets of millions says nothing about an individual account.
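Point (2) can be made concrete with Bayes' theorem. The sketch below is illustrative only: the base rate of 1 fraudulent account in 1,000 is a made-up assumption, and reading "91%" (or "99.9%") as both the true positive rate and the true negative rate is also an assumption, since the summary does not say what the figure measures. The function name `posterior_fraud` is hypothetical.

```python
def posterior_fraud(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Return P(account is fraudulent | classifier flagged it), via Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # fraudulent and flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # legitimate but flagged
    return true_pos / (true_pos + false_pos)


# Assumption: 1 account in 1,000 is fraudulent (hypothetical base rate).
p_91 = posterior_fraud(prevalence=0.001, sensitivity=0.91, specificity=0.91)
p_999 = posterior_fraud(prevalence=0.001, sensitivity=0.999, specificity=0.999)
```

Under those assumed numbers, a flagged account is actually fraudulent only about 1% of the time at 91%, and about 50% of the time at 99.9%. A higher assumed base rate would raise both figures.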
False positive and false negative rates? (Score:3)
found that seven characteristics of Twitter profiles could identify fraudulent accounts 91% of the time.
Taking the 91% number as accurate for argument's sake, what are the false positive and false negative rates? Even a 1% false positive or false negative rate would be quite a lot of accounts when you consider how many millions of Twitter accounts there are out there.
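To put rough numbers on that, here is a small sketch of the arithmetic. Every input is a hypothetical assumption chosen for illustration (100 million accounts, 5% of them fraudulent, 1% false positive and false negative rates); none of these figures comes from the talk. The helper name `expected_errors` is likewise made up.

```python
def expected_errors(total_accounts: int, fraud_share: float,
                    false_positive_rate: float, false_negative_rate: float) -> dict:
    """Expected number of misclassified accounts for the given (assumed) rates."""
    fraudulent = total_accounts * fraud_share
    legitimate = total_accounts - fraudulent
    return {
        # Real users wrongly flagged as fraud.
        "legitimate_flagged": round(legitimate * false_positive_rate),
        # Fraudulent accounts that slip through.
        "fraud_missed": round(fraudulent * false_negative_rate),
    }


# All four inputs are hypothetical.
counts = expected_errors(100_000_000, 0.05, 0.01, 0.01)
```

With those assumed inputs, `counts` comes to 950,000 legitimate accounts flagged and 50,000 fraudulent accounts missed, which is why a single "91%" figure is hard to interpret without the two error rates.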
Comment removed (Score:3)
Re: (Score:2)
Yep, you're not in marketing clearly.
I think you'll find that, for businesses that rely on strong ties to their customers, one-off sales don't cut it. That is particularly true of small businesses, and so social networking is an essential tool. It may shock you to hear that social networking is merely the new phrase for "word of mouth" with some extra bells and whistles to help along repeat business (the whole "following" mechanic).
Not far from where I live is a pie shop called "Piefection" - I thought i
Number of devices? (Score:2)
The number of devices from which a user accesses the service.
So does Twitter just publicly disclose a simple device count, or detailed information on all devices? If the latter, isn't that a whopping security hole to be exploited by people looking for targets with known vulnerable devices?