Spam Trap Claims 10x-100x Accuracy Gain
SpiritGod21 writes in with a NYTimes article on a new approach to spam detection that claims out-of-the-box improvement of 1 or 2 orders of magnitude over existing approaches. The article wanders off into human-interest territory as the inventor, Steven T. Kirsch, has an incurable disease and an engineer's approach to fighting it. But a description of the anti-spam tech, based on the reputation of the receiver and not the sender, is worth a read.
Ummmm.... (Score:3, Insightful)
Also, this seems like a bit of a slashvertisement for what is as yet an unproven technology - the only benchmarks we have are the ones they provide.
Yet another wrong answer... (Score:5, Insightful)
Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.
If you want to stop spam, you need to take away the economic incentive. How many spam filtering / blocking programs have we already seen produced in the past 5 years? Yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.
Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
Re:Yet another wrong answer... (Score:5, Insightful)
Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
Is it a joke? (Score:3, Insightful)
Chicken-and-egg problem (Score:3, Insightful)
Re:Yet another wrong answer... (Score:3, Insightful)
Oh, and prostitution, too. And identity theft. And insurance fraud. Yup, it's simple to fix. Just make it unprofitable! Simplicity itself!
Re:Yet another wrong answer... (Score:5, Insightful)
At least once a week there seems to be another flashy technique to filter or block spam. Great.
It's not "flashy." It's called information theory and statistics. It is an extremely powerful concept that has far more important potential uses than simply filtering spam email. Every new advancement in automated classification and knowledge extraction is VITALLY IMPORTANT to our ability to cope in a world which has suddenly been flooded with SO MUCH information. This powerful tool is being applied to what some might see as a "silly" problem, but the fact remains that spam is a powerful motivation for researchers to push the limits further in the fields of pattern recognition, information processing and natural language processing.
If you're against the advancement of information processing techniques, then... uh, okay, I guess. If you can't see beyond spam, you are terribly short sighted.
Re:Yet another wrong answer... (Score:1, Insightful)
I mean, I'd imagine inventing new ways of blocking spam would be a lot easier than standing the economy on its head.
Re:Chicken-and-egg problem (Score:3, Insightful)
Re:KInda flawed (Score:2, Insightful)
Not much.
Two issues: First, how does the system know that Jane's e-mail is mostly spam? Who tells it? Does it use some other filters to identify the spam in order to determine her spam rate?
Second, how does the system know that the message you received and the message Jane received are the same? Spammers have long been randomizing parts of messages in order to block older spam filters.
Re:10-100x better than what? (Score:3, Insightful)
Re:Yet another wrong answer... (Score:3, Insightful)
Wake up, they are already committing fraud, and already breaking the law. The agencies already exist that fight fraud, and yet how many spammers have actually been caught and charged with fraud? How much of this spam has actually been stopped?
Re:Yet another wrong answer... (Score:5, Insightful)
No, if you are harvesting email addresses and sending unsolicited commercial messages to them, it is quite simple:
You are a spammer.
Re:Is linux for homos? (Score:2, Insightful)
Generalization of honeypots (Score:4, Insightful)
Re:Ummmm.... (Score:3, Insightful)
That's not true though. Spam is defined as bulk, unsolicited e-mail. Even if some retard actually likes to read their spam e-mails and buy things they advertise, that doesn't change the fact that the message was sent in bulk (to many other people as well), and that it was unsolicited by at least the vast, overwhelming majority of them.
Re:Yet another wrong answer... (Score:3, Insightful)
You may be a nice person and run a respectable enterprise in all other respects, but if you're sending out unsolicited emails on anything more than an individual basis, you're a spammer.
Furthermore, "This should definitely be legal, it's a great marketing tool and helps my business very well," is not a legitimate justification. It would really help my business if I could hunt down my competitors and kill them, but somehow I doubt that's going to go over very well at the inevitable murder trial. Why? Because nobody cares what's good for you or me, what matters is what's good for society as a whole. And both murder and spam are (admittedly varying degrees of) harmful.
Re:Yet another wrong answer... (Score:3, Insightful)
Way to go, Captain Obvious!
This goes down in history with other sayings of similar caliber, such as
1) "Once we can stop scams from being profitable, we will finally see them go away. But no sooner."
2) "Once we can stop prostitution from being profitable, we will finally see it go away. But no sooner."
3) "Once we can stop theft from being profitable, we will finally see it go away. But no sooner."
Somehow, despite having 4,000 years of civilization to work on these ills, the appropriate technology to eradicate these plagues has never been concocted. I'd wager that spam is not a technical problem, it's a human problem. And so long as we have A) money and B) an Internet, there will be spam.
See, there is no clear definition of spam. If I send you a direct, personal, business email that you are expecting while we're on the phone when you ask me for a quote, that's clearly not spam. And if I write a program to send out 100,000 "P3niz Pil1z" emails, that's clearly spam. But there are a MILLION shades of grey in between the two.
A) I could personalize the Peniz pil1z so that they have your name at the top.
B) I could randomize the text in the Peniz pil1z email. I could restrict the list of recipients to only those who have, at some point in the distant past, looked at a porn site.
C) I could send emails only to people on lists with a clear interest in my subject. E.g., send an email announcing my new electronic pilot gadget only to registered pilots and/or plane owners.
With each modification, we move further away from "pure" spam, towards "legitimate" commercial email.
D) I could send a quote to people who have called or contacted people in my business, even though they didn't ask for anything like my quote.
E) I could send the quote to people who have contacted my business, who didn't ask for the current quote, but have asked about something similar.
F) I could send the quote to you pursuant to a conversation, even though you didn't ask for it, if/when you have asked about something similar.
G) Finally, we're over to the other extreme. You are a pilot, you want my gadget, and you are asking me for a quote, which I send you.
And there's no sharp line between the two extremes. Personally, I don't much mind the emails I get from G down to around D. I get annoyed at C, and anything below that is below my line. But there are plenty of people who get offended at anything below G!
It's entirely a personal, subjective decision.
Re:Yet another wrong answer... (Score:2, Insightful)
Re:Yet another wrong answer... (Score:4, Insightful)
Easy enough. Remove the customers. Set up a spam operation selling drugs. Except instead of sending what's advertised, send arsenic. Once all the customers have died, there won't be anyone left to buy spam-stuff. And, as a bonus, you help the genepool.
Re:Ummmm.... (Score:5, Insightful)
First they can ensure that the message itself doesn't contain any recipient info (a big bcc basically).
Then they avoid batching recipients based on their domain so the SMTP server can't tell who else is receiving the message.
The only way to derive the recipients now is to compare all messages against all others in order to match them up. So they hash every message and combine those with identical hashes.
But putting a little unique text in each message during transmission foils that.
Spammers: 1 New weapons: 0
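A minimal Python sketch of the hashing point above, assuming exact-match grouping with SHA-256 (the thread does not say which hash a real filter would use; `group_by_hash` and the sample messages are made up for illustration). Identical bodies collapse into one group, and a single unique token per message breaks the grouping:

```python
import hashlib
from collections import defaultdict


def group_by_hash(messages):
    """Group recipients by the SHA-256 digest of the message body.

    Hypothetical helper: `messages` is a list of (recipient, body) pairs.
    """
    groups = defaultdict(list)
    for recipient, body in messages:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(recipient)
    return dict(groups)


# Same body sent to three recipients: one group holding all three.
identical = [(f"user{i}@example.invalid", "Buy now!") for i in range(3)]

# Same body plus a unique token per recipient: three groups of one.
randomized = [(f"user{i}@example.invalid", f"Buy now! id={i}") for i in range(3)]

identical_groups = group_by_hash(identical)
randomized_groups = group_by_hash(randomized)

print(len(identical_groups), len(randomized_groups))
```

Exact hashing is only the simplest possible matcher; the sketch says nothing about how fuzzy or similarity-based matching would fare against the same trick.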
Re:KInda flawed (Score:2, Insightful)
That's the problem I have with this. Spam stopped being truly mass produced years ago. Each spam is now normally sent to each user with a different mix of nonsense. The probability of two different people receiving the same message is virtually zero.
Still doesn't make sense & GP is not a troll, (Score:3, Insightful)
That still means that the best other systems make a mistake on 1 out of every 10 messages, and the worst ones make a mistake on every single message. That's still ridiculous hyperbole.
(Personally, I'll take the system that makes 100% mistakes, and I'll use the Spam folder as my Inbox.)
Now if you said that it has 1/10 to 1/100 the error rate of normal clients (which is what they're actually claiming, I think), THAT would make mathematical sense AND be an achievement. The Slashdot title of the story is just bad no matter how you spin it.
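To put rough numbers on the difference between the two readings, here is a small Python sketch. The 5% baseline error rate is an assumption picked purely for illustration, and both function names are hypothetical; none of these figures come from the article.

```python
from fractions import Fraction


def max_baseline_accuracy(factor):
    """Highest baseline accuracy for which a `factor`x accuracy gain stays <= 100%."""
    return Fraction(1, factor)


def reduced_error_rate(baseline_error, factor):
    """Error rate after cutting `baseline_error` to 1/`factor` of its value."""
    return baseline_error / factor


# Literal "accuracy gain" reading: accuracy cannot exceed 100%, so the
# baseline being improved on could be at most 1/factor accurate.
cap_10x = max_baseline_accuracy(10)
cap_100x = max_baseline_accuracy(100)

# Error-rate reading, with an ASSUMED 5% baseline error rate.
baseline_error = Fraction(5, 100)
error_10x = reduced_error_rate(baseline_error, 10)
error_100x = reduced_error_rate(baseline_error, 100)

print(cap_10x, cap_100x, error_10x, error_100x)
```

Read literally, a tenfold accuracy gain only works against a baseline that is right at most one time in ten, while a tenfold cut in error rate is possible from any starting point.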
Re:Yet another wrong answer... (Score:4, Insightful)
Where it crosses the line and becomes spam is when it's unsolicited. That's the key. Unsolicited commercial email is the very definition of spam, and no amount of hand-waving about opt-outs or the selectivity of the lists is going to change that.
Businesses that have relied on cold-calling via any medium to drum up sales have always been sleazy in my book, but when you do it via email, you're pushing the cost out onto the recipient and onto uninvolved third parties. That's at best unethical, and at worst flat-out theft.
Re:No (Score:4, Insightful)
You didn't RTFA well enough. That it's about recipients is the selling point.
That's a truth with modifications, though. Look at the quote from the web site I put in my parent post to yours, which clearly shows that it's a block based on who the sender has sent an email to. I'll repeat it, in case you missed it:
"Because ratings are based on the most recent 25 emails for each sender, the system reacts instantly to spam attacks, usually within just a few messages."
Yes, it's a recipient based system in that it assigns a score to the sender based on what the recipients of the emails are. But the blocking occurs due to the score of the sender, based on previous emails, not on the recipient of the current email.
Just think -- if it blocked based on the recipient only, it would block either all or none of the e-mail to an inbox with a single recipient. It would then only be effective for e-mails with multiple recipients, which doesn't match the claims made.
Again, think, and read the article (and that goes for the moderators too).
Re:You are also totally wrong (Score:4, Insightful)
(Ah, that explains the completely asshat moderation here, then.)
No, I didn't get it backwards -- RTFA. It's called a recipient verification system, but when you look at their own description on how it operates, you'll find that:
- It looks at the recipients of a message, and based on how much spam each of the recipient accounts gets, assigns a score to the sender.
- This score is accumulated over the last 25 emails.
(The reason for this is rather obvious, if you think about it -- if it based its score on just the last e-mail, if you sent an e-mail to someone who receives a lot of spam, it'd be automatically blocked, and that person would not get any e-mail at all.)
Say a sender sends e-mails to foo@foo.invalid, bar@bar.invalid, a bunch more people, and finally baz@baz.invalid. If foo@foo.invalid receives 30% spam, and the overall average is 80%, that means that the e-mail is unlikely to be spam. So a score is saved in a table for the sender. Then it goes to bar@bar.invalid, who also has a low 40% spam rate, and another "good" score is saved for the sender. When the sender later sends an e-mail to baz@baz.invalid, who has a spam rate of 95%, the fact that he sent e-mail to foo and bar earlier will increase the likelihood of his e-mail to baz going through.
Conversely, if foo and bar received more spam than average, an e-mail sent to baz would be scored as more likely to be spam, even if baz received a record low 10% spam.
Yes, in a way, it's receiver based, because it builds the score based on the receivers' ratio of spam to valid e-mails. But the score is applied to the sender, and they state this in clear text on the web site itself. You only have to read past the sales pitch and down to the technical details.
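The scheme as I read it can be sketched in a few lines of Python. This is a guess at the shape of the thing, not the vendor's algorithm: the 25-message window comes from the site text quoted earlier in the thread and the spam rates from the foo/bar/baz example, while the class name, the plain averaging and the "above the overall average means spam" threshold are all assumptions.

```python
from collections import defaultdict, deque

WINDOW = 25          # "most recent 25 emails for each sender" (from the site)
AVERAGE_SPAM = 0.80  # overall average spam rate used in the example


class SenderReputation:
    """Hypothetical sketch: score a sender by its recipients' spam rates."""

    def __init__(self, recipient_spam_rate):
        # Maps recipient address -> fraction of that inbox's mail that is spam.
        self.recipient_spam_rate = recipient_spam_rate
        # Per sender, keep only the last WINDOW recipient spam rates.
        self.history = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, sender, recipient):
        """Remember the spam rate of the inbox this sender just mailed."""
        self.history[sender].append(self.recipient_spam_rate[recipient])

    def score(self, sender):
        """Mean recipient spam rate over the window (ASSUMED scoring rule)."""
        rates = self.history[sender]
        if not rates:
            return AVERAGE_SPAM  # no history: treat the sender as average
        return sum(rates) / len(rates)

    def looks_like_spam(self, sender):
        """ASSUMED threshold: worse than the overall average."""
        return self.score(sender) > AVERAGE_SPAM


rates = {
    "foo@foo.invalid": 0.30,
    "bar@bar.invalid": 0.40,
    "baz@baz.invalid": 0.95,
}
rep = SenderReputation(rates)
rep.record("sender@example.invalid", "foo@foo.invalid")
rep.record("sender@example.invalid", "bar@bar.invalid")

# Decision for the next message (the one to baz) uses the earlier history,
# not baz's own 95% spam rate.
blocked = rep.looks_like_spam("sender@example.invalid")
print(blocked)
```

Note that the decision for the message to baz is taken from the sender's history before baz's own rate is recorded, which is the sense in which the score belongs to the sender and not to the recipient.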
Re:Is linux for homos? (Score:3, Insightful)
courseofhumanevents -> "Must Fence A Nervous Ho"
Re:Is linux for homos? (Score:3, Insightful)
Re:Yet another wrong answer... (Score:2, Insightful)
I agree with you on the significance of information theory. There are plenty of important applications of it, but I don't think that spam filtering is one of them. As I said before, you can filter all the email you want, and in the end you'll just find that the spammers will find a way past your filters and you'll again be bombarded with offers for penis pills.
If someone wants to use spam to train their algorithms for work in those areas, I certainly do not oppose it. But if they think that it will somehow solve the spam problem, I stand by my statement that they are dead wrong. On the other hand, if they want to apply it to something like indexing research journal articles, or some other application that is for the greater good, then I applaud their work.
Re:Cute but no cigar (Score:3, Insightful)
They're the ones creating the successful antispam systems -- you know, the ones that actually scale up on the gateway. The popular vision of bumbling PHB buffoons everywhere is just another stupid slashdot stereotype, fostered by insecure social retards who have to foist their apparent superiority over everyone by scoffing at everything. Sure, they exist, but long-term successful tech companies generally have -- get ready for it -- smart people working for them.
Anyway, the antispam companies don't have the leverage to pull off an end to spam. Symantec and Cloudmark and Ironport and so forth could stand up and scream and rant and rave at ISPs and yell about the need to secure email infrastructure, to block outbound port 25 from residential ranges, to deploy SPF, or hell just to stop bouncing (I'm looking at you Barracuda), but as long as the ISPs run their ranges as open sewers, and just slap in a few boxes to stop everyone else's spam, the spam problem will continue. And they don't like having vendors telling them how to run their business. The people with the power to stop the spam problem, who won't, are not the antispam vendors but the ISPs sending spam. So perhaps I was too harsh in my assessment of the PHB problem -- they certainly do seem to be the norm at ISPs (notable exceptions like AOL and parts of Roadrunner aside).