Spam Trap Claims 10x-100x Accuracy Gain

Posted by kdawson on Monday December 03, 2007 @11:31PM from the see-it-when-i-believe-it dept.

SpiritGod21 writes in with a NYTimes article on a new approach to spam detection that claims out-of-the-box improvement of 1 or 2 orders of magnitude over existing approaches. The article wanders off into human-interest territory as the inventor, Steven T. Kirsch, has an incurable disease and an engineer's approach to fighting it. But a description of the anti-spam tech, based on the reputation of the receiver and not the sender, is worth a read.

This discussion has been archived. No new comments can be posted.

x100 improvement in accuracy? (Score:1, Interesting)

by EmbeddedJanitor ( 597831 ) writes: on Monday December 03, 2007 @11:38PM (#21567731)

Was the previous technology less than 1% accurate?

Sidestepping the arms race (Score:2, Interesting)

by whamett ( 917546 ) writes: on Tuesday December 04, 2007 @12:11AM (#21567961)

This is clever: filtering spam by exploiting properties of spam pumps in general, vs. straight content analysis. The competition of ever-more-sophisticated content scanning techniques on one side, and spammers' escalating workarounds and huge botnets on the other side, is an arms race that shows no sign of abating.

Of course, this approach does still depend on something—probably content analysis—to determine which messages are spam and which are not, so that receivers' spam statistics can be computed.
The smartest (and reportedly most effective) anti-spam technique I know is spamd [linux.com], which completely sidesteps content analysis. In a nutshell, it's an SMTP proxy that issues a temporary error code to unknown senders; legitimate MTAs retry delivery (at which point spamd lets the message through), while spam pumps don't bother. Voilà—spam gets stopped before it's ever received. A friend of mine reports that spam volume has dropped to zero since he set up spamd for his department.

If I understand the "receiver reputation" approach correctly, it could use spamd (rather than content analysis) to identify spam; similarly, content analysis can supplement spamd [benzedrine.cx]. The two are potentially complementary.
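
For anyone who hasn't seen the temporary-error trick described above (usually called greylisting), here is a minimal sketch of the idea. It is not spamd's actual implementation: the `Greylist` class, the 300-second delay, and the exact reply strings are assumptions made up for illustration. Only the general behaviour (defer unknown senders, accept the retry) comes from the description above.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Greylist:
    """Toy greylisting table: defer first contact, accept a later retry."""

    # Assumed value: how long a sender must wait before a retry is accepted.
    min_delay: float = 300.0
    # Maps (client IP, envelope sender, envelope recipient) -> first-seen time.
    first_seen: Dict[Tuple[str, str, str], float] = field(default_factory=dict)

    def check(self, ip: str, mail_from: str, rcpt_to: str,
              now: Optional[float] = None) -> str:
        """Return an illustrative SMTP reply line for this delivery attempt."""
        now = time.time() if now is None else now
        key = (ip, mail_from, rcpt_to)
        seen = self.first_seen.get(key)
        if seen is None:
            # Unknown sender: remember it and ask it to come back later.
            self.first_seen[key] = now
            return "451 Temporary failure, please retry"
        if now - seen < self.min_delay:
            # Retried too quickly: still deferred.
            return "451 Temporary failure, please retry"
        # A real MTA queued the message and retried after the delay.
        return "250 OK"


if __name__ == "__main__":
    g = Greylist()
    print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=0.0))
    print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=600.0))
```

The whole bet is that spam pumps never come back for the second attempt, so the table needs no knowledge of message content at all.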

Re:Yet another wrong answer... (Score:3, Interesting)

by 7Prime ( 871679 ) writes: on Tuesday December 04, 2007 @12:23AM (#21568045) Homepage Journal

How about charging the sender $0.01 for every email that's never opened? That way, spammers risk a HUGE number of people catching the trap and not opening their email. It wouldn't be worth it to advertise in that fashion, because you'd lose more than you make (spam requires tens of thousands of emails to be effective; if 90% of those are unopened, then you risk losing over a hundred dollars on a scheme that might make you $50 on a good day).
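
To put rough numbers on that, here is a quick back-of-the-envelope sketch. The $0.01 fee, the 90% unopened rate, and the $50 take are the figures from the comment; the 20,000-message run is an assumed stand-in for "tens of thousands", and `net_cents` is just a name made up for the example.

```python
# Back-of-the-envelope cost of a per-unopened-email fee, in whole cents
# to avoid floating-point rounding.
FEE_CENTS = 1              # $0.01 per unopened email (from the comment)
UNOPENED_PERCENT = 90      # share of messages never opened (from the comment)
REVENUE_CENTS = 50 * 100   # $50 made "on a good day" (from the comment)


def net_cents(messages_sent: int) -> int:
    """Spammer's net result in cents after paying for unopened mail."""
    unopened = messages_sent * UNOPENED_PERCENT // 100
    return REVENUE_CENTS - unopened * FEE_CENTS


MESSAGES_SENT = 20_000     # assumption: one modest spam run
print(net_cents(MESSAGES_SENT) / 100)   # -130.0, i.e. a $130 loss
```

Under those assumptions the fees come to $180 against $50 of revenue, which is the "lose more than you make" point.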

Re:Kinda flawed (Score:3, Interesting)

by wvmarle ( 1070040 ) writes: on Tuesday December 04, 2007 @12:23AM (#21568053)

"Second, how does the system know that the message you received and the message Jane received are the same? Spammers have long been randomizing parts of messages in order to block older spam filters."

An interesting thing, as outlined in TFA that you should R, is that the mails do not have to be the same. They may have different check-sums even. However they are checked against the sending IP-address. If more messages from the same IP address arrive (presumably within a certain time frame), they are all considered spam or ham. Spammers tend to send lots of mails from the same IP address at a time, so that should work.
How they handle mailing lists, though, is not really clear to me. There are quite a few loose ends in the article.

Re:Yet another wrong answer... (Score:5, Interesting)

by Jimmy_B ( 129296 ) writes: <{gro.hmodnarmij} {ta} {mij}> on Tuesday December 04, 2007 @12:44AM (#21568189) Homepage

"Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it."

Not exactly. It's making money for the spammer, but it probably isn't making money for the person who hired him. You see, even if no one ever bought anything advertised in spam, it would still be sent. The problem is multilevel marketing [wikipedia.org], which creates a lot of people desperate to sell unsellable inventory, some of whom pay spammers to advertise it for them. A perceived economic incentive is enough, even if there isn't a real one.

Crackpot in denial. Snake oil to sell. (Score:3, Interesting)

by syousef ( 465911 ) writes: on Tuesday December 04, 2007 @12:53AM (#21568245) Journal

From TFA with commentary:
"he has started four companies, all based on his frustrations with existing products or services"

Unless they're all still in business that's probably 3 failures on record.

"Along the way he has amassed a personal fortune of about $230 million"

But he got out before the ship sank and with a bundle of cash too. I wonder what his ex-employees got...

"This is harder on my wife than it is on me," he said during a recent interview. "I just look at it as a problem. Here's a problem and you have four years to solve it or you don't get to solve any more problems."

How philosophical... So he's going to cure himself single-handedly of a rare disease in 4 years, because medical research is as easy (and cheap) as writing software or tinkering with a home engineering project. I think he's been watching Crusade and sniffing glue.

"His perspective on his disease is also clear. Fourth on his list is "Why human beings will be extinct in 90 years." He writes, "My incurable blood cancer is minor compared to what is happening with the planet. We have somewhat more than 90 years before humanity is virtually extinct.""

Don't even know where to start on this one. I can't be bothered reading about his reasoning, but he's not the first person to predict the end of the world just beyond his own lifetime.

Oh and by the way he has a bridge, I mean some anti-spam software to sell you.

Gimme a break! Nothing to see here.

Re:Yet another wrong answer... (Score:4, Interesting)

by penix1 ( 722987 ) writes: on Tuesday December 04, 2007 @01:01AM (#21568291) Homepage

"...and get very few opt-outs and many reactions."

I can imagine the reactions you get...

There are two reasons for this. First, nobody is receiving your emails because you are blocked nine ways to hell in their spam filters. Second, because most spam (yours included) uses the opt-out crap for email verification of their lists. They know they have a live one so most sane people ignore opt-out links in email since they are dangerous.

What needs to be changed *IS* the opt-out crap. It needs to be confirmed-opt-in plain and simple. While they are at it, I wouldn't say no to outlawing email harvesting either. Throw in a $10,000.00 fine for each violation of either provision and call it pretty. Make half the fine go to the organization that hunts down violators and we got a sound business solution.

Re:Kinda flawed (Score:3, Interesting)

by elronxenu ( 117773 ) writes: on Tuesday December 04, 2007 @01:24AM (#21568431) Homepage

Thanks, I was wondering why TFA said "the message does not have to have the same contents" yet it talks extensively about "the same message sent to multiple recipients".
If the contents are irrelevant, then how does this system determine that any two messages are the same? And your answer, "by the sender IP" (and unspoken, by a similar send time).
Which then leads me to ask - what about mail relays, where the same IP address sends thousands of emails every day? Wouldn't every email sent by the relay at roughly the same time be considered the same message, and (because almost everybody gets more spam than ham) be classified as spam?
I think the article tag is correct - "snakeoil".

Re:Yet another wrong answer... (Score:2, Interesting)

by Kadin2048 ( 468275 ) * writes: <slashdot.kadinNO@SPAMxoxy.net> on Tuesday December 04, 2007 @02:04AM (#21568705) Homepage Journal

Sure it would. It would create an immediate economic incentive to not leave your machine turned on and connected to the Internet 24/7, fall behind on patches, and generally own and run a machine that you don't know how to operate, but is capable of causing harm to others.

One of the reasons the zombification problem is so bad is because many people just don't notice or care about it. If you had a meter on your wall showing your monthly Internet/e-mail bill and suddenly it started running up like a gas pump on Labor Day, you'd pull the plug pretty damn quick.

A few class-action lawsuits later, and the quality of software intended for clueless users would probably improve dramatically, too. There'd probably be a whole market for "computer operator liability insurance," that would pay for any unauthorized charges or damages provided you only used certain software and kept your systems up to date.

I think this is the inevitable direction of things in the long term. No, it probably won't happen anytime soon, because god knows it's the lawmakers who are the most clueless users of all, and they'll probably avoid making themselves responsible for as long as possible, but it's going to happen. We're becoming more and more reliant on the Internet every day, and "my PC got zombiefied" isn't going to be an excuse forever.

Frankly, given that I think liability is going to eventually come to computers just like it has any other facet of life, I think an economic cost- and damages-driven model is a better one than a top-down restrictions-driven one. (E.g., the cost-driven model says "you're responsible for what comes out of your PC onto the public network, secure it accordingly" while the restrictions model says "it's in the public interest to protect the network as a whole, therefore you can only attach systems which have passed a security inspection to it." I'd rather begin to implement the former slowly, because I think the latter is more likely to be pushed through in a knee-jerk response to some catastrophic failure at some point in the future when the politicians finally decide to have a change of heart.)

Re:No (Score:3, Interesting)

by arth1 ( 260657 ) writes: on Tuesday December 04, 2007 @03:12AM (#21569035) Homepage Journal

No, that's not what they're saying at all. RTFA, please, 'cause you're describing something completely different. (And moderators too, please at least skim TFA before moderating, because modding this "Informative" is bollocks.)

This is a system where they look at the history of who a person has sent e-mail to. If the sender has a short term history of sending e-mail to people who mostly receive spam, the e-mail is considered more likely to be spam. Conversely, if the sender has a short term history of sending email to people who don't receive much spam, the email is considered unlikely to be spam.
It's not about your inbox and its percentages, it's about the ratio of the inboxes the sender has previously sent to.
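
As a rough illustration of that reading, here is a toy sketch. It is emphatically not the vendor's algorithm, which is not public: the `ReceiverReputation` class, its method names, the neutral 0.5 default, and the plain averaging are all assumptions for illustration. The window of 25 recent messages per sender is borrowed from the vendor's description quoted in this comment.

```python
from collections import defaultdict, deque


class ReceiverReputation:
    """Toy model: score a sender by the spam ratios of its recent recipients."""

    def __init__(self, window: int = 25):
        self.spam = defaultdict(int)     # recipient -> messages judged spam
        self.total = defaultdict(int)    # recipient -> messages judged at all
        # sender -> recipients of its most recent `window` messages
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def record_verdict(self, recipient: str, is_spam: bool) -> None:
        """Update a recipient's history from some external spam/ham verdict."""
        self.total[recipient] += 1
        if is_spam:
            self.spam[recipient] += 1

    def spam_ratio(self, recipient: str) -> float:
        """Fraction of this recipient's mail that was spam (0.5 if unknown)."""
        n = self.total.get(recipient, 0)
        return self.spam.get(recipient, 0) / n if n else 0.5

    def record_delivery(self, sender: str, recipient: str) -> None:
        """Note that `sender` just mailed `recipient`."""
        self.recent[sender].append(recipient)

    def sender_score(self, sender: str) -> float:
        """Mean spam ratio of recent recipients; higher looks more spammy."""
        recipients = self.recent.get(sender)
        if not recipients:
            return 0.5
        return sum(self.spam_ratio(r) for r in recipients) / len(recipients)


if __name__ == "__main__":
    rep = ReceiverReputation()
    for _ in range(9):
        rep.record_verdict("trap@example.com", True)
    rep.record_verdict("trap@example.com", False)
    rep.record_delivery("sender-a", "trap@example.com")
    print(rep.sender_score("sender-a"))
```

Note that the sketch has to take "sender" as a given key, which is exactly the part the article leaves unexplained.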

"Because ratings are based on the most recent 25 emails for each sender, the system reacts instantly to spam attacks, usually within just a few messages."

The system has one big flaw, though -- it only works with static senders. A spammer who changes the envelope from address won't get caught, and might even by luck pick a forged sender address that has a positive latest-25-score.
So the solution for the spammers to defeat this system is to send the spams multiple times to the same recipients, but with different senders. This will increase the overall spam, which I don't see as a good service.

Re:Ummmm.... (Score:4, Interesting)

by doom ( 14564 ) writes: <doom@kzsu.stanford.edu> on Tuesday December 04, 2007 @03:16AM (#21569057) Homepage Journal

"First they can ensure that the message itself doesn't contain any recipient info (a big bcc basically)."

How exactly is a message supposed to get somewhere if it doesn't have the recipient info? I think you're confusing what you see in your mail box with what the mail servers see.
In any case, as is typical the news article doesn't really provide enough information to determine how the system actually works. It does sound like it's working on the premise that since spam is done in "bulk", if you see lots of identical messages going through a server you can assume that that's spam. The obvious problem would be that spammers can include randomly generated content.
But that problem is so obvious, it seems likely to me that I don't understand the system they have in mind.

Re:Yet another wrong answer... (Score:3, Interesting)

by Colin Smith ( 2679 ) writes: on Tuesday December 04, 2007 @08:30AM (#21570377)

"But legalised free for all of any drug you want is bullshit, imagine a night out when the person beside you can snort some cocaine or smoke some ice?"

That's the reality right now. Cocaine is only £20 per gram. I can pick up the phone and have it delivered. Like pizza, only it'll be here faster.

"oh and the idea that it's ok to legalise something addictive, so you can tax people to fund their treatment is some of the worst logic I could imagine."

You legalise the addictive substance to reduce the price. Even with the tax it doesn't even come close to the previous illegal street price. This then cuts off the black market. The dealers lose their revenue source and all the drug price fueled crime goes away. Basically you allow the addicts to fund their own downward spiral. Eventually the problem solves itself.

You might think that liberalisation is a soft option making life easy for addicts. It's not. It's the "we're going to make you less of a problem till you die" option.

You see I don't really care about addicts, tobacco, alcohol or other drug of choice as long as they don't bother me. It isn't the abuse of alcohol itself which I object to, it's the effect it has on my life which I object to. You want to pickle your liver and die screaming as the hallucinations kick in, go right ahead. Just do it in the privacy of your own home, funded by your own taxed drug consumption.

It isn't my responsibility to stop you from killing yourself. Your loss will only be mourned by your own friends and family, the rest of the world benefits from the reduction in resource usage your death represents.

Re:The inventor responds... (Score:2, Interesting)

by SallyShears ( 451561 ) writes: on Tuesday December 04, 2007 @11:46AM (#21572161) Homepage Journal

I think the statistical idea here is really quite interesting. It is well known in statistics that by looking at problems AND non-problems (instead of the problem cases alone), you learn more about how to discriminate problem-causing situations in the future. There is a classic case based on the data available prior to the Challenger Space Shuttle launch.

I have a couple of questions... The article and Steve's response talk about senders, messages, and recipients. If the messages from a sender have gone more to high spam recipients than to low spam recipients, then future messages from that sender are more likely to be spam. Fine so far.

A recipient is easy to identify... It's an email address.

But what is a sender? Maybe it's an IP address? Even then, is it the IP of injection? Or the IP that connected to our MX? A sender is certainly not a "From:" address since these are mostly forged and varying. The real world of spam is even more clouded... Most SPAM senders utilize multiple streams: lots of points of injection into AOL/Yahoo/GMail or lots of direct-to-MX from bots in a net. How to identify a "Sender" on whom we can measure a statistic and make a forecast for filtering? What is the "Sender" we are talking about?

And, what is a message? If it's literally one message with a long cc: list, then it's easy... When a sender sends a msg that goes more to high spam recipients than to low spam recipients, it means we should suspect that sender in future filtering. But, most spam isn't sent that way. Random variations are sent through multiple points of injection to the spectrum of recipients. Sometimes, we can make a checksum or Bayesian score that will collect the varying instances of a "message" for analysis. More often, it will look like lots of different messages, and you lose the ability to analyze across recipients.

I suspect Steve is identifying a sender as an IP connecting to our server. Maybe a "message" is all the traffic in a short period from that IP.

I like the statistics.

I'm worried about the practical questions in our world of forged senders, forged "Received:" lines, random message variation, and botnets. What is a sender? What is a message?

Re:The inventor responds... (Score:3, Interesting)

by flonker ( 526111 ) writes: on Tuesday December 04, 2007 @01:42PM (#21573993)

The biggest flaw I see with the system is that spammers will try to figure out "good" addresses and send more spam to those particular addresses compared to others, i.e., include a web bug in the email; if the email gets through, that address is then mailbombed into oblivion, increasing the rating for any of the participants of that mailbombing.

Also, eventually, the known good address may get so much spam that it becomes a "bad" address, invalidating future good emails.

Many systems to stop spam work on small and medium scale but once spammers discover the system is in use on a large scale, they start to develop active countermeasures and the system breaks.

IMO, the only way to permanently stop spam is to skip several generations ahead in terms of filtering it out, so that spam gets blocked completely for an extended period of time, and spammer R&D is halted due to lack of financial motivation. Then you have to keep ahead of future spammers, but that's a much easier task. But really, I don't see that happening.

Re:Yet another wrong answer... (Score:2, Interesting)

by jhol13 ( 1087781 ) writes: on Tuesday December 04, 2007 @10:21PM (#21580693)

Let's see.

So the price per email must be big enough for the users to notice but not too big to kill mailing lists. Very tough, but let's assume it is doable. Though I doubt it would work (spammers would just decrease the number of spams per machine per month to 1'000-10'000). There already are limits imposed by ISPs, you know ...

First, I doubt the lawsuit would be against software makers - it would be against ISPs. That is because the bill came from the ISP, not from the OS/SW maker.

The liability - that is an extremely hard question. Can I be liable if someone else is doing illegal things? It is extremely difficult for me to accept a law which would make me liable if the OS I'm using has a hole and a criminal uses it for whatever. OTOH the OS makers (and F/OSS OSes) would not accept such a responsibility.

So how would you phrase the law prohibiting insecure PCs? Passing a "security inspection" is clearly silly - a one-month-old inspection is almost useless. Forcing people to accept automatic updates cannot work, it has far too many problems. Prohibiting 24/7 connectivity? You'd be first to complain.

I agree on the principle, unfortunately I cannot see how it could work in practice, especially before OSes mature a bit (sandboxes, capabilities, mandatory access controls, ...). Even those do not solve the problem - there is no difference between mailing list SW and spamming SW. The difference is the contents of the emails, not the SW sending them.

Re:The inventor responds... (Score:2, Interesting)

by propelCEO ( 661892 ) writes: on Tuesday December 04, 2007 @11:06PM (#21580967) Homepage

Slashdot doesn't make it easy for me to respond to each comment. I am told to "hold on cowboy" and wait between postings. So I'll answer all the rest of the comments in this email.

A couple of people said the spammers just find the good addresses and only send to them. The problem with that is that the good addresses then turn into bad addresses and the spammer loses. Fundamentally, they cannot avoid the mathematical fact that they MUST send to people who get more spam than senders who send ham. So that might work for one spammer for a few mailings, if they could pull it off, but the victory would be very short lived. And no spammer would want to limit themselves to such a small list of recipients.

One person asked about what is a sender and what is a message. That's right. That wasn't easy to figure out. Suffice it to say that the explanations on the website give you only a basic understanding. The secret sauce is secret...until the patent issues.

Another person said disclose it to prove it is spammer proof. What is the economic value in doing that? Then every competitor would copy it and my company would be driven out of business since the intellectual property would then be worthless. If you want to pay us $100M, we'll publish the algorithm. That's far less than the economic value of the invention. Any takers?

That same person said it can't possibly work in the real world. That is simply ignorant of the facts in front of you. Call the customers on our website. Some have been using it for more than a year with no algorithm updates and it is working better now than a year ago. We're about to announce a major state school system has standardized on our software for all their campuses. How could that happen if our stuff doesn't work in the real world? We sure didn't give it away. All the customers pay full price or close to that. We rarely discount. And our prices are higher than our competition.

One person asked why/how Yahoo could send to a spamtrap. I just sent to a spamtrap from my yahoo account just now. You can do it too. Spammers who get Yahoo accounts do it all the time just like I proved you can do it. And there are no stats on a "generic honeypot algorithm." Each implementation is different. I don't know of any that have less than 200 errors per million messages. Do you??

Finally, the last person said my system will NEVER do as well as combining the freely available tools to fight spam. This person then didn't give numbers (like I did). And I don't think this person is telling the truth either. So I challenge that person (ls671) to prove it. I'll put up $10K and ls671 will put up $10K into an escrow account. We'll run the same realtime mailstream through both systems for 24 hours and if you get a lower total error count, you win the $10K. So ls671, this is easy money since you said we'd NEVER be able to match a system constructed of free components. If that were true, you'd accept my bet instantly because you'd always win. So this is easy money. Please accept my bet and post your acceptance here. Or post a retraction. What's your choice?
