Plan for Spam, Version 2
bugbear writes "I just posted a new version of the Plan for Spam Bayesian filtering algorithm. The big change is to mark tokens by context. The new version decreases spams missed by 50%, to 2.5 per 1000, even though spam has gotten harder to filter since the summer. I also talk about how spam will evolve, and what to do about it."
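For readers who want to see what "marking tokens by context" might look like, here is a minimal sketch, not the article's actual code. It assumes a word in the Subject line is stored as a separate token (written here as `Subject*free`) from the same word in the body, and it combines per-token probabilities with the usual naive Bayes product. The marker format, the 0.01/0.99 clamps, the 0.4 value for unseen tokens, and the choice of the 15 most interesting tokens are all illustrative assumptions, as are the function names.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")


def tokenize(headers, body):
    """Return tokens; header tokens are prefixed with the header name."""
    tokens = []
    for name, value in headers.items():
        tokens.extend(f"{name}*{tok}" for tok in TOKEN_RE.findall(value))
    tokens.extend(TOKEN_RE.findall(body))
    return tokens


def train(messages):
    """Count, per token, how many messages it appears in."""
    counts = Counter()
    for headers, body in messages:
        counts.update(set(tokenize(headers, body)))
    return counts, len(messages)


def token_probability(token, spam, ham):
    """Estimate P(spam | token), clamped away from 0 and 1 (assumed bounds)."""
    (spam_counts, n_spam), (ham_counts, n_ham) = spam, ham
    s = spam_counts[token] / max(n_spam, 1)
    h = ham_counts[token] / max(n_ham, 1)
    if s + h == 0:
        return 0.4  # assumed value for tokens never seen in training
    return min(0.99, max(0.01, s / (s + h)))


def spam_score(headers, body, spam, ham, top_n=15):
    """Combine the top_n tokens farthest from 0.5 into one probability."""
    probs = [token_probability(t, spam, ham) for t in set(tokenize(headers, body))]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:top_n]
    prod_spam = math.prod(top)
    prod_ham = math.prod(1 - p for p in top)
    return prod_spam / (prod_spam + prod_ham)
```

The point of the context marker is that "free" in a subject line and "free" in the body of a note about lunch get separate statistics, so one does not dilute the other.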
Re:How is spam that big of a problem? (Score:3, Insightful)
Spam only cost-ineffective with ISP-level filters (Score:5, Insightful)
To make spam cost-ineffective for the spammers, we've got to stop it (or flag it) before it gets to the end-user. It would obviously be a mistake to allow ISPs to automatically delete all email that fails their spam filters, but I think it would be appropriate for them to include something in the headers flagging such email as probable spam. Then future email readers could detect this header and handle it gracefully, like moving it to a "spam" folder on the user's machine. Once this happens and Grandpa no longer gets email asking him to test the latest Viagra alternative, spam may become a thing of the past.
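A rough sketch of the client side of that idea, using Python's standard `email` module. The header name `X-Spam-Flag` with a value of `YES` is an assumption here (some filters use it, but nothing in this thread specifies one), and `choose_folder` is a made-up helper name.

```python
from email import message_from_string

SPAM_HEADER = "X-Spam-Flag"  # assumed header name, not specified in the thread


def choose_folder(raw_message: str) -> str:
    """Return "spam" if the upstream filter flagged the message, else "inbox"."""
    msg = message_from_string(raw_message)
    flag = (msg.get(SPAM_HEADER) or "").strip().upper()
    return "spam" if flag == "YES" else "inbox"
```

Nothing gets deleted this way: the ISP only annotates, and the mail reader decides where the message goes, so a false positive is still recoverable from the spam folder.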
filtering effectiveness (Score:5, Insightful)
Re:hopeless (Score:5, Insightful)
Travis
Re:hopeless (Score:5, Insightful)
Over at SpamAssassin, they've been busily creating a system that collects "good enough" tests by the dozens and uses them to collectively score a message and determine its general "spamishness". The system relies on a complex scoring system that is determined, not by the whim of human programmers, but by the results of a genetic training system that pits one set of scores against another until equilibrium is reached for a given set of example spam and non-spam.
See my other post here for how Bayesian filtering will be used to allow this system to feed back on itself and improve as it sees more of your spam and non-spam....
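To make the "many good-enough tests, one collective score" idea concrete, here is a small sketch. It is not SpamAssassin's code or rule set: the rule names, the weights, and the threshold are invented for illustration, and in the real system the weights would come out of the training process described above rather than being typed in by hand.

```python
# Each rule: (hypothetical name, test function, weight). Negative weights
# count as evidence that the message is legitimate.
RULES = [
    ("SUBJECT_ALL_CAPS", lambda m: m["subject"].isupper(), 1.5),
    ("MENTIONS_VIAGRA", lambda m: "viagra" in m["body"].lower(), 2.5),
    ("HAS_REMOVE_LINK", lambda m: "click here to be removed" in m["body"].lower(), 1.2),
    ("KNOWN_SENDER", lambda m: m["sender"] in m.get("address_book", ()), -3.0),
]
THRESHOLD = 5.0  # assumed cutoff


def score(message, rules=RULES):
    """Return the total score and the names of the rules that fired."""
    hits = [(name, weight) for name, test, weight in rules if test(message)]
    return sum(weight for _, weight in hits), [name for name, _ in hits]


def is_spam(message, threshold=THRESHOLD):
    total, _ = score(message)
    return total >= threshold
```

No single test has to be reliable on its own. A message only crosses the threshold when several weak signals agree, and a strong "this is a known sender" signal can pull it back under.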
Because it's free. (Score:2, Insightful)
Standard Spam API (Score:2, Insightful)
Re:Why can't we have legal restrictions on spam? (Score:2, Insightful)
Re:Performance (Score:2, Insightful)
Personally I think the author of the paper is a bit idealistic when they say "If we can write software that recognizes their messages, there is no way they can get around that". Then again, maybe they aren't: "if we can...recognize their messages" is a pretty broad presumption, and of course the conclusion follows from it. The real question, however, is "can we realistically make software that effectively identifies spam with zero incidences of false positives?" For people who email between themselves and one or two other people on one subject that isn't a problem, but I suspect that statistical word usage analysis wouldn't be quite as successful for someone with more disparate mail usage.
Re:Stop spam? (Score:2, Insightful)
Re:Why can't we have legal restrictions on spam? (Score:5, Insightful)
In certain ways, the government does and should do precisely that. If I repeatedly call you at 4 AM to ask if your refrigerator is running or deliberately send you virus-laden e-mail, then you have every right to call upon the long arm of the law to slap down the harassment.
Spamming, being a violation of the recipient's property rights, falls into that category.
Another way to filter out SPAMs (Score:3, Insightful)
My idea is this: The system maintains an initially empty whitelist. When mail is received from a sender not on the whitelist, autoreply with a message explaining the situation and requesting an email back whose first line or subject contains a random word or phrase from the dictionary. Human beings will grumble, respond, and get added to the whitelist. Spammers won't give your email the personal attention it needs to get past, so you remain blissfully unaware of it.
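Something like the following sketch, with the actual mail sending left out. The class name, the tiny word list standing in for a dictionary, and the choice to hold messages until the sender answers are all assumptions made for illustration.

```python
import secrets

WORDS = ["orchard", "lantern", "pebble", "harbor", "thistle"]  # stand-in dictionary


class ChallengeWhitelist:
    def __init__(self, words=WORDS):
        self.words = list(words)
        self.whitelist = set()
        self.pending = {}  # sender -> (challenge word, held messages)

    def receive(self, sender, subject, body):
        """Return ("deliver", messages), ("challenge", word) or ("hold", None)."""
        sender = sender.lower()
        if sender in self.whitelist:
            return ("deliver", [(subject, body)])
        if sender in self.pending:
            word, held = self.pending[sender]
            first_line = body.splitlines()[0] if body else ""
            if word in subject.lower() or word in first_line.lower():
                # Correct reply: trust the sender and release the held mail.
                self.whitelist.add(sender)
                del self.pending[sender]
                return ("deliver", held)
            held.append((subject, body))
            return ("hold", None)
        # Unknown sender: hold the message and autoreply with a random word.
        word = secrets.choice(self.words)
        self.pending[sender] = (word, [(subject, body)])
        return ("challenge", word)
```

A human who answers the autoreply once is whitelisted from then on. A bulk sender who never reads replies stays in `pending` and nothing of theirs is ever delivered.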
Re:AOL or Hotmail adopt? (Score:5, Insightful)
No. My problem's with the senders, not the messages. What Hotmail should do is send back an email saying "Your message has been rejected because you have not been authorized by this user. If you'd like to request authorization, click here and follow the instructions."
When they properly fill out the form, you get a message saying "so'n'so wants to send you a message. Interested?" and you can say yes/no. If you say yes, they get added to your address book and they can email you until you remove them from it.
With this approach, it requires a valid return address before the message can possibly get to you. That means you're able to tell the person to remove you, unlike today's 'send anything to anybody' system.
If Hotmail did that, I'd actually consider paying for their service.
Re:Another way to filter out SPAMs (Score:3, Insightful)
Some legit email is definitely computer generated. I sign up for
If you standardize an autoreply, so that websites could parse and return it, then so could the spammers, easily enough.
Finally, you'd be doubling the amount of bandwidth spent on email, as each spam would now have a corresponding auto reply.
Re:better than legislation (Score:3, Insightful)
I hear this argument and variations on it from time to time, but the more I consider it the more flawed it looks to me. There are really two kinds of filters to consider:
These two things are not at all equivalent to the spammer because of the psychology of spam. Fundamentally, email readers are likely to fall into two fairly tight categories: suckers who will listen to spam and non-suckers who won't. Anyone who applies his own personal email filter is likely to fall into the non-sucker category, so there's little point in designing a message specifically to bypass those personal filters. The target won't buy your product even if you do get it past his filter. That's not the case with ISP level filters, though, which protect suckers and non-suckers alike. Those are worth bypassing because they're stopping some email that would get to the suckers who would buy your product.
Now it may be the case that the same techniques that are useful for avoiding ISP-level filters will also help get mail past personal filters. That even seems likely, given that many people use ISP-type filters for their personal mail because the ISPs don't do it for them. But it seems to me that there's little percentage in specifically trying to avoid personal level filters that work on a different system from the ISP-level filters because the simple fact that somebody is bothering to use the filter implies that he won't buy from the spammer anyway.
Re:Why can't we have legal restrictions on spam? (Score:1, Insightful)
Because it would have no effect on spam. Making spam illegal in the United States would simply move the points of origin offshore.
Why would that be an improvement? No dedicated spammer would hesitate for more than a nanosecond before getting an account at a host in Panama, or wherever they would be safe from local prosecution.
The only workable solution to spam is to learn to use local filtering. Someday, I hope, we're going to learn that passing laws against stuff that annoys us only leads to unpleasant unforeseen consequences.
Re:Why can't we have legal restrictions on spam? (Score:2, Insightful)
Traditionally and in general, anything on the 'net that can be achieved through both technical means and legal recourse is almost always implemented via the technical route.
The reasons for this are many; the major reasons are simple. While most people on the 'net have not been lawyers, most of the first people - esp. USENET users - were engineers and scientists. Such people develop a disdain for legal recourse after spending (wasting) so much of their time in political and legal battles in the *real world* justifying and defending their work and themselves. Just ask any graduate student standing in line at his college Bursar's office how he feels about contracts and (non-technical) paperwork.
Furthermore, by avoiding the often easy to circumvent and hard to quantify political avenue, the solutions are usually more effective in both the short and long term. Many solutions, such as the Bayesian SPAM filtering discussed here, also give these technical people a chance to prove their worth or gain some small measure of fame by association with a good solution.
Remember: Conventional Wisdom is an oxymoron. There are always reasons for something, even if those reasons are nothing but hubris and desire. It is up to you to accept or change them.
Re:Stop spam? (Score:5, Insightful)
Actually, SpamAssassin has a nice built-in reporting tool. And if you set it up to work with Vipul's Razor [sourceforge.net], it's all automagically updated.
Re:Stop spam? (Score:5, Insightful)
You say that it will stop if it's fully against the law and people bring legal action to stop it.
Last time I checked, murder was illegal, punishable by death in many states, yet it still occurs.
Re:The more serious problem (Score:3, Insightful)
Then the filter will adapt to the types of legitimate messages you receive. That's the entire point.
Re:Stop spam? (Score:4, Insightful)
People spam because it is rational to do so (or at least spammers make them think so). Very low costs, the possibility of a good return, and nothing to lose since there are virtually no spam laws.
A better comparison than murder is the practice of child labor. While it was legal it was a rational practice to engage in, because the return was high and the risk was low -- if a kid gets eaten by a machine you just find another kid. Now that it is illegal, the practice is almost completely extinct because it is no longer rational -- the police would come knocking at the door, which impedes the goal of running a profitable business.