Plan for Spam, Version 2 (464 comments)

bugbear writes "I just posted a new version of the Plan for Spam Bayesian filtering algorithm. The big change is to mark tokens by context. The new version decreases spams missed by 50%, to 2.5 per 1000, even though spam has gotten harder to filter since the summer. I also talk about how spam will evolve, and what to do about it."
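The headline change, marking tokens by context, can be illustrated with a minimal sketch. This is not the author's code. The `Field*word` prefix convention, the token regex, the set of marked header fields, and all helper names are assumptions. The combining step is the standard naive-Bayes product that the original Plan for Spam applied to the 15 most "interesting" tokens, meaning those whose spam probability is farthest from a neutral 0.5.

```python
import math
import re

# Assumed token pattern: letters, digits, dollar signs, apostrophes, dashes, and '!'.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")

# Header fields whose tokens get a context prefix (an assumption for this sketch).
CONTEXT_FIELDS = ("From", "To", "Subject")


def tokenize(headers: dict[str, str], body: str) -> list[str]:
    """Tokenize a message, marking header tokens with the field they came from."""
    tokens = []
    for field in CONTEXT_FIELDS:
        for word in TOKEN_RE.findall(headers.get(field, "")):
            tokens.append(f"{field}*{word}")  # "free" in Subject -> "Subject*free"
    tokens.extend(TOKEN_RE.findall(body))  # body tokens stay unmarked
    return tokens


def most_interesting(probs: list[float], n: int = 15) -> list[float]:
    """Keep the n probabilities farthest from a neutral 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]


def combine(probs: list[float]) -> float:
    """Naive-Bayes combination of per-token spam probabilities."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)
```

Because "free" in a Subject line and "free" in the body become different tokens, each one gets its own learned probability. The per-token probabilities fed to `combine` should be clamped away from exactly 0 and 1, or a mix of both would divide zero by zero.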
  • by crawdaddy ( 344241 ) on Tuesday January 21, 2003 @02:36PM (#5128212)
    Overblown? The fact that you would need more than one email account to keep from having your time wasted by spam proves otherwise.
  • by PseudoThink ( 576121 ) on Tuesday January 21, 2003 @02:39PM (#5128235)
    Spam filters are great, but it seems that only the Net-savvy are using them. Savvy users aren't the people spammers are making all their money from--they are making money off the naive and inexperienced users. These users aren't going to go out and install the latest Bayesian filters on their system, and the major email readers won't (and probably shouldn't) come with them automatically activated.

    To make spam cost-ineffective for the spammers, we've got to stop it (or flag it) before it gets to the end-user. It would obviously be a mistake to allow ISPs to automatically delete all email that fails their spam filters, but I think it would be appropriate for them to include something in the headers flagging such email as probable spam. Then future email readers could detect this header and handle it gracefully, like moving it to a "spam" folder on the user's machine. Once this happens and Grandpa no longer gets email asking him to test the latest Viagra alternative, spam may become a thing of the past.
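The client side of that proposal is only a few lines. This is a sketch under assumptions: the header name `X-Spam-Flag` and the value `YES` are borrowed from SpamAssassin's convention, not from any standard, and the folder names are hypothetical.

```python
from email import message_from_string
from email.message import Message

# Assumed header name and value; SpamAssassin, for example, writes "X-Spam-Flag: YES".
SPAM_HEADER = "X-Spam-Flag"


def is_flagged_spam(msg: Message) -> bool:
    """Return True if an upstream (ISP) filter marked this message as probable spam."""
    value = msg.get(SPAM_HEADER, "")
    return value.strip().upper() == "YES"


def choose_folder(raw: str) -> str:
    """Route flagged mail to a 'Spam' folder instead of deleting it."""
    msg = message_from_string(raw)
    return "Spam" if is_flagged_spam(msg) else "Inbox"
```

In practice the ISP would also need to strip any copy of the header that arrives from outside, since otherwise a sender could forge it.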
  • by qoncept ( 599709 ) on Tuesday January 21, 2003 @02:40PM (#5128239) Homepage
    I think I speak for everyone when I say false positives are the only real hindrance to the filtering of spam. I get roughly 20 emails a day, 75% of which are spam. If one of them slips past the filter and I see it, it doesn't bother me so much. Spam is no longer a problem. What is an absolute necessity, though, (and probably less so for me than other people) is that none of my legitimate email is filtered as spam. I'd rather have 100 spams filtered improperly than one legit email.
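That preference can be turned into a decision threshold with ordinary expected-cost reasoning. This is a generic decision-theory sketch, not part of the Plan for Spam algorithm, and the function names are hypothetical. Calling a message spam costs (1 - p) times the false-positive cost in expectation; letting it through costs p times the false-negative cost. Spam wins only when p exceeds c_FP / (c_FP + c_FN).

```python
def spam_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Spam probability above which labeling a message spam minimizes expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(p_spam: float, threshold: float) -> str:
    """Label a message given its estimated spam probability."""
    return "spam" if p_spam > threshold else "ham"


# "100 misfiled spams rather than one lost legit email": a false positive costs 100x.
threshold = spam_threshold(cost_false_positive=100.0, cost_false_negative=1.0)
```

With a 100:1 cost ratio the threshold is 100/101, about 0.99, so a message the filter is 98% sure about still lands in the inbox.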
  • Re:hopeless (Score:5, Insightful)

    by Kallahar ( 227430 ) <kallahar@quickwired.com> on Tuesday January 21, 2003 @02:40PM (#5128242) Homepage
    Yeah, 2.5 per 1000 getting through is proof that his ideas are obviously flawed. Having a working system is the best proof that an idea works :)

    Travis
  • Re:hopeless (Score:5, Insightful)

    by ajs ( 35943 ) <{ajs} {at} {ajs.com}> on Tuesday January 21, 2003 @02:42PM (#5128263) Homepage Journal
    Everyone but the folks at SpamAssassin seems to be focusing on a single technique, even though any one technique for identifying spam is doomed to diminishing returns.

    Over at SpamAssassin, they've been busily creating a system that collects "good enough" tests by the dozens and uses them to collectively score a message and determine its general "spamishness". The system relies on a complex scoring system that is determined not by the whim of human programmers but by the results of a genetic training system that pits one set of scores against another until equilibrium is reached for a given set of example spam and non-spam.

    See my other post here for how Bayesian filtering will be used to allow this system to feed back on itself and improve as it sees more of your spam and non-spam....
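A toy version of that "many weak tests, evolved scores" idea might look like the following. It is a simplified sketch, not SpamAssassin's actual training code: the corpus format (a 0/1 vector recording which tests fired), the 5.0 threshold, the population size, and the crossover and mutation choices are all assumptions made for illustration.

```python
import random

Example = tuple[list[int], bool]  # (which tests fired, is_spam)


def score(weights: list[float], hits: list[int]) -> float:
    """Sum the scores of the tests that fired on a message."""
    return sum(w * h for w, h in zip(weights, hits))


def accuracy(weights: list[float], corpus: list[Example], threshold: float = 5.0) -> float:
    """Fraction of examples whose spam/ham verdict matches the label."""
    correct = sum((score(weights, hits) >= threshold) == is_spam for hits, is_spam in corpus)
    return correct / len(corpus)


def evolve(corpus: list[Example], n_tests: int, generations: int = 50,
           pop_size: int = 20, seed: int = 0) -> tuple[list[float], list[float]]:
    """Genetic search over score vectors; returns the best vector and per-generation best accuracy."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 5.0) for _ in range(n_tests)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: accuracy(w, corpus), reverse=True)
        history.append(accuracy(population[0], corpus))
        survivors = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(n_tests)] += rng.gauss(0.0, 0.5)  # mutate one score
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda w: accuracy(w, corpus))
    return best, history
```

Because the best half always survives, the best accuracy never drops from one generation to the next; the open question, as with any such training, is how well scores tuned on one corpus generalize to new spam.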
  • Because it's free. (Score:2, Insightful)

    by Presto_slashdot ( 573879 ) on Tuesday January 21, 2003 @02:45PM (#5128288)
    You probably get no spam to your home or cell phone because it's too expensive to set up a company in China and make phone calls to the US, just to get around the laws. Unfortunately, it *is* basically free to send spam mail. If they could call you for free from outside the US, they would be doing that too.
  • Standard Spam API (Score:2, Insightful)

    by Anonymous Coward on Tuesday January 21, 2003 @02:45PM (#5128290)
    I have been quite excited by all the new ideas being put to use in fighting spam recently. Unfortunately, whenever I find one that is implemented, it doesn't work with my mail server or my client. It seems like there should be a standard API that spam filters could implement (using SOAP or XML-RPC or something), so that the various mail servers and email clients could use a single plug-in to add spam filtering. This would allow the people who are good at spam filter code to focus on that one problem, and the people who are good at writing email plugins and GUI code to do what they are good at.
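The separation being asked for can be expressed as a small interface. Everything here is hypothetical: no such standard existed in this thread, and the names `SpamFilter`, `score`, `train`, and `deliver` are invented for the sketch.

```python
from typing import Protocol


class SpamFilter(Protocol):
    """Hypothetical interface any filter could implement and any client could call."""

    def score(self, raw_message: str) -> float:
        """Return a spam probability between 0.0 and 1.0."""
        ...

    def train(self, raw_message: str, is_spam: bool) -> None:
        """Receive a user's correction so the filter can learn."""
        ...


class KeywordFilter:
    """Toy implementation; a real filter would use token statistics."""

    def __init__(self) -> None:
        self.bad_words = {"viagra"}
        self.corrections: list[tuple[str, bool]] = []

    def score(self, raw_message: str) -> float:
        words = set(raw_message.lower().split())
        return 1.0 if words & self.bad_words else 0.0

    def train(self, raw_message: str, is_spam: bool) -> None:
        self.corrections.append((raw_message, is_spam))  # stored for later retraining


def deliver(spam_filter: SpamFilter, raw_message: str, threshold: float = 0.9) -> str:
    """What a mail server or client plug-in would do with any conforming filter."""
    return "Spam" if spam_filter.score(raw_message) >= threshold else "Inbox"
```

The same object could be exposed over XML-RPC, for example with Python's `xmlrpc.server.SimpleXMLRPCServer` and its `register_instance` method, so that servers and clients written in other languages could call `score` remotely.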
  • by cheezedawg ( 413482 ) on Tuesday January 21, 2003 @02:48PM (#5128312) Journal
    Because the last thing we need in this country is the government telling us how and when we can send email or make a phone call.
  • Re:Performance (Score:2, Insightful)

    by ergo98 ( 9391 ) on Tuesday January 21, 2003 @02:51PM (#5128330) Homepage Journal
    And this is the clincher in any of these spam filters: If the filter automatically deletes messages that it identifies as spam (which could be a legitimate business proposal or job offer, for example), then a false positive would be incredibly destructive. If it doesn't automatically delete but instead you periodically go through all of the messages, then it's of little value, as you're forced to manually filter the spams anyway. The irony is that the better it is at identifying spams, the more destructive a false positive would be as you casually scan through and delete large clusters of supposed spam.

    Personally, I think the author of the paper is a bit idealistic when they say "If we can write software that recognizes their messages, there is no way they can get around that". Then again, maybe they aren't: "if we can...recognize their messages" is a pretty sweeping presumption, and the conclusion naturally follows from it. The real question, however, is "can we realistically make software that effectively identifies spam with zero false positives?" For people who email only one or two other people about one subject, that isn't a problem, but I suspect that statistical word-usage analysis wouldn't be quite as successful for someone with more disparate mail usage.
  • Re:Stop spam? (Score:2, Insightful)

    by Anonymous Coward on Tuesday January 21, 2003 @02:57PM (#5128378)
    You write to a state office, get a completely clueless reply and now ask for a legislative solution? You're quite the optimist, aren't you?
  • by Steve B ( 42864 ) on Tuesday January 21, 2003 @03:01PM (#5128394)
    "Because the last thing we need in this country is the government telling us how and when we can send email or make a phone call."

    In certain ways, the government does and should do precisely that. If I repeatedly call you at 4 AM to ask if your refrigerator is running or deliberately send you virus-laden e-mail, then you have every right to call upon the long arm of the law to slap down the harassment.

    Spamming, being a violation of the recipient's property rights, falls into that category.

  • by ottffssent ( 18387 ) on Tuesday January 21, 2003 @03:07PM (#5128439)
    Spam filters seek to classify emails as spam/nonspam based on differences in the emails. The spammers however have absolute control over the content of their emails, so such methods are doomed to a life of one-step-ahead. There is one characteristic of spam which can never be changed by the spammer: spam is computer-generated and mass-mailed. Legit emails are not.

    My idea is this: The system maintains an initially empty whitelist. When mail is received from a sender not on the whitelist, autoreply with a message explaining the situation and requesting an email back whose first line or subject contains a random word or phrase from the dictionary. Human beings will grumble, respond, and get added to the whitelist. Spammers won't give your email the personal attention it needs to get past, so you remain blissfully unaware of it.
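The whitelist-plus-challenge scheme can be sketched as a small state machine. The class name, the stand-in word list, and the rule that the word may appear anywhere in the reply's subject are assumptions; the comment allows either the first line or the subject.

```python
import secrets

# Stand-in for a real dictionary file (assumption for this sketch).
DICTIONARY = ["apple", "harbor", "violet", "granite", "lantern"]


class ChallengeResponse:
    """Whitelist plus a random challenge word for unknown senders (hypothetical names)."""

    def __init__(self) -> None:
        self.whitelist: set[str] = set()
        self.pending: dict[str, str] = {}  # sender -> word they must send back

    def receive(self, sender: str, subject: str) -> str:
        """Return 'deliver', or 'challenge' meaning: send the explanatory autoreply."""
        sender = sender.lower()
        if sender in self.whitelist:
            return "deliver"
        expected = self.pending.get(sender)
        if expected is not None and expected in subject.lower().split():
            # A human read the autoreply and answered it: whitelist the sender.
            self.whitelist.add(sender)
            del self.pending[sender]
            return "deliver"
        if expected is None:
            self.pending[sender] = secrets.choice(DICTIONARY)
        return "challenge"
```

The objections raised elsewhere in the thread apply unchanged: legitimate automated senders never answer the challenge, and every spam now triggers an extra outgoing message.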
  • by Anonvmous Coward ( 589068 ) on Tuesday January 21, 2003 @03:14PM (#5128482)
    "Does anyone think AOL or Hotmail could start using such a system as the one outlined in the article?"

    No. My problem's with the senders, not the messages. What Hotmail should do is send back an email saying "Your message has been rejected because you have not been authorized by this user. If you'd like to request authorization, click here and follow the instructions."

    When they properly fill out the form, you get a message saying "so'n'so wants to send you a message. Interested?" and you can say yes/no. If you say yes, they get added to your address book and they can email you until you remove them from it.

    With this approach, it requires a valid return address before the message can possibly get to you. That means you're able to tell the person to remove you, unlike today's 'send anything to anybody' system.

    If Hotmail did that, I'd actually consider paying for their service.
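The recipient-approval flow differs from a challenge word: the recipient, not a puzzle, decides who gets in. A minimal sketch follows; all names are hypothetical, and the form's check that the requester has a valid return address is not modeled.

```python
class AuthorizedInbox:
    """Sketch of a recipient-approved sender list (all names hypothetical)."""

    def __init__(self) -> None:
        self.address_book: set[str] = set()
        self.pending_requests: list[str] = []

    def receive(self, sender: str) -> str:
        """Accept mail only from senders in the address book."""
        if sender.lower() in self.address_book:
            return "accepted"
        return "rejected: please request authorization"

    def request_authorization(self, sender: str) -> None:
        """Called when an unknown sender fills out the request form."""
        sender = sender.lower()
        if sender not in self.address_book and sender not in self.pending_requests:
            self.pending_requests.append(sender)

    def decide(self, sender: str, approve: bool) -> None:
        """The recipient answers "so-and-so wants to send you a message. Interested?"."""
        sender = sender.lower()
        if sender in self.pending_requests:
            self.pending_requests.remove(sender)
            if approve:
                self.address_book.add(sender)

    def remove(self, sender: str) -> None:
        """Revoke permission; later mail from this sender is rejected again."""
        self.address_book.discard(sender.lower())
```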
  • by Violet Null ( 452694 ) on Tuesday January 21, 2003 @03:17PM (#5128492)
    "spam is computer-generated and mass-mailed. Legit emails are not."

    Some legit email is definitely computer-generated. When I sign up for /., it sends me an email with my password. /. will not care about an autoreply, so I would never get that email.

    If you standardize an autoreply, so that websites could parse and return it, then so could the spammers, easily enough.

    Finally, you'd be doubling the amount of bandwidth spent on email, as each spam would now have a corresponding auto reply.
  • by rgmoore ( 133276 ) <glandauer@charter.net> on Tuesday January 21, 2003 @03:20PM (#5128519) Homepage
    "This is a wonderful tool that is being developed. However, I don't think any one tool will succeed in eliminating spam. From a spammer's point of view, if my income depends on messages making it through filters, by damn I will bypass those filters by whatever means I can. These assholes send penis enlargement advertisements to my mother -- If her gender doesn't stop them, neither will an email filter."

    I hear this argument and variations on it from time to time, but the more I consider it the more flawed it looks to me. There are really two kinds of filters to consider:

    1. ISP-level filters applied at a network level by a third party.
    2. Personal filters applied at an individual level by the target of the spam.

    These two things are not at all equivalent to the spammer because of the psychology of spam. Fundamentally, email readers are likely to fall into two fairly tight categories: suckers who will listen to spam and non-suckers who won't. Anyone who applies his own personal email filter is likely to fall into the non-sucker category, so there's little point in designing a message specifically to bypass those personal filters. The target won't buy your product even if you do get it past his filter. That's not the case with ISP level filters, though, which protect suckers and non-suckers alike. Those are worth bypassing because they're stopping some email that would get to the suckers who would buy your product.

    Now it may be the case that the same techniques that are useful for avoiding ISP-level filters will also help get mail past personal filters. That even seems likely, given that many people use ISP-type filters for their personal mail because the ISPs don't do it for them. But it seems to me that there's little percentage in specifically trying to avoid personal level filters that work on a different system from the ISP-level filters because the simple fact that somebody is bothering to use the filter implies that he won't buy from the spammer anyway.

  • by knobmaker ( 523595 ) on Tuesday January 21, 2003 @03:48PM (#5128773) Homepage Journal

    Because it would have no effect on spam. Making spam illegal in the United States would simply move the points of origin offshore.

    Why would that be an improvement? No dedicated spammer would hesitate for more than a nanosecond before getting an account at a host in Panama, or wherever they would be safe from local prosecution.

    The only workable solution to spam is to learn to use local filtering. Someday, I hope, we're going to learn that passing laws against stuff that annoys us only leads to unpleasant unforeseen consequences.

  • by waveclaw ( 43274 ) on Tuesday January 21, 2003 @03:53PM (#5128812) Homepage Journal
    "Conventional wisdom seems to say that we can't outlaw spam. I don't understand why this is."

    Traditionally and in general, anything on the 'net that can be achieved through both technical means and legal recourse is almost always implemented via the technical route.

    The reasons for this are many; the major reasons are simple. While most people on the 'net have not been lawyers, most of the first people (especially USENET users) were engineers and scientists. Such people develop a disdain for legal recourse after spending (wasting) so much of their time in political and legal battles in the *real world* justifying and defending their work and themselves. Just ask any graduate student standing in line at his college Bursar's office how he feels about contracts and (non-technical) paperwork.

    Furthermore, by avoiding the political avenue, which is often easy to circumvent and hard to quantify, the solutions are usually more effective in both the short and long term. Many solutions, such as the Bayesian spam filtering discussed here, also give these technical people a chance to prove their worth or gain some small measure of fame by association with a good solution.

    Remember: Conventional Wisdom is an oxymoron. There are always reasons for something, even if those reasons are nothing but hubris and desire. It is up to you to accept or change them.

  • Re:Stop spam? (Score:5, Insightful)

    by rograndom ( 112079 ) on Tuesday January 21, 2003 @04:10PM (#5128937) Homepage
    "Filtering is nice, I've been using SpamAssassin with reasonable results for the last few months. It has nearly no false positives but has recently been missing more. Perhaps I should update."

    Actually, SpamAssassin has a nice built-in reporting tool:

        spamassassin -r < mailmessage

    where mailmessage is the file containing the spam. And if you set it up to work with Vipul's Razor [sourceforge.net], it's all automagically updated.
  • Re:Stop spam? (Score:5, Insightful)

    by Deltan ( 217782 ) on Tuesday January 21, 2003 @04:21PM (#5129020)
    Correction: spam will never stop... ever.

    You say that it will stop if it's fully against the law and people bring legal action to stop it.

    Last time I checked, murder was illegal, punishable by death in many states, yet it still occurs.

  • by bnenning ( 58349 ) on Tuesday January 21, 2003 @04:50PM (#5129289)
    "What if you are a person who deals with financial data over e-mail? What if you routinely help people with their web pages? What if you send long blocks of code?"


    Then the filter will adapt to the types of legitimate messages you receive, that's the entire point.
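The filter adapts because every token's probability is computed from the user's own spam and non-spam. Below is the per-token formula as published in the original "A Plan for Spam", transcribed into Python: non-spam counts are doubled to bias against false positives, rare tokens (doubled good count plus bad count under five) are ignored, probabilities are clamped to [0.01, 0.99], and unknown tokens get 0.4. The class name is hypothetical, and the version 2 paper may tune these constants.

```python
from collections import Counter


class TokenStats:
    """Per-user token counts and the resulting per-token spam probabilities."""

    def __init__(self) -> None:
        self.good: Counter[str] = Counter()
        self.bad: Counter[str] = Counter()
        self.ngood = 0  # number of non-spam messages trained on
        self.nbad = 0   # number of spam messages trained on

    def train(self, tokens: list[str], is_spam: bool) -> None:
        if is_spam:
            self.bad.update(tokens)
            self.nbad += 1
        else:
            self.good.update(tokens)
            self.ngood += 1

    def spam_probability(self, token: str) -> float:
        g = 2 * self.good[token]  # doubled to bias against false positives
        b = self.bad[token]
        if g + b < 5:
            return 0.4  # too rare to judge: slightly innocent default
        bad_freq = min(1.0, b / self.nbad) if self.nbad else 0.0
        good_freq = min(1.0, g / self.ngood) if self.ngood else 0.0
        return max(0.01, min(0.99, bad_freq / (good_freq + bad_freq)))
```

Someone who routinely mails code will train many messages containing tokens like `printf` as non-spam, so those tokens end up near 0.01 and pull such messages away from the spam side.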

  • Re:Stop spam? (Score:4, Insightful)

    by CoughDropAddict ( 40792 ) on Tuesday January 21, 2003 @06:41PM (#5130348) Homepage
    "Last time I checked, murder was illegal, punishable by death in many states, yet it still occurs."

    People spam because it is rational to do so (or at least spammers make them think so). Very low costs, the possibility of a good return, and nothing to lose since there are virtually no spam laws.

    A better comparison than murder is the practice of child labor. While it was legal, it was a rational practice to engage in, because the return was high and the risk was low -- if a kid gets eaten by a machine you just find another kid. Now that it is illegal, the practice is almost completely extinct because it is no longer rational -- the police would come knocking at the door, which impedes the goal of running a profitable business.
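The "rational spammer" argument is an expected-value claim: spam persists while expected revenue exceeds sending cost plus any expected legal penalty. A toy calculation follows; every number is invented purely for illustration and none comes from the thread.

```python
def expected_profit(
    messages: int,
    cost_per_message: float,
    response_rate: float,
    profit_per_sale: float,
    p_caught: float = 0.0,
    penalty: float = 0.0,
) -> float:
    """Expected profit of one spam run, optionally including an expected legal penalty."""
    revenue = messages * response_rate * profit_per_sale
    sending_cost = messages * cost_per_message
    expected_penalty = p_caught * penalty
    return revenue - sending_cost - expected_penalty


# Hypothetical run: a million messages, 1 in 100,000 buys, $20 profit per sale.
no_law = expected_profit(1_000_000, 0.0001, 0.00001, 20.0)
with_law = expected_profit(1_000_000, 0.0001, 0.00001, 20.0, p_caught=0.01, penalty=50_000.0)
```

With these made-up figures the run nets about $100 when there is no law, but about -$400 once a 1% chance of a $50,000 penalty is added, which is the shift the child-labor comparison describes.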

"The four building blocks of the universe are fire, water, gravel and vinyl." -- Dave Barry

Working...