Spam

The Next Step in Fighting Spam: Greylisting

Evan Harris writes "I've just published a paper on a new and unique spam blocking method called 'Greylisting'. The best thing about it, other than achieving better than 97% effectiveness in blocking spam, is that it practically eliminates the main problem of other solutions: the false positive. There's even source code for an example implementation written as a perl filter for sendmail, along with instructions for installing, so you can get up and running quickly."
This discussion has been archived. No new comments can be posted.

  • Bayesian Filtering (Score:3, Interesting)

    by Dr Rick ( 588459 ) * on Friday June 20, 2003 @02:48PM (#6256297)
    I'm finding that use of the Outclass interface to POPfile is surprisingly effective at dealing with my spam problem (and I get a lot of it) - since training POPfile I haven't had a single spam message get into my inbox, and no false positives. Of course I could just be very, very lucky and with this post the email gods will punish me...

    How does the effectiveness of Greylisting compare with what others are seeing with existing techniques (such as Bayesian filtering)? Is it a false positives problem, such as digests and opt-in mailing lists getting incorrectly tagged as spam?

  • by blakestah ( 91866 ) <blakestah@gmail.com> on Friday June 20, 2003 @02:48PM (#6256300) Homepage
    The thing that is wrong is the SMTP protocol, and most people's conception of a spammer. Once you see a few "confessions of ex-spammers", everything changes.

    There are people out there who pay $10000 in startup costs, and then make $2000/week for spamming. The $10000 gets them software written by knowledgeable internet security experts. This software finds any and every way to anonymize the email spam, and finds lists of people to spam.

    As long as knowledgeable internet security experts are getting paid good cash to enable spammers, and SMTP doesn't change, spam will only continue to get worse. There needs to be a fundamental change in SMTP protocols. It oughta take the spammers about 2 days to fix their MTA bug to get around greylisting.
  • My spam filtering (Score:1, Interesting)

    by Drummer_Dan ( 648348 ) on Friday June 20, 2003 @02:54PM (#6256355) Journal
    I have one filter that blocks at least 90% of my spam. If a message contains the word "offer" it's toast. Works for me anyways.
  • I think not (Score:5, Interesting)

    by Monoman ( 8745 ) on Friday June 20, 2003 @02:54PM (#6256356) Homepage
    Does this mean all public crypto algorithms are useless?
  • by grub ( 11606 ) <slashdot@grub.net> on Friday June 20, 2003 @02:54PM (#6256357) Homepage Journal
    Open proxies get most of my rejects, here's a paste from "spamstat" (a quick script I did that cron's me the output once a day). The logs rotated not quite 2 hours ago.
    Open Relay: 1
    Dialup Spam Source: 0
    Confirmed Spam Source: 2
    Smart Host: 0
    Spamware Developer or Spamvertized site: 0
    Unconfirmed Opt-In List Server: 0
    Insecure formmail.cgi: 0
    Open Proxy Server: 8
  • by siskbc ( 598067 ) on Friday June 20, 2003 @02:56PM (#6256372) Homepage
    The best idea I've seen in YEARS was to have people start using a specific, original poem as their signatures. Then, the author granted license to anyone who WASN'T sending spam. Therefore, they could sue any spammer for copyright infringement if they used it, and you could train your mail filter to look for the signature. Once spamassassin took it up, it pretty much snowballed. See story here [wired.com]
  • by pclminion ( 145572 ) on Friday June 20, 2003 @03:00PM (#6256426)
    Wrong. 1 false positive can be acceptable, and in fact is probably better than how things are now.

    At USENIX '03 there was a paper presented on artificial intelligence techniques for spam detection. I can't provide a link since only USENIX members can download the paper (at this point, at least). I was a coauthor of that paper.

    One of the things we've discovered in our research is that some classes of filters (most notably, the one I have been developing along with a few other individuals) are actually more effective at correctly classifying email than humans are. That is to say, you can train the learning algorithm on mostly-correctly-classified data, then re-run it over the training data, and almost miraculously, it discovers all kinds of email in the training set that was incorrectly classified.

    I.e., this filter has discovered mail that I myself incorrectly thought was spam. It's scary, because there's a lot of it.

    To assume that a human will always be 100% accurate at classifying their own email isn't just arrogant, it's plain wrong. Newer filters that will be introduced in the near future might possibly be more accurate than you, a frail human, could ever be.

  • by JJAnon ( 180699 ) on Friday June 20, 2003 @03:01PM (#6256437)
    I don't think the mistake has anything to do with it being open source. It could be closed source, and would still fail because the basic premise is so simple - it relies on spammers sending spam to your inbox and not bothering to resend it if an error code is returned. So all a spammer has to do is just resend the message a couple of times to get around the spam 'filter'.
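    For anyone who hasn't read the paper, here is a minimal sketch of the mechanism being argued about, not the author's perl filter. It assumes the greylist is keyed on the sending IP plus envelope sender and recipient, and that a new key must wait one hour before being accepted; the class name, method name and reply codes are illustrative choices.

```python
import time


class Greylist:
    """Toy greylist keyed on (sending IP, envelope sender, envelope recipient)."""

    def __init__(self, min_delay=3600, clock=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed: 1 hour)
        self.clock = clock          # injectable clock so the logic can be tested
        self.first_seen = {}        # triplet -> time of the first delivery attempt

    def check(self, ip, mail_from, rcpt_to):
        """Return an SMTP-style reply code: 451 = try again later, 250 = accept."""
        key = (ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        # Record the first attempt; later attempts reuse the stored timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return 451  # temporary failure: a compliant MTA will queue and retry
        return 250      # the same triplet came back after the delay, so accept
```

    A sender that retries from the same IP after the delay gets through, which is exactly the workaround described above; a retry from a different IP starts the clock again.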
  • Re:Questions (Score:1, Interesting)

    by Anonymous Coward on Friday June 20, 2003 @03:02PM (#6256456)
    From my reading, it sounded like it stored the ip address of the MTA. This would not affect dial up users more than any others.
  • by phr1 ( 211689 ) on Friday June 20, 2003 @03:11PM (#6256549)
    The registrar I use (jumpdomain.com) has a clever hack for despamming WHOIS contact email. Basically they change your published contact address once a week. The published address is automatically generated, looks like gibberish, and forwards to your real address. If someone wants to contact you by looking up your address by WHOIS and writing to you, it works fine. But if they add the address to a mailing list, it stops working in a week. That has eliminated almost all my WHOIS spam. Good scheme.
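    I don't know how the registrar actually generates the addresses, so the following is only a guess at one way to do it: derive the weekly alias from an HMAC of the real address and the week number. The function names, the secret key and the contact.example domain are all hypothetical.

```python
import hashlib
import hmac


def weekly_alias(real_address, week_number, secret, domain="contact.example"):
    """Derive a gibberish-looking alias that changes whenever week_number changes."""
    msg = f"{real_address}|{week_number}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).hexdigest()[:12]
    return f"{digest}@{domain}"


def resolve(alias, current_week, secret, customers, domain="contact.example"):
    """Return the real address to forward to, or None if the alias has expired."""
    for real in customers:
        if alias == weekly_alias(real, current_week, secret, domain):
            return real
    return None
```

    An address harvested in week 1 resolves to nothing once the week number moves on, which matches the "stops working in a week" behaviour.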
  • Why spam me? (Score:1, Interesting)

    by Anonymous Coward on Friday June 20, 2003 @03:14PM (#6256595)
    I'm not sure I completely understand spam with how it works today. Why would a spammer want to send mail to someone who is clearly not going to buy from them? I mean, I understand their logic is to send as much spam to as many people as possible, but if we make e-mail addresses that say NOSPAM in them why would they even want to spam that address? Do these guys get together and discuss how many messages in a day they can send as if they're talking about how big their penises are?
  • by lxdbxr ( 655786 ) on Friday June 20, 2003 @03:15PM (#6256612) Homepage
    The summary does not seem completely accurate; since the greylisting MTA sends an SMTP temp failure there should never be any false positives as long as the sending MTA is vaguely RFC-compliant (sadly not true I suspect). Or at least that was my reading of the paper...

    I'm currently using Bogofilter [sourceforge.net] (and looking into CRM114 [sourceforge.net]) and getting better than 99% accuracy (about 1 in 200 false negatives at the moment) and very very few false positives (maybe 2 in 5000 messages).

    Of course these are MUA level filters (and yes, I know, I've already "paid" with bandwidth to download the spam) - however since the proposed "greylister" would have to be installed as the MTA at major ISPs (as the authors note) I'm not convinced that is more likely to get widespread adoption than the various sorts of adaptive client-based filtering now available, particularly as it requires a database to back the method up.

    As far as I am concerned the major factor in a spam filter should be zero false positives - personally I don't mind reviewing one or two spams a week but I would get really annoyed if I were to lose a real message (note the two false positives I have seen to date with bogofilter contained forwarded sales pitches along with a message).

  • by Henry Stern ( 30869 ) <henry@stern.ca> on Friday June 20, 2003 @03:29PM (#6256774) Homepage

    It means they have to do retries...that means spam runs take longer, especially since they have to run...then wait for a locally defined timeout, and run all those addresses again

    AND they have to do it from the same IP.

    Not to mention that if this is used in conjunction with other collaborative tools (i.e. RBL, checksums), by the time the spamming MTA can return, its IP address will have been submitted to MAPS/etc. and the contents of the message will have been submitted to Razor/Pyzor/DCC.

    I think that this greylisting idea will be pretty hard to beat by Joe spammer. Since the game of spam detection is pretty much an arms race, slowing him down will probably be enough to turn the battle in your favour.

  • by robby_slaughter ( 678825 ) <[slashdot] [at] [robbyslaughter.com]> on Friday June 20, 2003 @03:40PM (#6256884)
    Personally, I only accept messages that are 10 MB in size or larger. If you want to email me, please be sure to include a huge block of random text at the end of your email or else I'll never see it.

    I don't get any spam using this approach, because the spammers don't send big messages. And if *everyone* ignored small messages, spammers would have to close up shop because they could not afford to send millions of big messages.

    (This is a joke. But you could do this at the SMTP level, by automatically replying to any sender who is not on your personal whitelist with the response: "Hey you, if you're real, send me back a HUGE reply!" And the SMTP server could cheerfully delete the last 99% of the first-time oversized email you get. I should write my own anti-spam paper and get mad Slashdot cred. Nah, I'm too lazy.)

  • by tomhudson ( 43916 ) <barbara,hudson&barbara-hudson,com> on Friday June 20, 2003 @03:42PM (#6256901) Journal
    Open Relay Database Stats by Country [ordb.org]

    You'll notice that the US is the #1 country. The top 3 are:

    1. The United States, with over 80,000 open relays
    2. Korea and Japan pretty much tied at +15,000 each
    3. Japan, at just under 10,000
    That's more than everyone else combined!
  • E-Mail Secrets (Score:2, Interesting)

    by Nakoruru ( 199332 ) on Friday June 20, 2003 @03:44PM (#6256923)
    I have written an essay on ending spam [io.com]. The idea is to associate a second piece of information that goes along with your e-mail address. This 'secret' can be used for anything you want, such as blocking anyone who does not get the secret right.
  • How about DSPAM ? (Score:3, Interesting)

    by MeerCat ( 5914 ) on Friday June 20, 2003 @03:55PM (#6257049) Homepage
    I quite like the idea of this greylisting, but it seems a lot of spam is nowadays being sent as DSPAM (cf DDOS) or Distributed Spam. A spammer infects a load of broadband machines with a simple trojan, and then calls upon a number of the trojans to send an email spam via that machine's normal MTA (i.e. for most Windows machines it uploads to the ISP's mail servers).

    I know this is happening as some complete bastard seems to be doing this using my domain as a "From:" address (well, [random-word]@schmerg.com), meaning that I've been getting about 30 or 40 bounce messages a day for the last 2 or 3 months now. And although the odd sending IP is repeated, mostly they're all from different IP addresses. And of course I'm getting perfectly valid looking bounce messages from perfectly reasonable companies (and only a couple of abusive replies so far).

    Now the problem is that the email is being uploaded to their (non-open-relay) ISP's mailserver that will retry properly, and anyone else looking at the IP address will see a perfectly reasonable IP (the spammer seems to have infected a lot of AT+T customers, ComCast customers, etc.). So short of blocking spam on subject, this spam is harder to prevent in the first place.

    I've semi-automated a process to report the infected machines (that provoked a bounce message) to the appropriate ISP, and seem to be having some success in getting the ISPs to contact their customers, but I think this new form of spamming using a distributed attack will be particularly hard to block.

    Anyone with a great idea (or who knows more about this scheme, or the identity of the twat behind it) I'd love to hear from you...

    --
    T
  • by KFury ( 19522 ) * on Friday June 20, 2003 @04:19PM (#6257281) Homepage
    Evidence has shown that email harvesters don't even *try* to parse even the simplest email obfuscation techniques (kevin at fury dot com, or changing the @ to an html entity). [cdt.org]

    These are widespread practices and yet spammers don't care about spending the effort to reach that 5% who so adamantly don't want to be reached.

    Unless greymail is used by more than a quarter of *all* email readers, history has shown that the spammers just won't care.
  • That's a good point (Score:3, Interesting)

    by SuperKendall ( 25149 ) on Friday June 20, 2003 @04:30PM (#6257375)
    Anything that makes spammers do extra work seems to make it more likely that they will have patterns that can be observed and then blocked before mail really goes anywhere...
  • by Lafe ( 595258 ) on Friday June 20, 2003 @04:42PM (#6257493) Homepage
    I once heard a story that is probably false (you never know), but contains an interesting idea on how to end spam.

    It says that Mississippi tried to outlaw "flag burning", and the law was struck down as unconstitutional. So the Mississippi legislature responded by limiting the maximum penalty for assaulting a person who is engaged in flag burning to a $25 fine.

    This sounds like a fine idea on how to handle the spam problem.

    Create a law that states that the maximum penalty for physically assaulting a spammer is a $100 fine. I know more than a few people who'd be willing to pony up and take a whack at them.

    Though this law would probably be far more satisfying than sensible.
  • by letxa2000 ( 215841 ) on Friday June 20, 2003 @04:48PM (#6257552)
    is reject the mails on the greylist after holding the connection for, say, 10 minutes. That will help deter spamming software,

    I doubt it. I would assume the spam software would have a timeout, and I doubt it's ten minutes. If they want to hit-and-run and aren't even willing to make a second delivery attempt when an error code is returned, I doubt they're going to wait 10 minutes. I'm sure that within 30 seconds or less they'll consider it a dead connection and hang up.

    Problem is, I used to have my sendmail HANG UP in real-time on an incoming connection as soon as it realized a message was spam. I.e., the incoming message was filtered in the DATA phase and if it was spam I hung up immediately. It worked great and it felt good, but there were many spam programs that took the disconnection as some kind of TCP/IP failure and immediately tried again. So I had one day where a single message was attempted to be delivered about 30,000 times as the spammer connected, I hung up, spammer software said "Oops, let me try again!" About one delivery attempt every second or so.

    I'd be willing to bet if you put a 10 minute timeout in sendmail you'll see lots of spammer software disconnecting sooner and just trying again. It takes more of their resources, but takes more of yours, too.

  • by SillySlashdotName ( 466702 ) on Friday June 20, 2003 @05:16PM (#6257798)
    Spammers rely on a tiny hit rate to a huge number of emails. If they sent more emails, they would make more money. I assumed (dangerous, I know) that they would therefore be sending the maximum number of emails they could based on their bandwidth limitations as well as their resource limitations - or as many as they thought they could get away with without being noticed/blacklisted/shutdown. I.e., they are either running flat out, or trying to stay under somebody's radar.

    If they are sending 250,000,000 emails a day, then that must be all they CAN (or think they can - amounts to the same thing) send or they would be sending more. If they have to send everything twice, then they have just dropped the number of emails to 125,000,000 - and cut their income from their activity in half.

    Another possibility is that they double the size of their email farm and the width of their pipeline - but that also takes $$$, time, and resources - my computer requires electricity or it refuses to work.

    I am for anything that hits spammers in the pocketbook.
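    The halving argument can be put as a back-of-the-envelope calculation. This sketch assumes a fixed daily sending capacity and that each greylisting recipient costs exactly one extra attempt; the function name and the greylisting_share parameter are made up for illustration.

```python
def recipients_reached(daily_capacity, greylisting_share):
    """Recipients a spammer can reach per day if a share of them force one retry."""
    # Average attempts per recipient: 1 for everyone, plus 1 more for the greylisted share.
    attempts_per_recipient = 1 + greylisting_share
    return int(daily_capacity / attempts_per_recipient)


everyone = recipients_reached(250_000_000, 1.0)   # every recipient greylists
a_quarter = recipients_reached(250_000_000, 0.25) # only a quarter of recipients do
```

    With every recipient greylisting, 250,000,000 attempts reach 125,000,000 people, as above; if only a quarter of recipients greylist, the same capacity still reaches 200,000,000, so the hit to the spammer depends heavily on adoption.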
  • by volkerdi ( 9854 ) on Friday June 20, 2003 @05:45PM (#6258021)
    There is no magical waiting period or re-try period that cannot be trivially coded around. And, with good money on the line, will be trivially coded around. You don't get it. Really smart people are getting paid a whole lot of money to make programs to exploit every possible crack in the way we send email.

    Yeah, spammers are so clever. Well, the fact is if for every one of these "smart" (yeah, right) spammers who has the help of a network consultant that will work around greylisting there are 5 dumbasses who don't (and I think I'm being generous there), then if I greylist I'd think over 80% of my spam problem would be eliminated. What's wrong with that? What's to "get"? Looking through headers I see the same bulk mailers used over the years, probably passed around as warez in spammer circles.
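    The "over 80%" figure checks out if you assume every spammer sends the same volume of mail (an assumption of this sketch, not something measured). The function name is made up for illustration.

```python
from fractions import Fraction


def blocked_share(naive_per_adaptive):
    """Share of spam stopped if only spammers who never retry are blocked."""
    # For each spammer who works around greylisting there are
    # naive_per_adaptive spammers who do not, all sending equal volumes.
    return Fraction(naive_per_adaptive, naive_per_adaptive + 1)


share = blocked_share(5)  # 5 naive spammers for every one who adapts
```

    Here share comes out to 5/6, or roughly 83%, which is indeed over 80%.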
  • My Anti-Spam Idea (Score:3, Interesting)

    by _iris ( 92554 ) on Friday June 20, 2003 @06:25PM (#6258310) Homepage
    Here are my latest thoughts [intercarve.net] on winning the spam war.

    I've submitted it to Slashdot. They rejected it. Tell me what you think. I'd like reactive approaches to get discussed a bit more. If you do too, submit this to Slashdot :]
  • by thgreatoz ( 623808 ) on Friday June 20, 2003 @07:01PM (#6258506)
    ...which it couldn't, IMHO, due to the ... tenacity...of the spam community, I still think people would take issue with the proposed "1 hour delay". There are plenty of times when a company will send you an e-mail while you're on the phone with them (for example, RMA requests for damaged equipment), or perhaps you've forgotten the password to a particular news site and need to have it sent to you. Having to wait an hour for something that used to be near instantaneous is a less than ideal solution. My vote's for an overhaul of SMTP. We'll be better off in the long run.
  • Pardon me for being clueless, but I don't see in this concept a description of what happens when a posting from a listserv or other e-mail list goes to a new subscriber. Mail bounces back to the listserv, right?

    Well, the first e-mail to a new subscriber is often the e-mail required to confirm subscription. No reply, and the subscriber is plonked.

    Sounds suboptimal to me.

    So, you whitelist the listserv machines... until one of them has to change IP addresses. Whoops! Now umpteen bazillions of e-mail messages go nowhere.

    I'm sure that listserv admin would find the idea suboptimal about this time.

    Nice idea, but No Slack, as far as I can see.
