
 



Spam

The Next Step in Fighting Spam: Greylisting 481

Evan Harris writes "I've just published a paper on a new and unique spam blocking method called "Greylisting". The best thing about it, other than achieving better than 97% effectiveness in blocking spam, is that it practically eliminates the main problem of other solutions: the false positive. There's even source code for an example implementation written as a perl filter for sendmail, along with instructions for installing, so you can get up and running quickly."
This discussion has been archived. No new comments can be posted.


  • by sqrt529 ( 145430 ) * on Friday June 20, 2003 @02:39PM (#6256194) Homepage
    Most spam today is sent through open relays. Those relays will simply retry the delivery no matter which software the spammer uses, so the method won't work.
  • In case of /.'ing (Score:4, Informative)

    by Anonymous Coward on Friday June 20, 2003 @02:39PM (#6256196)
    The Next Step in the Spam Control War: Greylisting
    By Evan Harris
    Copyright 2003, all rights reserved.

    Introduction
    This paper proposes a new and currently very effective method of enhancing the abilities of mail systems to limit the amount of spam that they receive and deliver to their users. For the purposes of this paper, we will call this new method "Greylisting". The reason for choosing this name should become obvious as we progress.

    Greylisting has been designed from the start to satisfy certain criteria:

    1. Have minimal impact on users
    2. Limit spammers' ability to circumvent the blocking
    3. Require minimal maintenance at both the user and administrator level

    User-level spam blocking, while somewhat effective, has a few key drawbacks that make its use in the continuing spam war undesirable. A few of these are:

    1. It provides no notice to the senders of legitimate email that is falsely identified as spam.
    2. It places most of the costs of processing the spam on the receiver's side rather than the spammer's side.
    3. It provides no real disincentive to spammers to stop wasting our time and resources.

    As a result, Greylisting is designed to be implemented at the MTA level, where we can cause the spammers the most amount of grief.

    For the purposes of evaluating and testing Greylisting, an example implementation has been written of a filter that runs at the MTA (Message Transfer Agent) level. The source for this example implementation is available as a link below, and as other implementations or additional utility code become available, they will also be linked.

    Greylisting has been tested on a few small scale mail hosts (fewer than 100 users, though with a fairly diverse set of senders from all over the world, and volumes over 10,000 email attempts a day). However, it is designed to be scalable, as well as low impact to both administrators and users, and should be acceptable for use on a wide range of systems, including those of very large scale. Of course, performance issues are very dependent on implementation details.

    The Greylisting method proposed in this paper is a complementary method to other existing and yet-to-be-designed spam control systems, and is not intended as a replacement for those other methods. In fact, it is expected that spammers will eventually try to minimise the effectiveness of this method of blocking, and Greylisting is designed to limit options available to the spammer when attempting to do so.

    The great thing about Greylisting is that the only methods of circumventing it will only make other spam control techniques just that much more effective (primarily DNS and other methods of blacklisting based on IP address) even after this adaptation by the spammers has occurred.

    The Greylisting Method
    High Level Overview
    Greylisting got its name because it is kind of a cross between black- and white-listing, with mostly automatic maintenance. A key element of the Greylisting method is this automatic maintenance.

    The Greylisting method is very simple. It only looks at three pieces of information (which we will refer to as a "triplet" from now on) about any particular mail delivery attempt:

    1. The IP address of the host attempting the delivery
    2. The envelope sender address
    3. The envelope recipient address

    From this, we now have a unique triplet for identifying a mail "relationship". With this data, we simply follow a basic rule, which is:

    If we have never seen this triplet before, then refuse this delivery and any others that may come within a certain period of time with a temporary failure.

    Since SMTP is considered an unreliable transport, the possibility of temporary failures is built into the core spec (see RFC 821). As such, any well behaved message transfer agent (MTA) should attempt retries if given an appropriate temporary failure code for a delivery attempt (see below for discussion of issues concerning non-conforming MTAs).
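
A minimal sketch of the basic rule quoted above, in Python rather than the paper's perl filter. The class name, the reply strings, and the in-memory dictionary are assumptions made for illustration; the one-hour delay is the default mentioned later in this thread, and this is not the paper's example implementation.

```python
from dataclasses import dataclass, field

# One hour is the default delay mentioned elsewhere in this discussion.
INITIAL_DELAY = 60 * 60


@dataclass
class Greylist:
    """Hypothetical in-memory greylist keyed by (IP, sender, recipient)."""

    delay: int = INITIAL_DELAY
    # triplet -> time (in seconds) of the first delivery attempt we saw
    first_seen: dict = field(default_factory=dict)

    def check(self, ip, sender, recipient, now):
        """Return an SMTP-style reply string for one delivery attempt."""
        triplet = (ip, sender, recipient)
        # Record the first sighting; later attempts reuse the stored time.
        seen = self.first_seen.setdefault(triplet, now)
        if now - seen < self.delay:
            # Unknown or too-recent triplet: refuse with a temporary failure.
            return "451 temporary failure, please try again later"
        return "250 ok"


if __name__ == "__main__":
    g = Greylist()
    args = ("192.0.2.1", "alice@example.org", "bob@example.net")
    print(g.check(*args, now=0))      # first attempt
    print(g.check(*args, now=600))    # retry inside the delay
    print(g.check(*args, now=3600))   # retry after the delay
```

A real filter would keep the triplets in a database and expire old records; those details are left out of this sketch.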
  • by HiKarma ( 531392 ) * on Friday June 20, 2003 @02:39PM (#6256198)
    This idea isn't so new or unique. It's been discussed a fair bit on the ASRG [ietf.org] mailing list under the name "tempfailing".

    First I heard of it was from Landon Noll and Mel Pleasant. It is noted in brief as one of the techniques in this plan to end spam [templetons.com] (though their plan, which did include the triplets, is not laid out in full there.)

    It is a worthwhile technique for a little while, and if spammers were rational, would be worthwhile for some time to come. But spammers are not rational, and already this technique is not as useful as would be hoped.

    Do a Google Search for Tempfailing [google.com] especially in ASRG to see statistics etc.

  • by Anonymous Coward on Friday June 20, 2003 @02:41PM (#6256216)
    Just encode your e-mail address on web pages & don't sign up to any dubious mailing lists.

    I haven't received 1 single spam in recent months from doing this!
  • by McDutchie ( 151611 ) on Friday June 20, 2003 @02:49PM (#6256312) Homepage
    Eh, open relays are soooo 20th century. :) Actually most open relays today are either blocked or closed, and newly installed MTAs are secure against third-party relaying by default, so this spam method is dying out [it-analysis.com]. Most spam today is sent either directly to the receiving MTA, through open proxies, or through formmail.pl and similar exploits.
  • Re:Time critical (Score:5, Informative)

    by eGabriel ( 5707 ) on Friday June 20, 2003 @02:50PM (#6256319)
    This isn't true, actually. Once one mail gets through, the system lets in subsequent mails from that sender. So there is only the initial delay, after that CEO Joe can use his email as a fat instant messenger per usual.
  • by seangw ( 454819 ) * <seangw@sean[ ]com ['gw.' in gap]> on Friday June 20, 2003 @02:52PM (#6256341) Homepage
    It isn't always possible to never publish your email address.

    You can, however, establish classes of emails. Most people don't like this however, because you have to check multiple accounts, and it really doesn't stop the spam.

    In order to sign up to certain services / sites you need to provide a valid email address.

    While that email address can be a secondary email, if anything important is going to come in on the email (such as domain information via network solutions) you will still want to use your real email address.

    It's a very difficult issue.
  • Re:Questions (Score:2, Informative)

    by sharlskdy ( 460886 ) <scottman@ t e l u s.net> on Friday June 20, 2003 @02:55PM (#6256366) Homepage
    Retry is configurable, and it depends on the MTA. Qmail has a default retry of 400 seconds (6 minutes, 40 seconds).

    Much of my e-mail comes through within seconds - I'm not sure I want that delayed too much. Although, this delay is on the first matching triplet.

    Server disk space requirements for major providers would climb considerably, I would expect. Legitimate mass-mail programs, and mailing list services would have a problem, tho.

    The algorithm takes advantage of the laziness of spammers, which is not a bad idea.
  • by tomstdenis ( 446163 ) <tomstdenis@gma[ ]com ['il.' in gap]> on Friday June 20, 2003 @02:56PM (#6256382) Homepage
    You're missing a big part of it though. If you have to try, say, 3 times to send a message [over a 5 day period or so], your ability to mass send 100 million emails is really squashed.

    Legitimate people sending for the first time won't really mind the few day wait, and most MTAs will try for up to a month.

    Tom
  • Published a paper? (Score:4, Informative)

    by Call Me Black Cloud ( 616282 ) on Friday June 20, 2003 @02:58PM (#6256400)
    Where? To me, publishing a paper means your writing appeared in some peer-reviewed journal (where the "peers" are acknowledged as domain experts). What you did was put up a web page. With a donation link at the bottom.

    For others looking for a solution, try POPFile [sourceforge.net]. Open source, cross platform, gives me 96% accuracy.

    One more thing: "practically eliminates" is not the same as "eliminates".
  • by TheCarp ( 96830 ) * <sjc@NospAM.carpanet.net> on Friday June 20, 2003 @02:59PM (#6256411) Homepage
    not at all

    Read the paper. Spammers would figure it out eventually. What it buys is what they have to do to get around it.

    It means they have to do retries... that means spam runs take longer, especially since they have to run, then wait for a locally defined timeout, and run all those addresses again

    AND they have to do it from the same IP.

    This raises their bandwidth profile. It wastes their time... all in all... it raises their cost of doing business and cuts into their profit margins.

    It means they will have to upgrade their tools again. It means they get headaches. And of course, the next step is to implement spam traps that watch activity, see that a spammer is spamming, and promote them to a blacklist before they can even retry. (Oh gee, 1000 new greylist triplets from 1 IP in under 5 mins? Set the timeouts for that IP to 12 hours.)

    -Steve
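
The escalation idea in the parenthetical above could be sketched roughly as follows. The thresholds (1000 new triplets, 5 minutes, 12 hours) come from the comment; the class, its method names, and the one-hour normal delay are assumptions for illustration only.

```python
from collections import defaultdict, deque

WINDOW = 5 * 60                # "under 5 mins", in seconds
THRESHOLD = 1000               # new triplets from one IP inside the window
NORMAL_DELAY = 60 * 60         # assumed normal greylist delay
PENALTY_DELAY = 12 * 60 * 60   # "set the timeouts for that IP to 12 hours"


class BurstWatcher:
    """Hypothetical helper: lengthen the delay for IPs that create
    many new greylist triplets in a short time."""

    def __init__(self, window=WINDOW, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.recent = defaultdict(deque)  # ip -> times of new triplets
        self.penalised = set()

    def record_new_triplet(self, ip, now):
        """Call once each time an IP produces a never-seen triplet."""
        times = self.recent[ip]
        times.append(now)
        # Forget sightings that have fallen out of the sliding window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.threshold:
            self.penalised.add(ip)

    def delay_for(self, ip):
        """Greylist delay, in seconds, to apply to this IP."""
        return PENALTY_DELAY if ip in self.penalised else NORMAL_DELAY
```

In this sketch a penalised IP stays penalised forever; whether and when to lift the penalty is a policy choice the comment does not address.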
  • by Anonymous Coward on Friday June 20, 2003 @03:00PM (#6256431)
    Been using the Bayesian filtering in ASSP
    http://sourceforge.net/projects/assp/
    With a week of "training" it I now have most excellent results.

    As of Fri Jun 20 13:56:18 2003 the mail logfile shows:
    4402 messages, 2850 were spam (64.7%) in 24 days
    for 183.4 messages per day or 118.8 spams per day
    431 additions to / verifications of the whitelist (18.0 per day)
    2541 were judged spam by the bayesian filter (89.2% of spam)
    279 were to spam addresses (9.8% of spam)
    30 were rejected for executable attachments (1% of spam)
    were sent from local clients (0.0% of nonspam)
    483 were from whitelisted addresses (31.1% of nonspam)
    1069 were ok after a bayesian check (68.9% of nonspam)
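
For anyone checking the arithmetic, the percentages in the log can be recomputed from its raw counts. This is just a sketch; the variable names are made up, and only the counts come from the log above.

```python
# Counts copied from the log above.
total, spam, days = 4402, 2850, 24
bayes_spam, trap_spam, exe_spam = 2541, 279, 30
whitelisted_ok, bayes_ok = 483, 1069

nonspam = total - spam


def pct(part, whole):
    """Percentage of whole, rounded to one decimal place like the log."""
    return round(100 * part / whole, 1)


report = {
    "spam share of all mail": pct(spam, total),
    "messages per day": round(total / days, 1),
    "spams per day": round(spam / days, 1),
    "bayesian share of spam": pct(bayes_spam, spam),
    "spam-address share of spam": pct(trap_spam, spam),
    "whitelisted share of nonspam": pct(whitelisted_ok, nonspam),
    "bayesian-ok share of nonspam": pct(bayes_ok, nonspam),
}

for label, value in report.items():
    print(f"{label}: {value}")
```

The three spam categories add up to the 2850 spam messages, and the two listed nonspam categories add up to the remaining 1552, so the log is internally consistent.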
  • by abe_is_fun ( 320753 ) on Friday June 20, 2003 @03:04PM (#6256482) Journal
    Spammers are notoriously resilient. Within a few days/weeks/nanoseconds the spammers would realize they need to retry after a delay, and they would stop with the fire-forget mentality.

    I wish your plan would work but I just don't think it will.

    Plus the spammers can get their viagra at wholesale cost!
  • According to this article [theregister.co.uk] (June 12), open relays at least in the corporate environment are becoming hard to find, requiring spammers to find new ways. In 1997, 91% of mail servers tested were open; as of a year ago only 1%. ISP and home machines apparently weren't tested.

    This doesn't really say what's actually being used by spammers, but it's a sign of improvement. At the least, it narrows the pool of available relays. Continuing progress will increase the spam pressure on those remaining, which will in turn make it more likely that they'll be fixed.

    The article also doesn't say what spammers might use as an alternative. From something else I read recently (don't recall where), mail viruses that take over users' machines are rapidly becoming the tool of choice. There are a lot more of them than mail servers, so it makes sense for the spammers. It does put them in a more dangerous position WRT the law. IMHO (IANAL), using a virus to exploit someone's machine for profit is almost certainly illegal under existing law.
  • Re:Questions (Score:2, Informative)

    by eh1001 ( 683252 ) on Friday June 20, 2003 @03:15PM (#6256613)

    The initial time delay depends on the configuration. The default is one hour.

    For better numbers, try it yourself, and report back. That's the best way to validate it.

    As spammers adapt, the system can be adapted as needed, but for right now, it makes spammers stay at an IP for some measurable amount of time. That time gives other methods of blacklisting and spam blocking time to work.

    Note that every email will NOT have to be sent multiple times. Only those emails that aren't part of an established "relationship" will.
  • by YE ( 23647 ) on Friday June 20, 2003 @03:24PM (#6256710)
    I get 98-98.5% accuracy with POPfile [sourceforge.net]. I get about 200 mails a day, of which around 30% spam. I get about 1 false negative a day, and maybe 2 or 3 false positives a month. It's a personal solution and as such is much more attractive to me than something server-based which has to be installed by a [typically VERY uncooperative] BOFH.

    I use it experimentally for general mail classification (business/personal/a variety of mailing lists etc., all in all 7 buckets) on my home machine, and it works fine in these conditions too, although the accuracy is a bit lower (around 95%).
  • anyone@domain (Score:3, Informative)

    by autopr0n ( 534291 ) on Friday June 20, 2003 @03:25PM (#6256718) Homepage Journal
    What I do is set my MTA (well, actually I'm using someone else's mail server) to forward all the mail sent to my domain to my main account. That way, whenever I sign up for anything or give away any email address I use a unique address.

    Oddly enough, I've been totally promiscuous with these throw-away email addresses, but I've only got one SPAM, from a company I actually bought stuff from. So far no one has sold any of the addresses, and all the websites I've posted to either aren't scanned or are protecting the address well.

    OTOH, my 'main' addresses are spammed constantly. rrr.
  • by steveit_is ( 650459 ) on Friday June 20, 2003 @03:28PM (#6256765) Homepage
    You know... All of these fancy 'spam-fighting' methods are just a waste of time, when all you really need is a properly configured SMTP server and some good free realtime blackhole lists. After making simple changes, I went from receiving 5-6 spams a day down to 1 in a year and a half, with no complaints of 'false-positives'. There have been instances where the sender's mail server was misconfigured, but after contacting them and explaining the situation they were invariably helpful. All I did was make sure that only mail sent from an FQDN to valid local accounts, and such, was allowed. There is an OK tutorial on basic Postfix configuration here [metaconsultancy.com] and a great one here [securityfocus.com].
  • missed the point (Score:4, Informative)

    by eLoco ( 459203 ) on Friday June 20, 2003 @03:33PM (#6256812)

    I've seen some comments that say (paraphrasing) "For real SPAM filtering use <POPFile|Spamassassin|...>", but these missed the point (or perhaps didn't read the paper?). This method is a "first-pass" filter, getting rid of e-mails for which no redelivery attempt will be made. The second-pass filter should still be implemented for everything that gets through the first pass. From the paper:

    "The Greylisting method proposed in this paper is a complementary method to other existing and yet-to-be-designed spam control systems, and is not intended as a replacement for those other methods. In fact, it is expected that spammers will eventually try to minimise the effectiveness of this method of blocking, and Greylisting is designed to limit options available to the spammer when attempting to do so."

  • by Torp ( 199297 ) on Friday June 20, 2003 @03:34PM (#6256833)
    What he wants to do, instead of rejecting the mail immediately, is reject the mails on the greylist after holding the connection for, say, 10 minutes. That will help deter spamming software, since it will slow down the rate at which mail goes out.
  • Re:Questions (Score:4, Informative)

    by JaredOfEuropa ( 526365 ) on Friday June 20, 2003 @03:57PM (#6257068) Journal
    It's not the sender's mail client that connects to the server that runs the greylist system, it's the sender's SMTP server as provided by their company or ISP. Its IP address will not change regardless of the sender's connection or dynamic IP.
  • by SillySlashdotName ( 466702 ) on Friday June 20, 2003 @04:50PM (#6257571)
    I agree that one of us doesn't get it. :)

    I agree that there is no "magical waiting period or re-try time period". However, by forcing the spammer to re-run through their spam list, their life has been made a little harder, they have been forced to be a little more visible, we have pushed them to use more resources (hopefully hitting them in the wallet), and we have forced them to do something that, BY ITSELF, can be used as a spam indicator. As I mentioned in another post, I rarely get duplicate emails from people - so getting duplicates within 4 hours - as spammers try to get past the greylist - would be a (one) possible signature for spam.

    Spammers are generally (or so I understand) using a 'fire-and-forget' method of spam sending, which is why/how they can send millions of emails a day. Responding to the greylist method takes that away from them or forces them to double their resource usage, their bandwidth, their exposure on the Internet. Resources are not free, bandwidth is not free and most spammers are exposure averse.

    Either they work a way around the problem - the only way I currently see is to behave more like a legitimate emailer which reduces the number of addresses they can reach in a time period and so, for the same response rate, reduces their income - or they don't bother and the greylist reduces network traffic by refusing the email BEFORE IT IS EVEN SENT.

    I agree the greylist is not a cure - but I never said it was, and it seems to me to be a win-win situation to use it.

    Until there is a fundamental change in the protocols I see this, if adopted widely enough, as a viable way to reduce bandwidth usage and spam. I don't see a change in the fundamental protocols happening quickly (if at all), I do see the greylist here today.
  • by hoggoth ( 414195 ) on Friday June 20, 2003 @04:59PM (#6257648) Journal
    I really like the idea, and I think it will work wonders (if you are willing to accept your nearly-instant email now having approx. 1 hour delay).

    However, here is how the spammers will adapt:
    MailBlast 2.0 will send each mail TWICE. The first time to get the triplet on the greylist, the second time to actually send it. Or a little more sophisticated, it will run in one-hour blocks, and at the end of the hour, re-run the previous hours emails. No queue or other real-SMTP server functionality necessary.
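
To compare the two adaptations described above, here is a rough sketch of what a single triplet would see. It assumes the refusal window is one hour and is counted from the first attempt; the function name and the reply labels are invented for the example.

```python
DELAY = 60 * 60  # assumed refusal window: one hour from the first attempt


def greylist_replies(attempt_times, delay=DELAY):
    """Replies one triplet would get for attempts at the given
    times (seconds), under the assumptions stated above."""
    first = None
    replies = []
    for t in sorted(attempt_times):
        if first is None:
            first = t  # the first sighting starts the clock
        replies.append("accept" if t - first >= delay else "tempfail")
    return replies


if __name__ == "__main__":
    # Sending each mail twice, back to back.
    print(greylist_replies([0, 1]))
    # Re-running the previous hour's list one hour later.
    print(greylist_replies([0, 3600]))
```

Under these assumptions the back-to-back double send is refused both times, so only the hourly re-run variant gets through, and it needs the sender to stay on the same IP for that hour.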

  • by pclminion ( 145572 ) on Friday June 20, 2003 @05:43PM (#6258002)
    I found a copy of the final draft online: Learning Spam: Simple techniques for freely-available software. [pdx.edu] The paper covers several machine learning techniques. The particular one I'm talking about here is the information-theoretic clustering and neural network approach.
  • by mabu ( 178417 ) on Friday June 20, 2003 @07:06PM (#6258537)
    Aside from the obvious of getting the authorities to crack down on the existing illegal activities (relay hijacking, violation of TOS of ISPs, header forging, etc.) which is the only true solution, I think there are much better approaches than this "greylisting" method.

    The problem with the greylist method is it still slows down mail service, and potentially more than the relay blacklist features. The objective here is that end-user/networks should not be penalized in the fight against spam. We already waste too many resources, and according to my latest mail server stats, more than 65% of our inbound mail is UCE. I'm fed up with more than half my e-mail bandwidth being crap my users didn't request so more resource allocation on a local level in the fight against spam is counterproductive!

    Here's a very clever, much more practical method I found recently.

    A company in Canada has set up what it calls SORBS [sorbs.net]: Spam and Open Relay Blocking System.

    What's different from their blacklist is that they maintain "honeypots" strategically located around the Internet. These are servers they specifically set up as inbound mail relays, but never for legitimate purposes. If the servers get [select] mail activity, it's assumed to not be legitimate and it flags the source as a potential spammer... it makes a lot of sense. You create a domain name, but don't promote it in any legitimate manner, and/or you seed spam lists with these e-mail addresses and then let the spammers send to your key systems around the internet and *bam*, they're identified in real time, and then added to a blacklist.

    I really like this idea. Like any other system, it has the potential for abuse but the beauty is the identity of the honeypot systems is kept secret, so it's very difficult for anyone other than spammers to exploit the network.
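
The honeypot idea described above can be sketched in a few lines. This is an illustration of the general spamtrap technique only, not how SORBS actually works; the class, its methods, and the example addresses are all made up.

```python
class SpamtrapBlacklist:
    """Hypothetical spamtrap: any host that mails a trap address,
    which is never used for legitimate mail, gets blacklisted."""

    def __init__(self, trap_addresses):
        # Trap addresses are kept secret and never given to real senders.
        self.traps = {addr.lower() for addr in trap_addresses}
        self.listed = set()

    def observe(self, source_ip, recipient):
        """Record one inbound delivery attempt seen by a honeypot."""
        if recipient.lower() in self.traps:
            self.listed.add(source_ip)

    def is_listed(self, source_ip):
        """True if this IP has ever mailed a trap address."""
        return source_ip in self.listed


if __name__ == "__main__":
    bl = SpamtrapBlacklist(["trap1@honeypot.example"])
    bl.observe("203.0.113.5", "Trap1@honeypot.example")
    bl.observe("198.51.100.2", "someone@elsewhere.example")
    print(bl.is_listed("203.0.113.5"), bl.is_listed("198.51.100.2"))
```

A production list would also need a way to expire or delist entries, which this sketch omits.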
  • by Fulcrum of Evil ( 560260 ) on Friday June 20, 2003 @07:33PM (#6258737)

    Well, the fact is if for every one of these "smart" (yeah, right) spammers who has the help of a network consultant that will work around greylisting there are 5 dumbasses who don't

    This does fuck all when your one spamking is responsible for 80% of the SPAM (by volume).

  • by Max Threshold ( 540114 ) on Friday June 20, 2003 @08:35PM (#6259080)
    Open-source crypto works because the secret isn't the algorithm, it's the keys. In this case, the secret is the algorithm. The entire scheme can be circumvented by someone who knows how it works.

