Spam

The Next Step In Spam Filtering (349 comments)

simeonbeta2 writes "Paul Graham (of "A Plan for Spam" fame) has a couple of new articles up. The first one details the success of Bayesian spam filters despite various circumvention techniques by spammers. While the success of Bayesian spam filtering is encouraging, it certainly hasn't seemed to stem the flow of spam in the last year or so. His second article, however, suggests finally taking the anti-spam battle to the spammers! Paul proposes that spam filtering packages automatically spider links contained in probable spam. Not only will this increase the accuracy of filters (by running the retrieved content through the spam filter as well) but this would effectively be a massive distributed DOS attack on spammers. This isn't a new idea nor is it without its problems but I think it's definitely an idea whose time has come."
  • by inertia187 ( 156602 ) * on Thursday October 09, 2003 @05:09PM (#7176328) Homepage Journal
    We've seen firsthand how the early Bayesian filters were circumvented. Remember the images instead of text, then the HTML entities (like &#65; instead of the letter 'A')? The second and third generations of Bayesian filters had to account for them. I can already see how a DoS filter would be circumvented early on: redirects and browser scripts.

    If a filter spiders a spam, all the spammer needs to do is use a redirect or, for smart filters, a small page with javascript that the browser would understand, but would confuse the filter. So yes, the DoS would work at first, but the spammers would realize what was going on and adapt.

    I'm sure meta refresh tags would work in the beginning, but it's simple enough to get a filter to look for those. Eventually, a good filter will have to mimic what the browser does very closely. Maybe it'd be better to actually use a browser that the user can't see.
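    A minimal sketch of the "look for meta refresh tags" step, using only Python's standard html.parser. It is an illustration, not code from any existing filter: the class and function names are made up, and it deliberately does nothing about JavaScript redirects, which is exactly the gap described above.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects redirect targets from <meta http-equiv="refresh" content="N; url=..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attribute values may be None
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").strip().lower() != "refresh":
            return
        # content is a delay, optionally followed by "; url=<target>", e.g. "0; URL=http://..."
        for part in attributes.get("content", "").split(";")[1:]:
            key, sep, value = part.strip().partition("=")
            if sep and key.strip().lower() == "url":
                self.targets.append(value.strip().strip("'\""))


def find_meta_refresh(html: str) -> list[str]:
    """Return every meta refresh target found in an HTML document."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.targets
```

    Self-closing tags such as <meta ... /> are handled too, because HTMLParser's default handle_startendtag calls handle_starttag.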
    • It's possible to include, say, the Mozilla javascript engine in one of these spam filters, which would let it deal with funky javascript. BFilter, for one, uses this approach to deal with ad banners that are inserted in the page by javascript. The redirects can be dealt with; I'm sure there's some standard code for dealing with them that would be easy to use.

      Really, you can take quite a bit of browser code out of the browser and use it in a filter.

    • Eventually, a good filter will have to mimic what the browser does very closely. Maybe it'd be better to actually use a browser that the user can't see.

      Or set up a filter, and just stop accepting HTML mail altogether. Life is so much better when all of your incoming email is plain text. Most legitimate incoming mail is sent as multipart, so mail from your friends still gets through, even when they use mail clients that want to send out formatted mail.

      The spammers sometimes send multipart messages with
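      A rough sketch of the "plain text only" policy using Python's standard email package, assuming the rule is simply "accept a message if it is plain text or carries a text/plain part somewhere; reject HTML-only mail". The function names are hypothetical. Since spammers send multipart mail too, this is a gate on format, not a spam detector.

```python
from email import message_from_string
from email.message import Message


def has_plain_text_part(msg: Message) -> bool:
    """True if the message is plain text or contains a text/plain part anywhere."""
    # walk() yields the message itself first, then every subpart of a multipart tree;
    # a message with no Content-Type header defaults to text/plain
    return any(part.get_content_type() == "text/plain" for part in msg.walk())


def accept(raw: str) -> bool:
    """Hypothetical policy: keep plain or multipart-with-text mail, drop HTML-only mail."""
    return has_plain_text_part(message_from_string(raw))
```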
    • Maybe it'd be better to actually use a browser that the user can't see.

      When I write applications like this, I actually use the Microsoft Internet Explorer WebControl... it's free, open, and exactly mimics what IE does. Programmatically it's clunky, but bottom line is the spam wouldn't work in IE if it won't work in the WebControl.

      Then again... don't I remember that Microsoft turned off javascript in Outlook and Outlook Express because of all the potential problems?

      Maybe it wouldn't be so hard to mimic
    • I set up a whitelist after getting hundreds of spam per day and trying every filter and this and that.

      It was just ridiculous.

      The filter points people to my CAPTCHA, which is here [intercosmos.net], and they have to type in "I am not a spammer" and then the letters in the graphic.

      The amazing part is, I have actually had spammers complete this process (by hand, obviously) trying to get their email to me.

      Anyway, the system I use is open-sourced here [intercosmos.net] if anyone wants to set one up.
      • Most spam has forged headers, so you're probably sending out challenges to random people. Getting such random challenges is incredibly annoying, it basically doubles the volume of MY spam for YOUR benefit. I've played with the idea of answering all such challenges for spam mails, but I decided it would be too much work. I'm glad to hear that others are doing it, though.
      • Section 508 (Score:2, Interesting)

        by yerricde ( 125198 )

        the filter points people to my captcha, which is here and they have to type in "I am not a spammer" and then the letters in the graphic.

        The problem with your approach and with any approach that uses a CAPTCHA is that it provides no way for a visually impaired human being to first-contact you. If you use a CAPTCHA, you can't do business with the U.S. government [section508.gov].

    • The spammers are *already* one step ahead. How are we going to DDoS an operation getting free bandwidth from 400,000 compromised machines [slashdot.org] used as open proxies?
  • Grr Spam. (Score:3, Insightful)

    by Muerto ( 656791 ) <{david} {at} {vitanza.net}> on Thursday October 09, 2003 @05:09PM (#7176339)
    I think we're on the right track with fining people large amounts of money for being associated with the spam. If you not only go after the people who send the spam, but the people whose products are being advertised, then I think we'll get some results.
    • Re:Grr Spam. (Score:3, Insightful)

      by NineNine ( 235196 )
      If you not only go after the people who send the spam, but the people whose products are being advertised, then I think we'll get some results

      Um no. There are plenty of companies that have affiliate programs with thousands of members. There's no way to keep track of how each of your members are advertising. The results you'll get will be putting lots of innocent companies out of business.
      • Affiliate programs are deliberately designed to shield the parent company from complaints about spam or deceptive advertising. I guarantee you, 99% of those LOSE WEIGHT spams are selling Berrytrim or Herbalife, but you won't see those names until you respond and show some interest because the parent company doesn't want complaints about spam or other shady advertising coming back to them.
      • There are plenty of companies that have affiliate programs with thousands of members. There's no way to keep track of how each of your members are advertising.

        After one or two companies get nailed with horrendous fines, you can bet your ass that the rest will adopt policies specifically prohibiting their members from spamming. Besides, laws enabling that sort of punishment will get plenty of publicity before they actually take effect, giving legitimate operations time to clean house.

        The only addre

        • About the only downside is that some unscrupulous merchants might try to joe-job competitors, but that sort of thing can be handled fairly.

          How?
      • Then a system like this will quickly cause companies to set up new systems requiring their "affiliates" not to use spam. You say:
        There's no way to keep track of how each of your members are advertising.
        Then perhaps you shouldn't be doing business with them. They could be making claims in their advertising that could get you in trouble.
        • You can require all you want, but the whole point of an affiliate program is that your members are advertising for you much more than you ever could. Because I'm sure some of eBay's millions of affiliates are spamming, should eBay be shut down? (Personally, I fucking hate eBay.)
      • The results you'll get will be putting lots of innocent companies out of business.

        That's fine with me. It will motivate other innocent companies to make sure they're not associating with spammers. I'm ready to see a few "innocent" companies taken down.

      • Re:Grr Spam. (Score:5, Insightful)

        by Mannerism ( 188292 ) <keith-slashdotNO@SPAMspotsoftware.com> on Thursday October 09, 2003 @06:07PM (#7176879)
        Um no. There are plenty of companies that have affiliate programs with thousands of members. There's no way to keep track of how each of your members are advertising. The results you'll get will be putting lots of innocent companies out of business.

        I think I speak for millions when I say, "too fucking bad."

        Seriously, to suggest that these companies are "innocent" is ridiculous. They're downright complicit.
  • Congratulations, Slashdot editors, this is a dupe.

    And I'm a subscriber.

    And I emailed you before it was posted saying it was a dupe of this story: http://slashdot.org/article.pl?sid=03/08/10/1619206&mode=thread&tid=111&tid=126. Anybody there?

    John.
  • Silly (Score:3, Insightful)

    by ^ ( 104273 ) on Thursday October 09, 2003 @05:11PM (#7176356)
    Then all I need to do to launch a DoS attack is send a piece of spam?
    • You mean that I could send out a spoofed mass mailing "for" my competition, link their website, and voilà: piss off their customer base and take down their website all in one fell swoop. Brillant! Where do I sign up?

  • Repeat from August (Score:2, Informative)

    by merger ( 235225 )
    Feel free to read the comments from when this article was posted to slashdot [slashdot.org] in August.
  • Could be evil. (Score:5, Insightful)

    by grub ( 11606 ) <slashdot@grub.net> on Thursday October 09, 2003 @05:12PM (#7176369) Homepage Journal

    Imagine a Joe-Job where an EvilDoer wants to knock someone else offline and sends out bogus spam pointing to the victim's website. Think before you jump.
    • by Atario ( 673917 )
      From the article:
      This could be used to DoS innocent victims.

      That's the point of the blacklist. A site doesn't get pounded simply by being mentioned in a spam. It has to be mentioned in a spam and be on the blacklist.
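      A minimal sketch of that rule as Atario reads it: a URL found in a spam is only fetched if its host (or a parent domain) is on the blacklist. The blacklist format (a plain set of domain names) and the example hostnames are assumptions for illustration; the article does not specify them.

```python
from urllib.parse import urlsplit


def should_fetch(url: str, blacklist: set[str]) -> bool:
    """Fetch only if the URL's host, or one of its parent domains, is blacklisted."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # for "www.spam.example" check "www.spam.example", "spam.example", then "example"
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


def urls_to_fetch(urls: list[str], blacklist: set[str]) -> list[str]:
    """Drop every URL whose site is merely mentioned in a spam but not blacklisted."""
    return [u for u in urls if should_fetch(u, blacklist)]
```

    Matching on whole domain labels matters: "notspam.example" must not match a blacklist entry for "spam.example".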
      • But then you're treading closer towards deliberately causing a DDoS attack. Granted, I don't agree, but I think a case can be made that, even if you just click a link once, if you _intend_ to cause problems by clicking a link they send, it's kind of sketchy.

        I'm not trying to illustrate that it's clear-cut DDoS. My point is just that you're getting into a rather gray area of the law, and you have to wonder where to draw the line.
        • Law? What law?

          They sent you a link; obviously they want you to click it!

          They sent a link to a million people; what could make them happier than if each and every one of those million people clicked the link -- over and over and over and over and over and over and over?
  • by Sheetrock ( 152993 ) on Thursday October 09, 2003 @05:13PM (#7176388) Homepage Journal
    Spam alone chews up more than enough bandwidth.

    Having every recipient spider the links in the spam they get will not only make spamming inefficient, but web browsing as well. Enough with anti-spam cures that are worse than the disease -- the last almost killed SomethingAwful, and this might knock off the rest of the websites.

    • Plus if you get false positives it might take out an innocent site.
      • Exactly. I think somebody would end up being held liable for possible damages. It sounds like a good idea at first, but one false positive could do some serious damage and get people in lots of trouble.
      • by tessaiga ( 697968 ) on Thursday October 09, 2003 @05:34PM (#7176614)
        Exactly. Whoever was responsible for writing such anti-spam software would be the first person to get hit with a massive lawsuit the first time some spammer found a way to "aim" this sort of scheme at an innocent bystander. If that bystander happens to be a big company with deep pockets, the programmer could be looking at some serious pain. Knowing that such a risk exists, it would be interesting to see if anyone would still be willing to develop such software.

        The article tries to combat false positives with blacklists. A couple of problems with this come to mind right away. The first is that centrally-maintained blacklists are easy to take offline via DDOS, as we've already seen [slashdot.org] with sites like SPEWS. The second, and IMHO more serious, problem is that this would give the blacklist maintainers huge power over the rest of the internet -- if you ever got on their bad side, or if they were just plain inefficient/not conscientious about accidentally listing innocent bystanders, your site could potentially be shut down until they felt like taking you off the blacklist, just by some spammer spoofing you. Given the poor responsiveness many blacklist maintainers have shown historically, I don't think giving them more power is the answer. Bad enough not being able to send people email if you accidentally get blacklisted -- imagine not being able to get net access at all.

    • This is addressed in the article:

      This is a bad idea because it just uses up more bandwidth.

      That's like arguing that we shouldn't have police, because in addition to all the losses caused by crime, we have people taken away from productive work to chase criminals. If FFBs make working unsubscribe links universal, the result is net less use of bandwidth.

      I'm not proposing that FFBs should be used by people on dialup lines, just by users who have bandwidth to spare-- people at universities and corporations.

      • The author, in my opinion, does not fully appreciate the ramifications of his scheme.

        If it works as advertised and causes spammers to capitulate by putting working unsubscribes in, then he is correct: the bandwidth price paid up front would be worth the savings down the road.

        But one has to consider the possibility (and, I argue, probability) that this cunning plan will not convince spammers to honor the desires of the, um, spammees. Looking at the uproar over the federal Do Not Call list by legitimate

    • Not that I'm advocating it, but if you're worried about bandwidth, we could always adopt the teergrube [iks-jena.de] tactic. You don't actually download much of anything, you just open up TCP connections and keep them alive until their servers run out of process space.
  • What about... (Score:5, Insightful)

    by Misch ( 158807 ) on Thursday October 09, 2003 @05:15PM (#7176405) Homepage
    What about the case where the spammer puts a unique identifier into the URL? Sure, he may not get a sale from the clickthrough, but he gets verification that your e-mail address is good.

    Then, you get more spam.
    • Yes but if the spam filter is catching it and spidering the link, then the spam filter will also catch the extra spam you get.
    • From the article:
      Wouldn't retrieving web beacons show your address was live?

      Yes, so that might bring more spam. But it would also make web beacons stop working as an index of open rates. And you'd be clicking on unsubscribe links as well, which FFBs would make more popular.
    • Re:What about... (Score:3, Insightful)

      by mengel ( 13619 )
      Actually, no. If the spam filter is in front of the valid-recipient check on your email system, then all the spam message attempts yield web hits, meaning they get "verification" of lots of invalid email addresses. Soon the belief that a web hit from an email address makes it more valuable goes the way of the dodo bird...
  • I have enough bandwidth wasted by spam now, without spidering anything.
  • Submit the URLs to /.

    My current trick is subscribing the spammers to spam lists, if I get a valid address. Lost 2 addresses on a client's domain this month to spam. (one being our generic "contact us" address).
  • by t0qer ( 230538 ) on Thursday October 09, 2003 @05:15PM (#7176409) Homepage Journal
    Are these subject lines meant to defeat Bayesian filters? Just curious, because they've been getting weird lately.

    Xanax_-_No_Prescription_Needed_-_neonatal
    Kuasx ep Pharmaceuticals including Valiumm, prozac, aAmbientforth mw
    Enter to win free cigarettes pedant
    Fight Aging and Skin Cancer Xpxtdp
    Bigger Penis is Better betsy

    I'm just curious why my spam lately seems to have weird random junk in the subject line. I actually find it sort of amusing, because some of the randomness reminds me of Tourette's syndrome.
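    Whether the junk words are aimed at Bayesian filters is a guess, but a sketch shows why a single unknown word does little against a filter built the way "A Plan for Spam" describes: unseen tokens get a probability of 0.4, only the 15 tokens farthest from 0.5 are kept, and they are combined as abc/(abc + (1-a)(1-b)(1-c)). The token probabilities in the test are invented, assumed already clamped between 0.01 and 0.99, and tokenizing is left out.

```python
from math import prod

UNKNOWN_TOKEN_PROB = 0.4   # value "A Plan for Spam" assigns to never-seen tokens
INTERESTING_COUNT = 15     # only the 15 tokens farthest from 0.5 are combined


def spam_probability(tokens: list[str], token_probs: dict[str, float]) -> float:
    """Combine per-token spam probabilities in the style of "A Plan for Spam" (a sketch)."""
    probs = [token_probs.get(t, UNKNOWN_TOKEN_PROB) for t in set(tokens)]
    # keep the most "interesting" tokens: those whose probability is farthest from neutral
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:INTERESTING_COUNT]
    spam_likelihood = prod(top)
    ham_likelihood = prod(1 - p for p in top)
    return spam_likelihood / (spam_likelihood + ham_likelihood)
```

    With the made-up table in the test, adding "neonatal" moves the score only from roughly 0.99987 to roughly 0.9998, because 0.4 is close enough to neutral that the strong spam words still dominate.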
  • Now, I really like SpamAssassin, but it is a damn bitch to get up and running. I have been getting an increasing amount of spam that passes through it because the message is short and only contains web links. It is time to take SpamAssassin to the next level and have it spider the links, running them through something like DansGuardian to further qualify the message as spam. That, and adding a routine that checks the domain record age.
  • I am currently working on an add-in for Postfix. Here is the README, which offers an explanation.

    WHY:
    There are several ways to stop the spammers.

    1. Outside influence i.e. legally control it. The current US admin has suggested this, controlling users on the net, but it will simply move the spammers to other countries. To make matters worse, it is easy to see that this has no chance of working. This approach has its roots in other ideas.
    2. Try to determine spam at either the server or client level. I
  • How long before some nut on slashdot goes and kills or seriously injures a spammer?

    Although just driving by a spammer's house and posting pictures and the address does some good.

    But how far before it gets as crazy as the anti-abortion people who started logging the license plates of people who work at abortion clinics. That, combined with the shooting/killing of doctors, really cut down on doctors who perform abortions.

    The animal rights people have started logging plates of lab employees at the UC Davis
  • We need to restrain spammers more effectively. Here is a possible cure [redcoat.net]. It may even work on Darl.
  • From this page [paulgraham.com]:

    Why have email as part of the system? Why not just have a blacklist of spam sites and encourage people to beat on them?

    Several people have written suggesting a "DDoS@Home" project of this type. (Two correspondents who shall remain nameless simultaneously invented this catchy name.) But I think mail should remain in the system for two reasons: (a) it tells you which sites to pound, and when, and (b) if you included it as part of a filter, you could get more users.

    On the other hand, i
  • by image ( 13487 ) on Thursday October 09, 2003 @05:27PM (#7176546) Homepage
    Malicious virus and trojan authors spend a lot of time and energy writing code that can infect host machines across the internet and wait for incoming instructions to launch a DDOS attack against a target.

    And there is actually a proposal for people to voluntarily install this on their machines? And the trigger is simply an email?

    Sick of yahoo.com today? Take them down -- just spam the net with junk mail that points to their site. Have a vendetta against a guy who hosts his own email over a DSL line? No problem -- you won't even need to spam that many people before their auto-crawling DDOS boxes take his server down.

    Yikes.
    • This effect could be amplified even further by a Melissa-like worm, if anyone remembers that. With so many lusers running M$ e-mail products, it wouldn't be too hard to find a way of making them e-mail out the spam themselves, so that if they don't FFB the spam, then they replicate it! Imagine. By sending one message, someone could take down all of Yahoo!, M$, Google, and a slew of small ISPs that would buckle under the outgoing traffic...
  • If you only follow the link programmatically once, and everyone else does as well, you allow the malicious to perform a DDoS on an innocent server. It is unlikely that the blacklist could be maintained properly.

    Once you follow the link more than once, and programmatically, you are treading into the area of DDoS. It could be that the authorities will come looking for you!

    But the real key is that spammers are using distributed hosting techniques to host their web sites through unprotected Windows machines with

  • It's just a game of one-up, and as long as we continue to use SMTP, the spammers will always have the upper hand. New authentication and verification methods need to not only be developed, but supported by the big ISPs.
  • I know this is a bit basic, but it seems to work fine for my personal accounts.

    I simply filter every email address not in my manually added address book to a spam folder. Every person I email has an entry in my address book (automatically added).

    Once in a great while, I'll go into my spam folder, check for mail that might have been filtered by mistake, and add the senders' addresses from those emails to my address book.

    It is pretty difficult for a spammer to defeat this. You would have to customize spams for
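    The address-book rule is easy to sketch with Python's standard email.utils. The function names and addresses are hypothetical, lowercasing the whole address is a simplification, and a real client would persist the address book. Note that it trusts the From: header, which spammers routinely forge.

```python
from email.utils import parseaddr


def remember_recipient(to_header: str, address_book: set[str]) -> None:
    """Add the address from a To: header, as the commenter's client does automatically."""
    _, address = parseaddr(to_header)
    if address:
        address_book.add(address.lower())


def route(from_header: str, address_book: set[str]) -> str:
    """Send mail from known senders to the inbox and everything else to a spam folder."""
    _, address = parseaddr(from_header)
    return "inbox" if address.lower() in address_book else "spam"
```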
  • I think the only option to fighting spam is changing email as a whole as specified at www.spamnazi.org [spamnazi.org].
  • I am currently running POPFile 0.19.1 [sourceforge.net] and it's classifying my two main e-mail accounts (approx 200 e-mails per day, 17.89% spam) at 98.92% accuracy. I'm pretty happy with that...
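    For scale, a back-of-the-envelope version of those numbers, assuming "accuracy" means the fraction of all messages classified correctly: about 36 spam a day, and about 2 messages a day filed in the wrong bucket.

```python
# Figures quoted in the comment above
messages_per_day = 200    # "approx 200 e-mails per day"
spam_share = 0.1789       # 17.89% of them are spam
accuracy = 0.9892         # 98.92% classified correctly (assumed: share of all messages)

spam_per_day = messages_per_day * spam_share          # roughly 36 spam a day
errors_per_day = messages_per_day * (1 - accuracy)    # roughly 2 misfiled messages a day

print(f"spam per day:   {spam_per_day:.1f}")
print(f"errors per day: {errors_per_day:.1f}")
```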
  • - Look sir, the amount of traffic that our e-mail ad campaign from that guy Henry McSpammy is generating!

    - Well, that's good. I guess we'd better give him the new hardware and T3 connection he wanted, then; we may get even more traffic. Keep up the good monitoring work!

  • Yes indeed, ladies and gents - I am going to share with you all, free of charge, the ultimate spam filtering method, guaranteed to catch 100% of all incoming spam mail.

    All you have to do is redirect all incoming e-mail to the trashcan.

    (standard disclaimer: this system will have about 10% false positives for most users)
  • Spammers' links generally contain lots of advertising. If our spam filters now automatically visit all webpages pointed to in spam mails, couldn't that in itself become a source of revenue? Just spam with as many URLs as you can, loaded up with as many pay-per-impression ads as you can think of ... would this really help?
    • No, I can't see that helping -- because it would lower the response rate of the ads, thereby earning less per clickthrough. And the clickthrough rates are already terribly low.

      Plus, if this does become widespread, the solution's simple: just have the automatic visitor be a little smarter about whom it downloads stuff from. You should only download stuff from blacklisted sites.

      (I'm not sure I like this plan anyhow, since I think it's ridiculously aggressive; but the foundation should work very well.)

      -Bill
  • What about phrases like "by clicking on this link you agree to let us call your house" (where the link contains a token for identification purposes)? Having a filter auto-follow links could be really dangerous then.

    The interesting thing is how the courts would end up viewing auto-clicks vs. manual clicks. I'd bet that if a user set up a filter, it would effectively be viewed as the user doing the clicking...

  • If the spam filter spidered links, wouldn't that cause the hit count of the target page to go up? If so, what's to stop Spammy McSpammer from using his incredible hit count to convince people to buy ads on the site? We don't want to make these bastards more money.

    Or, it could very well be that I'm misunderstanding the whole thing...

    -troy
  • Much of the spam these days is being sent by trojans running on unsuspecting computers, and many of the web sites pointed to in spam are on systems whose owners have no idea their machines are being abused.

    A better idea would be to work on speeding up the response time for mechanisms used to shut down spam, such as Spamcop and Vipul's Razor. The general idea is that we should automate and accelerate the chain of events starting with spam detection (manually or by spam filters,) followed by reporting of s

  • This plan would have the effect of turning the email system into a DDoS amplifier. A simple email sent through some SMTP server somewhere saying "Enlarge your penis! http://12.34.56.78:1234" to multiple recipients would greatly increase an attacker's effective DoS bandwidth.
  • I have a free Yahoo mail account and the false positives are nonexistent. Well, as soon as I put one of my relatives guilty of forwarding inspirational messages to me onto a whitelist, there weren't any. A couple of false negatives slip through, but those are few and far between. I'm pretty diligent about clicking on the "inform Yahoo this is spam" link.
  • filter all you want, but the spam won't go away. you can make laws against spam, but then many (most?) of the companies are out of the country. but almost ALL of them use credit cards, or maybe paypal to perform their transactions.

    so how about if we make a law that fines credit card companies if they do business with a known spammer (a business who has been reported by many and verified to be spammers)? perhaps the spammers will start accepting check or cash, but i think their returns would drop so substan
  • if someone wanted to ddos a site, those types of filters would make it a wonderful tool.

    For example, say I wanted to DDoS some competing website, so I blasted out billions and billions (think Sagan) of emails and used people's paranoia to my evil benefit.

    No, I don't think that will work: the human portion he mentions would require someone always sitting around waiting, and by then the spammer could be almost finished with his spam run.

    I don't think so; it could be turned around so fast.
  • There's some lively discussion on this topic here [slashdot.org].
  • thanks to the unwanted mail i get everyday i now have a penis thats longer than i am tall! it used to be so small i could fit my replica of J Lo's ring around it, but not anymore! now even printing a full size picture of it is easy thanks to the great deals i got on printer toner! well im off to my free las vegas vacation that i got just for punching the monkey!

    (disclaimer: i am NOT the man from nantucket)
  • Comment removed based on user account deletion
  • but this would effectively be a massive distributed DOS attack on spammers.

    versus

    In other words, you could host your Viagra-peddling site with a company that has a stringent no-spam policy, but a DNS lookup will point to a home user's compromised machine.

    Attacking a spammer's resources only increases the spammer's impetus to steal resources. The further you push them underground, the harder they are to uproot when you get a real tool.
  • I think most of us agree that spam is really an 'arms race' -- it's all about us building better spam traps faster than spammers can build better spam-senders that defeat our spam traps.

    This idea is akin to introducing nukes to the arms race. Short term, it might give us an advantage over spam. But in the end, the Internet's worse off -- mail servers will be using significantly more bandwidth for no particular reason.

    We ought to look at it as an arms race, and consider the 'good of the Internet' -- not ju
  • White lists. On the net every encounter (email/im) is a potentially hostile encounter. I was using Bluebottle.com [bluebottle.com] (R.I.P.) for a few months (6) and it was the bomb.

    I added whoever I wanted to my list, or they authenticated themselves. At least if a spam did get through (not in my experience), it would have to have a valid return address, and that's a step in the right direction.
  • Graham suggests automatically retrieving the contents of any url contained in suspected spam messages, and analyzing the contents for further spammy content.

    That asks for trouble: a lot of the URLs have unique identifiers, like http://spammersite.com/idiot?moron=asdjicn98niucd n 23d where the identifier is linked to your email address on the spam server. Retrieving the URL is then like clicking a remove link: it confirms to the spammer that your address is live, so he works harder to get through your fi
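    A minimal sketch of the first step Graham describes, pulling URLs out of a suspected spam before anything is retrieved. The regular expression is a deliberately loose assumption, not the one any real filter uses, and the hostnames are placeholders. The query strings it returns are exactly the per-recipient identifiers warned about above.

```python
import re

# Deliberately loose pattern: http(s) scheme, then everything up to whitespace, quotes or angle brackets
URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""", re.IGNORECASE)


def extract_urls(body: str) -> list[str]:
    """Return URLs in order of first appearance, deduplicated, with trailing punctuation trimmed."""
    found: list[str] = []
    for match in URL_PATTERN.finditer(body):
        # strip sentence punctuation that the loose pattern swallows (also drops a legitimate trailing ")")
        url = match.group(0).rstrip(".,;:!?)")
        if url not in found:
            found.append(url)
    return found
```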

  • Can I bring my six-foot steel prybar? Does he have a plan for preventing me from being convicted of murder?
  • One of the nice things about the Net is that we don't have a single authority that polices us.

    Policing on our own is thus necessary. Done right, it can even be a boon.

    However, any failure to be extremely fair and as gentle as possible will add credence to those who would call for a single authority.

    I'd rather have spam than the FBI, or Regional Bureau of Concern, in my affairs.

    Fight the spammers, but don't go overboard. Mistaking innocents for spammers would be overboard.
