Spam IT

Spam Trap Claims 10x-100x Accuracy Gain 419

Posted by kdawson
from the see-it-when-i-believe-it dept.
SpiritGod21 writes in with a NYTimes article on a new approach to spam detection that claims out-of-the-box improvement of 1 or 2 orders of magnitude over existing approaches. The article wanders off into human-interest territory as the inventor, Steven T. Kirsch, has an incurable disease and an engineer's approach to fighting it. But a description of the anti-spam tech, based on the reputation of the receiver and not the sender, is worth a read.
This discussion has been archived. No new comments can be posted.

  • Ummmm.... (Score:3, Insightful)

    by rustalot42684 (1055008) <fake AT account DOT com> on Monday December 03, 2007 @11:36PM (#21567711)
    I read part of TFA, and it seems to be saying that you can id spam mails because they are being sent to a person who gets lots of spam. But that still doesn't take into account the fact that that person also receives legit mail, AND the fact that what is spam to one person isn't spam to another.

    Also, seems like a bit of a slashvertisement for what is as yet an unproven technology - the only benchmarks we have are ones they provide.
    • Re: (Score:3, Insightful)

      ... AND the fact that what is spam to one person isn't spam to another...

      That's not true though. Spam is defined as bulk, unsolicited e-mail. Even if some retard actually likes to read their spam e-mails and buy things they advertise, that doesn't change the fact that the message was sent in bulk (to many other people as well), and that it was unsolicited by at least the vast, overwhelming majority of them.
    • Re: (Score:3, Informative)

      by ceoyoyo (59147)
      Not quite (the AC who replied and got modded up is also incorrect).

      They're using LOTS of accounts to grade e-mail. It doesn't work at all unless you're an ISP with lots of different accounts to monitor. The idea is that if a bunch of people get the same e-mail (already a good indicator of its spamminess), and people who get lots of spam are more likely to have received it than people who don't get much spam at all, the message is more likely spam.
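A rough sketch of the idea as this comment describes it, not Abaca's actual algorithm (the article does not give one). It assumes each account's spam ratio is already known, and scores a message by the average spam ratio of the accounts that received it. The account names, ratios, and the `message_score` helper are all hypothetical.

```python
from statistics import mean

# Hypothetical per-account spam ratios: the fraction of each account's
# mail that is spam. How these are measured is assumed, not sourced.
spam_ratio = {
    "alice": 0.10,
    "bob": 0.20,
    "carol": 0.90,
    "dave": 0.95,
    "trap": 1.00,  # an address that never receives legitimate mail
}

def message_score(recipients):
    """Average spam ratio of the accounts that received the same message."""
    return mean(spam_ratio[r] for r in recipients)

newsletter = ["alice", "bob"]         # seen mostly by low-spam accounts
pill_ad = ["carol", "dave", "trap"]   # seen mostly by high-spam accounts

print(message_score(newsletter))
print(message_score(pill_ad))
```

The message seen mostly by high-spam accounts scores higher. Note that the sketch needs to know which accounts received the same message, which only a server handling many mailboxes can see.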
      • Re:Ummmm.... (Score:5, Insightful)

        by Mundocani (99058) on Tuesday December 04, 2007 @02:04AM (#21568703)
        The main problem I can see is that even if this system works it is easily circumvented. The big assumption is that you can identify the recipients of a particular message, but spammers can easily ensure that information isn't easily obtained.

        First they can ensure that the message itself doesn't contain any recipient info (a big bcc basically).

        Then they avoid batching recipients based on their domain so the SMTP server can't tell who else is receiving the message.

        The only way to derive the recipients now is to compare all messages against all others in order
        to match them up. So they hash every message and combine those with identical hashes.

        But putting a little unique text in each message during transmission foils that.

        Spammers: 1 New weapons: 0
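The hashing argument above can be sketched as follows. This is only an illustration of exact-hash matching and how a per-message token defeats it; nothing in the article says the product works this way. The `group_by_hash` helper and the sample addresses are hypothetical.

```python
import hashlib
from collections import defaultdict

def group_by_hash(messages):
    """Map the SHA-256 digest of each body to the recipients who got that exact body."""
    groups = defaultdict(list)
    for recipient, body in messages:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(recipient)
    return groups

body = "Cheap pills, click here"
identical = [(f"user{i}@example.com", body) for i in range(3)]
# Same campaign, but each copy carries a unique token, as the comment suggests.
salted = [(f"user{i}@example.com", f"{body} [{i}]") for i in range(3)]

print(len(group_by_hash(identical)))  # 1: one group holding all three recipients
print(len(group_by_hash(salted)))     # 3: every copy hashes differently
```

An exact hash cannot link the salted copies, so any system that matches messages across recipients would need some form of approximate matching instead.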
        • Re:Ummmm.... (Score:4, Interesting)

          by doom (14564) <doom@kzsu.stanford.edu> on Tuesday December 04, 2007 @03:16AM (#21569057) Homepage Journal

          First they can ensure that the message itself doesn't contain any recipient info (a big bcc basically).

          How exactly is a message supposed to get somewhere if it doesn't have the recipient info? I think you're confusing what you see in your mail box with what the mail servers see.

          In any case, as is typical the news article doesn't really provide enough information to determine how the system actually works. It does sound like it's working on the premise that since spam is done in "bulk", if you see lots of identical messages going through a server you can assume that that's spam. The obvious problem would be that spammers can include randomly generated content.

          But that problem is so obvious, it seems likely to me that I don't understand the system they have in mind.

  • by damn_registrars (1103043) <damn.registrars@gmail.com> on Monday December 03, 2007 @11:41PM (#21567753) Homepage Journal
    At least once a week there seems to be another flashy technique to filter or block spam. Great.

    Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.

    If you want to stop spam, you need to take away the economic incentive. We've already seen how many spam filtering / blocking programs produced in the past 5 years? But yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.

    Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
    • by Anonymous Coward
      1) Issue a Fatwah that spam is an insult to Islam.
      2) Behead those who insult Islam!
      3) No more spam. Allah Akbar
    • by ender- (42944) <doubletwist@@@fearthepenguin...net> on Monday December 03, 2007 @11:51PM (#21567827) Homepage Journal

      If you want to stop spam, you need to take away the economic incentive. We've already seen how many spam filtering / blocking programs produced in the past 5 years? But yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.

      Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
      But why would the anti-spam software companies want that? If they succeed in actually eliminating spam, they'd also go out of business. It may be profitable for the spammers, but I suspect it's even more profitable for the anti-spam companies.

      • But why would the anti-spam software companies want that? If they succeed in actually eliminating spam, they'd also go out of business.

        You assume that all anti-spam filters are proprietary. Open source filters exist and can be modified as you like; that in itself should force anti-spam companies to adapt, otherwise they get replaced by free-as-in-GNU software. It is in their best interests to at least attempt to beat FLOSS, and FLOSS has a lot going for it: if someone finds a better way to code for the

      • [tinfoil hat] Could that be where spam profits actually come from, then? Not from the sale of the advertised products, but from selling anti-spam proprietary software that's specifically programmed to ID your spams (through e.g. a checksum)?

        Because if the revenue model involves getting people to buy stuff in spam links, you would *think* the credit card companies would find the spammers within about a day or so...

        Someone replied and mentioned the free spam filters, suggesting that the "spam and sell spam f
    • Re: (Score:3, Insightful)

      by ucblockhead (63650)
      Yes, and once we can stop drugs from being profitable, we will see them go away too.

      Oh, and prostitution, too. And identity theft. And insurance fraud. Yup, it's simple to fix. Just make it unprofitable! Simplicity itself!
      • by MightyYar (622222) on Tuesday December 04, 2007 @12:14AM (#21567985)
        As much as I'd like to forget it, I think your post made me realize that some spam is actually filling a market need. Ugh. Yay, capitalism!
    • by OzRoy (602691)
      And how exactly do you propose we do this?

      Everyone knows what spam is, but it's economical because there are idiots out there who ignore the warnings and buy the crap anyway. So it seems that the only ways to make spam uneconomical is to either remove idiots from the Internet (Internet Utopia here we come!), or stop the spam from getting to them.
      • And how exactly do you propose we do this?

        Everyone knows what spam is, but it's economical because there are idiots out there who ignore the warnings and buy the crap anyway. So it seems that the only ways to make spam uneconomical is to either remove idiots from the Internet (Internet Utopia here we come!), or stop the spam from getting to them.
        Make it illegal and fine the people who profit from it.
        • Re: (Score:2, Funny)

          by wvmarle (1070040)

          Make it illegal and fine the people who profit from it.

          Easier said than done. First start with a legal definition of spam e-mail, that does not cover things like mailing lists. Personally I am sending out many mass mailings, on an opt-out basis (I harvest interesting mail addresses myself) - and get very few opt-outs and many reactions. I specifically send mails to people that may be interested in buying my goods. This should definitely be legal, it's a great marketing tool and helps my business very well.

          What should be illegal (and I suspect is already) are

          • by choongiri (840652) on Tuesday December 04, 2007 @12:38AM (#21568135) Homepage Journal

            No, if you are harvesting email addresses and sending unsolicited commercial messages to them, it is quite simple:

            You are a spammer.

            • Re: (Score:3, Funny)

              by wvmarle (1070040)

              No, if you are harvesting email addresses and sending unsolicited commercial messages to them, it is quite simple:

              You are a spammer.

              Most e-mail addresses I get are from business cards and from websites where people post their e-mail with the specific purpose to get offers of the product that I have. Some I get from other sources, but again this is from sources where the e-mail addresses are posted with the specific intent of receiving these offers.

              So it is not as black-and-white as most people here try to put it. I have a mailing list containing maybe 500 addresses or so, and get on average 10-20 reactions on the offers sent, and 50-

          • by lgw (121541)
            It's quite simple: opt-out mailing list = spam. You = spammer. World = better off without you.
            • I took the liberty of converting your post into an ascii semantic web.

              You --- does ---> spam <--- is-a --- opt-out mailing list
              ^
              |
              |
              would-be-better-off-without
              |
              |
              World
          • by penix1 (722987) on Tuesday December 04, 2007 @01:01AM (#21568291) Homepage

            ...and get very few opt-outs and many reactions.


            I can imagine the reactions you get...

            There are two reasons for this. First, nobody is receiving your emails because you are blocked nine ways to hell in their spam filters. Second, because most spam (yours included) uses the opt-out crap for email verification of their lists. They know they have a live one, so most sane people ignore opt-out links in email since they are dangerous.

            What needs to be changed *IS* the opt-out crap. It needs to be confirmed-opt-in, plain and simple. While they are at it, I wouldn't say no to outlawing email harvesting either. Throw in a $10,000.00 fine for each violation of either provision and call it pretty. Make half the fine go to the organization that hunts down violators and we've got a sound business solution.
          • Re: (Score:3, Funny)

            Where do you live?
          • Re: (Score:3, Insightful)

            by Kadin2048 (468275) *

            get very few opt-outs

            Might this be because nobody with two neurons to rub together actually uses an opt-out link? (After all, if you're scummy enough to send me unsolicited email, you're probably scummy enough to use that "opt out" as a test to determine whether my address is real, and thus to be sold to other scum for more profit.)

            You may be a nice person and run a respectable enterprise in all other respects, but if you're sending out unsolicited emails on anything more than an individual basis, you're a spammer.

            Furthermore,

            • Re: (Score:3, Funny)

              by wvmarle (1070040)

              get very few opt-outs

              Might this be because nobody with two neurons to rub together actually uses an opt-out link?

              No, I ask them specifically to reply. Or call me - telephone number is in the mails that I send. As is my real, verifiable company name.

              You may be a nice person and run a respectable enterprise in all other respects, but if you're sending out unsolicited emails on anything more than an individual basis, you're a spammer.

              Which, like most people here also don't get because they can not READ and are completely pre-determined that any commercial mail == spam, is the case. E-mails are not sent out randomly, but only to addresses where there is a reasonable and real chance they are in the same business.

              • by Kadin2048 (468275) * <slashdot...kadin@@@xoxy...net> on Tuesday December 04, 2007 @02:22AM (#21568799) Homepage Journal
                There's all sorts of commercial mail that's not spam. If I order something from you, and you send a reply back confirming my order, that's both commercial and definitely not spam. As is any other reply to an inquiry.

                Where it crosses the line and becomes spam is when it's unsolicited. That's the key. Unsolicited commercial email is the very definition of spam, and no amount of hand-waving about opt-outs or the selectivity of the lists is going to change that.

                Businesses that have relied on cold-calling via any medium to drum up sales have always been sleazy in my book, but when you do it via email, you're pushing the cost out onto the recipient and onto uninvolved third parties. That's at best unethical, and at worst flat-out theft.
          • Could someone explain why this guy hasn't been modded down yet? And why his anus lacks splinters that were rubbed off during his forced sodomization with a wooden rod?
      • by QuantumG (50515)
        Fund a government agency to fight spam by tracking down the people sending it (note: I said people, not computers) and fine them. You don't have to fine them much.. just a little more than they earn sending the spam, multiplied by your ability to find the spammers. The profit is now gone.

        Don't care enough about spam to pay a tax to fund a government agency to make spam history? Then stop complaining about it like it's the end of the freakin' world.
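One way to read the fine-sizing argument, sketched under the assumption that a spammer is caught with some fixed probability: the expected penalty is that probability times the fine, so the fine only removes the profit once it exceeds the earnings divided by the catch probability. The function names and the numbers are illustrative, not from the comment.

```python
def break_even_fine(earnings, catch_probability):
    """Fine at which the expected penalty equals the spammer's earnings."""
    if not 0 < catch_probability <= 1:
        raise ValueError("catch_probability must be in (0, 1]")
    return earnings / catch_probability

def expected_profit(earnings, fine, catch_probability):
    """Earnings minus the expected penalty (probability of being caught times the fine)."""
    return earnings - catch_probability * fine

# Illustrative numbers: $1,000 earned, 1 in 10 spammers actually found.
threshold = break_even_fine(1000, 0.1)
print(threshold)
print(expected_profit(1000, threshold * 1.1, 0.1))  # slightly negative: profit is gone
```

The weaker the enforcement, the larger the fine has to be for the same deterrent effect.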

        • by OzRoy (602691)
          Of course, a Government agency to fight this stuff. Because agencies/businesses that devote their entire business model to preventing other illegal activities like online piracy have worked real well. After all, it has caused the Pirate Bay to disappear and go out of business. /sarcasm
    • by pclminion (145572) on Monday December 03, 2007 @11:58PM (#21567875)

      At least once a week there seems to be another flashy technique to filter or block spam. Great.

      It's not "flashy." It's called information theory and statistics. It is an extremely powerful concept that has far more important potential uses than simply filtering spam email. Every new advancement in automated classification and knowledge extraction is VITALLY IMPORTANT to our ability to cope in a world which has suddenly been flooded with SO MUCH information. This power tool is being applied to what some might see as a "silly" problem, but the fact remains that spam is a powerful motivation to researchers to push further limits in the fields of pattern recognition, information and natural language processing.

      If you're against the advancement of information processing techniques, then... uh, okay, I guess. If you can't see beyond spam, you are terribly short sighted.

    • by wizardforce (1005805) on Tuesday December 04, 2007 @12:04AM (#21567907) Journal
      How do you propose we remove the economic incentive for spam? Let's see how this has been attempted or hypothesized in the past. Charge a fee per email rather than a blanket fee from the ISP for access? Most of the real spam is sent through compromised PCs, so charging a fee per email is useless: the people in control of the spam-net are not the ones paying the bandwidth/email fees. Pass laws against it? That doesn't work either; the remaining spam-nets will still work because the law cannot be enforced in the host country, let alone in all those that are not subject to it. Build better spam traps? Tried that, and it isn't doing so well; spam is still getting through in large numbers. Educate people? That will certainly make things better in a lot of ways, but there will still be that twat who actually wants to get spam... Have ISPs cut off high-bandwidth connections from those suspected of spamming? Can anyone say privacy nightmare? As much as I hate spam, I hate the idea of ISPs snooping through your email no matter what their reasons are. Now what?
    • Charge money to send emails. That idea has been discussed before, I know, but there is a twist to make it work - make it so that the recipient is the one who gets paid. After all, it's their time the spammers are wasting so they should be fairly compensated. This would cause serious problems for people who run listservs, so this would have to be combined with user customizable white-lists. In the ideal case, each recipient can even name their own price, have a white list, and retroactively forgive debt. For
    • by Spazmania (174582)
      Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem.

      That's all well and good, but wake me up when you have a viable economic solution based on the premise that spam is an economic problem. And by viable I mean doesn't have a massive downside like e-stamps, trampling on the first amendment, or elevating jail times for spammers beyond those for violent crimes.

      In the mean time, you'll have to pardon me if I don't throw
    • Re: (Score:3, Interesting)

      by 7Prime (871679)
      How about charging the sender $0.01 for every email that's never opened. That way, spammers risk a HUGE number of people catching the trap and not opening their email. It wouldn't be worth it to advertise in that fashion, because you lose more than you make (spam requires 10s of thousands of emails to be effective; if 90% of those are unopened, then you risk losing over a hundred dollars on a scheme that might make you $50 on a good day).
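The arithmetic in this proposal can be checked with a few lines. The $0.01 fee, the 90% unopened rate, and the $50 revenue come from the comment; the campaign sizes are illustrative picks for "10s of thousands", and `campaign_net` is a hypothetical helper.

```python
FEE_PER_UNOPENED = 0.01  # dollars per unopened email, from the comment

def campaign_net(emails_sent, unopened_fraction, revenue):
    """Revenue minus the fee charged on every unopened message."""
    fees = emails_sent * unopened_fraction * FEE_PER_UNOPENED
    return revenue - fees

# Illustrative campaign sizes; 90% unopened and $50 revenue as in the comment.
for sent in (10_000, 20_000, 50_000):
    print(sent, round(campaign_net(sent, 0.90, 50.0), 2))
```

Under these assumptions the campaign loses money at every size shown, and the loss passes a hundred dollars somewhere between 10,000 and 20,000 messages.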
    • by Jimmy_B (129296) <slashdot@@@jimrandomh...org> on Tuesday December 04, 2007 @12:44AM (#21568189) Homepage

      Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.
      Not exactly. It's making money for the spammer, but it probably isn't making money for the person who hired him. You see, even if no one ever bought anything advertised in spam, it would still be sent. The problem is multilevel marketing [wikipedia.org], which creates a lot of people desperate to sell unsellable inventory, some of whom pay spammers to advertise it for them. A perceived economic incentive is enough, even if there isn't a real one.
    • by gad_zuki! (70830)
      >We've already seen how many spam filtering / blocking programs produced in the past 5 years?

      Lots. Even in my most ancient Hotmail account I see almost no spam. The filters are working, and the spam cat-and-mouse game has reached a point where the sophistication of spam detection is outpacing the spammers. There comes a point where their resources cannot keep up. We've reached that point, I think. I don't expect spam to ever leave, but now it's a controlled problem. In the future we might even start seeing
    • by MaceyHW (832021)
      Don't worry, the technology he developed is much better than the technology described in the article.

      "We were sitting around thinking of ways to obfuscate the description about how our system worked so the spammers would be misdirected," he said. "So I came up with receiver reputation as something that might sound plausible. Then as I thought about it more and more, the more sense it made to me."

      Whatever the brilliant technology he came up with is, this is just the obfuscating, fake description of it.

    • Re: (Score:3, Insightful)

      by mcrbids (148650)
      Once we can stop spam from being profitable, we will finally see it go away. But no sooner.

      Way to go, Captain Obvious!

      This goes down in history with other sayings of similar caliber, such as

      1) "Once we can stop scams from being profitable, we will finally see them go away. But no sooner."

      2) "Once we can stop prostitution from being profitable, we will finally see it go away. But no sooner."

      3) "Once we can stop theft from being profitable, we will finally see it go away. But no sooner."

      Somehow, despite havin
  • Makes sense (Score:5, Informative)

    by Dan East (318230) on Monday December 03, 2007 @11:45PM (#21567785) Homepage Journal
    I own a number of domains, and receive all email to each domain in a catch-all account. I receive a great deal of emails to totally fictitious email accounts at my domains. Those recipients receive 0% legitimate emails, so anything sending to those accounts is 100% certainly a spammer. Basically what Abaca is doing is working with all the shades of gray in between. Also, this is a system that can only be employed at the server level. It's not like you could add this technology to your stand alone email client.

    Dan East
  • So the way I read this is that it works like a reverse karma system. It doesn't really make much sense though. Remember the old adage about lies and statistics. Without seeing their analysis, who knows what they're twisting. I would very much like to see actual data about this system. The idea that a person's amount of spam would fit any sort of predictable distribution seems like a bit of a stretch to me. If anyone with actual numbers could come forth, I think we would all appreciate it. Even if there was a r
  • Is it a joke? (Score:3, Insightful)

    by jmv (93421) on Monday December 03, 2007 @11:51PM (#21567829) Homepage
    Seriously, I don't see how anything working remotely as described can work. First, it guarantees that any OSS mailing list will be flagged as spam because our emails tend to be on the web and we all receive lots of spam. Then how the hell is someone going to know what percentage of spam I receive (or do they expect everyone to give them access to their inbox?)? Even if that were to work, all the spammers would have to do is let the zombies send one email at a time, at which point either they block all my email or they let it all through. Dumb idea or dumb reporting?
  • by sonikbeach (939185) on Monday December 03, 2007 @11:52PM (#21567835) Homepage
    How does one initialize this system? Spam is determined by user reputation, yet user reputation is determined by quantity of spam received. Am I missing something? The logic seems circular.
    • Re: (Score:3, Insightful)

      Exactly! The system lacks a way of defining what exactly it's blocking. How does one determine that one say receives 25% spam? Does Abaca do the analysis or are you just supposed to guess? While the equation obviously works on paper, when implementation comes it is clearly missing a major element, ie a definition of spam.
    • by wvmarle (1070040)

      This chicken and egg problem is not that hard to overcome.

      Start off with "traditional" filtering techniques, they are quite accurate and I suspect give a good enough sample size to get you started.

      A second option may be to ask users to mark their spam manually for a day or so. That should also be manageable.

      Lastly when there is one group up and running, as I understand it new users can be added without any problems. Just keep them out of the statistical pool (only check their incoming mails on spaminess

    • by The Raven (30575)
      Google's PageRank is a circular algorithm as well, but that doesn't prevent it from working.

      However, this sounds more like a technique to augment traditional spam detection engines. Take SpamAssassin output as a precondition to classify the users, and then use that classification as an input to the SpamAssassin engine with a high weight. Tadaa! Increased detection accuracy.

      Whether it would actually work or not, I dunno. Seems plausible, but only as a server based approach, such as something to augment Googl
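The bootstrapping suggested in this subthread might look roughly like this: a traditional content filter seeds each receiver's reputation, and that reputation is then blended back into the per-message score. Everything here is an assumption made for illustration: the log, the 0.5 threshold, the 50/50 weighting, and both function names. It is not SpamAssassin's API or the vendor's method.

```python
# Hypothetical delivery log: (recipient, message_id, content_score in [0, 1]).
# content_score stands in for a traditional filter's normalized output.
log = [
    ("alice", "m1", 0.10), ("alice", "m2", 0.50),
    ("bob",   "m2", 0.50), ("bob",   "m7", 0.20),
    ("carol", "m3", 0.50), ("carol", "m4", 0.90), ("carol", "m5", 0.80),
    ("dave",  "m3", 0.50), ("dave",  "m4", 0.90), ("dave",  "m6", 0.95),
]

def receiver_reputation(log, threshold=0.5):
    """Step 1: fraction of each user's mail the content filter already calls spam."""
    totals, spam = {}, {}
    for user, _msg, score in log:
        totals[user] = totals.get(user, 0) + 1
        spam[user] = spam.get(user, 0) + (score > threshold)
    return {user: spam[user] / totals[user] for user in totals}

def combined_score(log, msg_id, weight=0.5):
    """Step 2: blend the content score with the mean reputation of the recipients."""
    rep = receiver_reputation(log)
    rows = [(user, score) for user, msg, score in log if msg == msg_id]
    content = rows[0][1]
    receivers = sum(rep[user] for user, _ in rows) / len(rows)
    return (1 - weight) * content + weight * receivers

print(combined_score(log, "m2"))  # borderline content, low-spam recipients
print(combined_score(log, "m3"))  # same content score, high-spam recipients
```

Two messages the content filter cannot tell apart end up on opposite sides of 0.5 once the recipients' history is mixed in, which is the kind of augmentation the comment has in mind.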
  • Form letter (Score:5, Funny)

    by Anonymous Coward on Monday December 03, 2007 @11:54PM (#21567847)
    My first attempt at doing this, please feel free to amend/critique:

    Your post advocates a
    (X) technical ( ) legislative ( ) market-based ( ) vigilante

    approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

    ( ) Spammers can easily use it to harvest email addresses
    (X) Mailing lists and other legitimate email uses would be affected
    (X) No one will be able to find the guy or collect the money
    ( ) It is defenseless against brute force attacks
    (X) It will stop spam for two weeks and then we'll be stuck with it
    ( ) Users of email will not put up with it
    ( ) Microsoft will not put up with it
    ( ) The police will not put up with it
    ( ) Requires too much cooperation from spammers
    ( ) Requires immediate total cooperation from everybody at once
    (X) Many email users cannot afford to lose business or alienate potential employers
    ( ) Spammers don't care about invalid addresses in their lists
    ( ) Anyone could anonymously destroy anyone else's career or business

    Specifically, your plan fails to account for

    ( ) Laws expressly prohibiting it
    ( ) Lack of centrally controlling authority for email
    ( ) Open relays in foreign countries
    ( ) Ease of searching tiny alphanumeric address space of all email addresses
    (X) Asshats
    ( ) Jurisdictional problems
    ( ) Unpopularity of weird new taxes
    ( ) Public reluctance to accept weird new forms of money
    ( ) Huge existing software investment in SMTP
    ( ) Susceptibility of protocols other than SMTP to attack
    ( ) Willingness of users to install OS patches received by email
    (X) Armies of worm riddled broadband-connected Windows boxes
    (X) Eternal arms race involved in all filtering approaches
    ( ) Extreme profitability of spam
    ( ) Joe jobs and/or identity theft
    ( ) Technically illiterate politicians
    (X) Extreme stupidity on the part of people who do business with spammers
    ( ) Dishonesty on the part of spammers themselves
    ( ) Bandwidth costs that are unaffected by client filtering
    ( ) Outlook

    and the following philosophical objections may also apply:

    ( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
    ( ) Any scheme based on opt-out is unacceptable
    ( ) SMTP headers should not be the subject of legislation
    (X) Blacklists suck
    (X) Whitelists suck
    ( ) We should be able to talk about Viagra without being censored
    ( ) Countermeasures should not involve wire fraud or credit card fraud
    ( ) Countermeasures should not involve sabotage of public networks
    (X) Countermeasures must work if phased in gradually
    ( ) Sending email should be free
    (X) Why should we have to trust you and your servers?
    ( ) Incompatibility with open source or open source licenses
    ( ) Feel-good measures do nothing to solve the problem
    ( ) Temporary/one-time email addresses are cumbersome
    ( ) I don't want the government reading my email
    ( ) Killing them that way is not slow and painful enough

    Furthermore, this is what I think about you:

    (X) Sorry dude, but I don't think it would work.
    ( ) This is a stupid idea, and you're a stupid person for suggesting it.
    ( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
    • Specifically, your plan fails to account for

      (X) Asshats
      (X) Extreme stupidity on the part of people who do business with spammers
      actually this is one of the first methods i've seen that turns asshats and stupid people who do business with spammers into a part of the solution
  • From the web site [abaca.com]:

    Unprecedented accuracy. Over 99 percent spam blocking means fewer than one mistake in every 100 messages processed. That's 10 to 100 times fewer mistakes than any other available systems.

    Uhh. So this system makes 1 mistake in 100, and claims this is 100x fewer than some other system. Apparently this other system they are comparing against gets it wrong every single time. I guess one way to make your products look good is to compare them against the theoretical worst competitor imaginabl

    • Re: (Score:3, Insightful)

      by MightyYar (622222)
      In TFA, the example is:

      "At 99.8 percent you miss two out of 1000," said Mr. Kirsch. "At 95 percent you miss 50 out of 1,000. So other systems give you 25 times as much spam. Who wants that? Nobody we know."
      He then goes on to claim that more users will improve the system to where it is 100x better than 95%, or 99.95% effective.
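The competing readings of "10 to 100 times" in this subthread come down to comparing error rates rather than accuracies. A small check using exact fractions, where the percentages are the ones quoted in the thread and the helper names are made up for this sketch:

```python
from fractions import Fraction

def error_rate(accuracy_pct):
    """Error rate as an exact fraction, given accuracy as a percentage string."""
    return 1 - Fraction(accuracy_pct) / 100

def improvement_factor(old_pct, new_pct):
    """How many times fewer mistakes the new accuracy makes than the old one."""
    return error_rate(old_pct) / error_rate(new_pct)

print(error_rate("99.8") * 1000)          # 2: misses per 1,000 at 99.8%
print(improvement_factor("95", "99.8"))   # 25
print(improvement_factor("95", "99.95"))  # 100
# The web site's wording: 1 mistake per 100, claimed to be 100x fewer than rivals,
# would put the rival at an error rate of 1, i.e. wrong every time.
print(error_rate("99") * 100)             # 1
```

So the article's numbers are internally consistent against a 95% baseline, while the web site's "over 99 percent" wording only produces the absurd result when 100x is applied to a 1% error rate.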
    • by lakeland (218447)
      Er, no.

      10 to 100 times more accurate than existing systems means that for every 10 to 100 mistakes that existing systems make, this system will make just one.

      For instance if they say current technology is 80% accurate then out of ten thousand emails coming in, 2000 will be incorrectly classified. 100 times more accurate than that means 20 errors, or 99.8% accuracy.

      Now, it happens that TFA is peddling snake oil. The top spam blocking programs make one mistake per ten thousand emails processed or 99.99% ac
      • by Temporal (96070)

        10 to 100 times more accurate than existing systems means that for every 10 to 100 mistakes that existing systems make, this system will make just one.
        Right, and the site claimed it makes 1 mistake per 100, so if it makes 1 mistake for every 100 mistakes that some existing system makes, then that existing system must be making 100 mistakes per 100.

        I think the site just made a mistake in their numbers, but I found it funny.
  • I've never once had a spam message in my Gmail inbox, it all gets caught by their spam filters and ends up in the appropriate folder. There's 150 in the spam folder right now, and they get deleted automatically after 30 days, so I get around 5 a day. That's probably just the ones google thinks are possibly spam, who knows how much they filter out that we never even see. Their filtering tech is pretty close to perfect, but it's always those last few points that are the hardest. So I seriously doubt this as y
  • Because they're going to be needing an OC-256 or the fucking spammers will be able to ddos the servers that compute aggregate scores off the 'Net and break the system.
  • This is clever: filtering spam by exploiting properties of spam pumps in general, vs. straight content analysis. The competition of ever-more-sophisticated content scanning techniques on one side, and spammers' escalating workarounds and huge botnets on the other side, is an arms race that shows no sign of abating.

    Of course, this approach does still depend on something—probably content analysis—to determine which messages are spam and which are not, so that receivers' spam statistics can b

  • While this is a rare case of the algorithm actually being original (as opposed to rehashing an old idea "on the web"), it is yet another software patent. I'll lump it with RSA - the kind of software patent you might actually want to read if all software patents were that original.
  • by fm6 (162816)

    The article wanders off into human-interest territory
    "Wanders?" The human interest part is most of the article! Not everybody thinks that a new spam filter is more interesting than a person's struggle to survive.
  • by CustomDesigned (250089) on Tuesday December 04, 2007 @12:41AM (#21568169) Homepage Journal
    Honeypots have been a published anti-spam technique for a decade. The idea is to publish bogus mailboxes that are not close to any legit mailbox. Any message with a honeypot as any recipient is spam. 100% accurate. (And I blacklist the IP for a week for good measure.) I use a variation, where any message with 3 or more invalid recipients is spam (blacklist IP). That is a little risky since someone may legitimately be trying various mailboxes manually with a telnet session because they forgot the exact name. This technique gives each recipient a score between 0 and 1 that reflects how close to a honeypot that recipient is, with actual honeypots (100% spam) being 1.0.
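A minimal sketch of the two rules described in this comment (any honeypot recipient means spam; three or more invalid recipients means spam; blacklist the sending IP for a week). Every address, name, and data structure here is hypothetical and chosen for illustration; it is not the poster's actual filter, and it does not attempt the 0-to-1 recipient scoring, which has not been disclosed.

```python
# Hypothetical mailbox lists, for illustration only.
HONEYPOTS = {"trap1@example.com", "trap2@example.com"}
VALID_MAILBOXES = {"alice@example.com", "bob@example.com"}

INVALID_LIMIT = 3                    # 3 or more invalid recipients => spam
BLACKLIST_SECONDS = 7 * 24 * 3600    # blacklist the sending IP for a week

blacklist = {}  # sending IP -> time (in seconds) at which the block expires

def is_spam(recipients):
    """Apply the honeypot rule, then the invalid-recipient rule."""
    if any(r in HONEYPOTS for r in recipients):
        return True
    invalid = sum(1 for r in recipients
                  if r not in VALID_MAILBOXES and r not in HONEYPOTS)
    return invalid >= INVALID_LIMIT

def handle(ip, recipients, now):
    """Return 'reject' or 'accept' for a message, updating the blacklist."""
    if blacklist.get(ip, 0) > now:
        return "reject"              # IP is still blacklisted
    if is_spam(recipients):
        blacklist[ip] = now + BLACKLIST_SECONDS
        return "reject"
    return "accept"
```

Passing `now` explicitly rather than reading the clock inside `handle` keeps the sketch easy to test; a real filter would also need to expire old blacklist entries.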
    • by bcrowell (177657)
      Yeah, reading the article, I had the same impression, that it was just a honeypot. And in fact, I don't even think it's a useful generalization of a honeypot. A honeypot is an address that receives 100% spam. This method is supposed to look at accounts that receive low levels of spam as well, but how is that useful? I receive essentially 0% spam, mainly because I change my address every year. Even if I was willing to share my data with this company, what would it tell them? I get an email from my wife telli
    • Re: (Score:3, Funny)

      by Atario (673917)

      someone may legitimately be trying various mailboxes manually with a telnet session because they forgot the exact name.
      Really? Come on. Really??
  • by syousef (465911) on Tuesday December 04, 2007 @12:53AM (#21568245) Journal
    From TFA with commentary:
    "he has started four companies, all based on his frustrations with existing products or services"

    Unless they're all still in business that's probably 3 failures on record.

    "Along the way he has amassed a personal fortune of about $230 million"

    But he got out before the ship sank and with a bundle of cash too. I wonder what his ex-employees got...

    "This is harder on my wife than it is on me," he said during a recent interview. "I just look at it as a problem. Here's a problem and you have four years to solve it or you don't get to solve any more problems."

    How philosophical...So he's going to cure himself single handedly of a rare disease in 4 years, because medical research is as easy (and cheap) as writing software or tinkering with a home engineering project. I think he's been watching Crusade and sniffing glue.

    "His perspective on his disease is also clear. Fourth on his list is "Why human beings will be extinct in 90 years." He writes, "My incurable blood cancer is minor compared to what is happening with the planet. We have somewhat more than 90 years before humanity is virtually extinct.""

    Don't even know where to start on this one. I can't be bothered reading about his reasoning, but he's not the first person to predict the end of the world just beyond his own lifetime.

    Oh and by the way he has a bridge, I mean some anti-spam software to sell you.

    Gimme a break! Nothing to see here.
  • This approach is quite similar to that taken by the DCC. Quoting from its home page: "The DCC is based on an idea of Paul Vixie and on fuzzy body matching to reject spam on a corporate firewall operated by Vernon Schryver starting in 1997. The DCC was designed and written at Rhyolite Software starting in 2000. It has been used in production since the winter of 2000/2001."

    As is often the case, those who are new to the spam problem frequently believe they are inventing something new, when it's most likely t

  • As I understand it, this method looks at a message and analyzes it based on the users to whom it has been sent. What is not clear to me is how the system would cope with individually customized spams.

    Spammers already have systems in place to randomly mutate the spam messages, to defeat systems that block spam based on identity. For example, consider Vipul's Razor, where people cooperate to flag messages as spam. Suppose a spammer sends a message with the subject "Panda Obligate Greenspan" to Joe, and Joe
  • Getting a 99% accuracy is still almost useless. To be useful you need four nines at least.
  • Something I didn't totally see in that is the following scenario.

    I've had an E-Mail address for.. well, we'll just say "forever" that's so old it was used to post on USENET before using a "real" E-Mail address was a problem. Additionally, it's also been used on some domain registrations, and in general seems to wind up on quite a few spam lists.

    Using current filtering, somewhere around 80% of all E-mails this account gets is spam.

    On the other hand, I'm also on a number of popular mailing lists with that E-
  • My first atemptt at donig tihs, plseae feel free to amemnd/ciqirute:

    Yuor psot acvotaeds a
    (X) tehnccial ( ) lavsilegtie ( ) mkreat-based ( ) vgntiiale

    apprcoah to fgthiing spam. Yuor ieda will not wrok. Here is why it won't work. (One or mroe of the flnwoilog may aplpy to your ptaruicalr idea, and it may have otehr flwas wihch used to vray form satte to satte bfoere a bad freeadl law was passed.)

    ( ) Smpreams can eislay use it to hrsevat eiaml aerdessds
    (X) Milnaig ltiss and other laeititgme eaiml uess

  • So he's describing a spam-filtering system which is basically saying "if this bunch of people are all getting this email, it smells like spam".

    While I'll admit I'm ludicrously overgeneralizing his technique, and I have no real knowledge of exactly how Google identifies spam, I'd say his method smells distinctly similar to essentially what Google must be doing (broadly speaking).

    If I were him, I'd be seriously researching how close his work is to The Big G, and make sure there's no conflict/overlap; or he'
  • by propelCEO (661892) on Tuesday December 04, 2007 @04:11AM (#21569301) Homepage
    Thank you for all the comments on the NY Times article.

    It would be difficult for me to answer each and every comment, so I'll try to just hit the high points here.

    It's quite easy to poke fun at an algorithm which is unknown to you as demonstrated by all the comments.

    But what's more relevant is whether really smart people who know the algorithm can find fault with it. There are only two people outside of Abaca who know the algorithm: Stephen Wolfram (author of Mathematica) and University of Waterloo Professor Gordon V. Cormack (a well known figure in the anti-spam community). I picked Wolfram because he's the smartest pure math guy I know. I picked Cormack because I think he is one of the smartest and most respected scientists in the spam field. You could contact either of them and ask them what they think of the approach. I can tell you what they'd say if you did that. They'd tell you it is a simple, elegant algorithm that has no obvious (to them) holes. I know that because the reason I disclosed it to them was to see if I overlooked anything. Neither found any holes. That doesn't prove that there aren't holes. All systems have holes. What this does mean is that a couple of pretty respected experts think it appears to be pretty solid logic.

    In fact, Gordon was kind enough to go even further and gave me permission to use the following quote: "This is, by far, the most clever technique I'm aware of for spam filtering." Since Gordon is conference chair for a lot of spam conferences, this is a pretty significant endorsement from someone who KNOWS the full algorithm and who knows the spam space better than just about anyone.

    I spent about 4 years studying what others had done in the space. As one commenter pointed out, the recipient reputation system can be thought of as a generalization of the honeypot technique that was first patented by Brightmail.

    That's exactly right. My realization is that every email address has statistical value, not just honeypots. So instead of just "black" feedback, the system incorporates "grey" and "white" feedback; every recipient has a priori odds associated with receiving mail. For many years, Brightmail was the de facto standard for spam filtering. Brightmail is just a special case of the algorithm I invented. So instead of learning from honeypots, we learn from ALL recipients and incorporate that statistical input in a mathematically rigorous way in order to compute a statistical likelihood that our prediction was correct. That gives us much more input than a honeypot system: it gives us white, black, and grey values. That is critical to avoiding false positives because good sites (like Yahoo and Hotmail) send email to honeypots all the time. And we incorporate that feedback into a statistical framework that is much more accurate than what Brightmail used.

    Exactly how we incorporate that input into spam scoring has not been publicly disclosed. It is not obvious.

    People who say that this must be snake oil or cannot work ignore the fact that the system has been in use by real customers for more than a year. We have over 100 customers and are just announcing our existence to the world, so that number should increase quite rapidly now that we are starting to market our product. There are customer testimonials on our website. You can contact them directly to verify that these quotes are legitimate.

    Here are statistics from one of our rating servers. There were 1,380,140 messages since the last counter reset. 96% were rated spam. There were 176 false positives and 66 false negatives reported. I just grabbed those stats from one of our live servers right now as I was composing this message. Sometimes we're better, sometimes we're worse, but those numbers are pretty typical.
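For readers who want the quoted counts as rates, here is a rough sketch of the arithmetic. It assumes the 96% spam share is exact and that the reported false positives and false negatives are the only errors; neither assumption is confirmed in the comment, and the variable names are illustrative.

```python
total_messages = 1_380_140
spam_share = 0.96          # "96% were rated spam" (treated as exact here)
false_positives = 176      # legitimate mail reported as wrongly blocked
false_negatives = 66       # spam reported as wrongly delivered

spam_messages = total_messages * spam_share
legit_messages = total_messages - spam_messages

# Errors as a share of all mail, of legitimate mail, and of spam.
overall_error_rate = (false_positives + false_negatives) / total_messages
false_positive_rate = false_positives / legit_messages
false_negative_rate = false_negatives / spam_messages

print(f"overall error rate:  {overall_error_rate:.4%}")
print(f"false positive rate: {false_positive_rate:.4%}")
print(f"false negative rate: {false_negative_rate:.4%}")
```

Because only about 4% of the traffic is legitimate, the false positive rate measured against legitimate mail alone comes out much higher than the overall error rate.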

    It's not perfect, but I think those are pretty good error rates for where we are now. And the stats always get better as we add more customers since we get more statistical input and this is just a statistical estimation problem. The more data, the more accurate
    • Re: (Score:3, Informative)

      by nagora (177841)
      But what's more relevant is whether really smart people who know the algorithm can find fault with it.

      I have to say that that is the dumbest remark about software design I've ever heard. I've worked with lots of really smart people and I've seen them all miss bugs that were obvious to other people. Wolfram recently missed an error in a proof, for example.

      It's more useful to have a lot of reasonably smart people look at something than have TWO (2) supposedly "really smart" people.

      But, anyway, spam is a so

    • Re: (Score:3, Interesting)

      by flonker (526111)
      The biggest flaw I see with the system is that spammers will try to figure out "good" addresses and send more spam to those particular addresses compared to others. ie. include a web bug in the email, if the email gets through, that address is then mailbombed into oblivion increasing the rating for any of the participants of that mailbombing.

      Also, eventually, the known good address may get so much spam that it becomes a "bad" address, invalidating future good emails.

      Many systems to stop spam work on small
