Spam Trap Claims 10x-100x Accuracy Gain
SpiritGod21 writes in with a NYTimes article on a new approach to spam detection that claims out-of-the-box improvement of 1 or 2 orders of magnitude over existing approaches. The article wanders off into human-interest territory as the inventor, Steven T. Kirsch, has an incurable disease and an engineer's approach to fighting it. But a description of the anti-spam tech, based on the reputation of the receiver and not the sender, is worth a read.
Ummmm.... (Score:3, Insightful)
Also, seems like a bit of a slashvertisment for what is yet an unproven technology - the only benchmarks we have are ones they provide.
Re: (Score:3, Insightful)
That's not true though. Spam is defined as bulk, unsolicited e-mail. Even if some retard actually likes to read their spam e-mails and buy things they advertise, that doesn't change the fact that the message was sent in bulk (to many other people as well), and that it was unsolicited by at least the vast, overwhelming majority of them.
Re: (Score:3, Informative)
They're using LOTS of accounts to grade e-mail. It doesn't work at all unless you're an ISP with lots of different accounts to monitor. The idea is that if a bunch of people get the same e-mail (already a good indicator of its spamminess), and the people who get lots of spam are more likely to have received it than people who don't get much spam at all, the message is more likely spam.
Re:Ummmm.... (Score:5, Insightful)
First they can ensure that the message itself doesn't contain any recipient info (a big bcc basically).
Then they avoid batching recipients based on their domain so the SMTP server can't tell who else is receiving the message.
The only way to derive the recipients now is to compare all messages against all others in order
to match them up. So they hash every message and combine those with identical hashes.
But putting a little unique text in each message during transmission foils that.
Spammers: 1 New weapons: 0
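To make the hashing argument above concrete, here is a minimal sketch, assuming exact SHA-256 matching on the message body. The function name `group_by_hash` and the sample addresses are made up for illustration; nothing here is the product's actual method.

```python
import hashlib
from collections import defaultdict

def group_by_hash(messages):
    """Group (recipient, body) pairs by the SHA-256 digest of the body."""
    groups = defaultdict(list)
    for recipient, body in messages:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(recipient)
    return groups

# Three copies of the same message, as a naive bulk mailer would send them.
identical = [
    ("a@example.invalid", "Buy now"),
    ("b@example.invalid", "Buy now"),
    ("c@example.invalid", "Buy now"),
]

# The same campaign with a little unique text appended to each copy.
salted = [(rcpt, f"{body} {i}") for i, (rcpt, body) in enumerate(identical)]

largest_identical = max(len(r) for r in group_by_hash(identical).values())
largest_salted = max(len(r) for r in group_by_hash(salted).values())

print(largest_identical)  # all three copies share one hash
print(largest_salted)     # every copy hashes differently
```

With exact hashes, one extra token per message is enough to split the campaign into singletons. A fuzzy matcher would be needed to group them again.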
Re:Ummmm.... (Score:4, Interesting)
How exactly is a message supposed to get somewhere if it doesn't have the recipient info? I think you're confusing what you see in your mail box with what the mail servers see.
In any case, as is typical the news article doesn't really provide enough information to determine how the system actually works. It does sound like it's working on the premise that since spam is done in "bulk", if you see lots of identical messages going through a server you can assume that that's spam. The obvious problem would be that spammers can include randomly generated content.
But that problem is so obvious, it seems likely to me that I don't understand the system they have in mind.
Re: (Score:3, Funny)
"Anonymous Coward" --> A Condom Warns You
Re: (Score:2, Insightful)
Re: (Score:3, Insightful)
courseofhumanevents -> "Must Fence A Nervous Ho"
Re: (Score:3, Insightful)
Re:Is linux for homos? (Score:4, Informative)
Not all homosexuals are happy, cheerful people either.
Re: (Score:3, Interesting)
This is a system where they look at the history of who a person has sent e-mail to. If the sender has a short term history of sending e-mail to people who mostly receive spam, the e-mail is considered more likely to be spam. Conversely, if the sender has a short term history of
Re:No (Score:4, Insightful)
You didn't RTFA well enough. That it's about recipients is the selling point.
That's a truth with modifications, though. Look at the quote from the web site I put in my parent post to yours, which clearly shows that it's a block based on who the sender has sent an email to. I'll repeat it, in case you missed it:
"Because ratings are based on the most recent 25 emails for each sender, the system reacts instantly to spam attacks, usually within just a few messages."
Yes, it's a recipient based system in that it assigns a score to the sender based on what the recipients of the emails are. But the blocking occurs due to the score of the sender, based on previous emails, not on the recipient of the current email.
Just think -- if it was based on blocking based on recipient only, it would either block all or no e-mail to an inbox with a single recipient. It would then only be effective for e-mails with multiple recipients, which doesn't match the claims made.
Again, think, and read the article (and that goes for the moderators too).
Re:You are also totally wrong (Score:4, Insightful)
(Ah, that explains the completely asshat moderation here, then.)
No, I didn't get it backwards -- RTFA. It's called a recipient verification system, but when you look at their own description on how it operates, you'll find that:
- It looks at the recipients of a message, and based on how much spam each of the recipient accounts gets, assigns a score to the sender.
- This score is accumulated over the last 25 emails.
(The reason for this is rather obvious, if you think about it -- if it based its score on just the last e-mail, if you sent an e-mail to someone who receives a lot of spam, it'd be automatically blocked, and that person would not get any e-mail at all.)
Say a sender sends three e-mails, to foo@foo.invalid, bar@bar.invalid, a bunch more people, and finally baz@baz.invalid. If foo@foo.invalid receives 30% spam, and the overall average is 80%, that means that the e-mail is unlikely to be spam. So a score is saved in a table for the sender. Then it goes to bar@bar.invalid, who also has a low 40% spam rate, and another "good" score is saved for sender. When the sender then after a while sends an email to baz@baz.invalid, who has a spam rate of 95%, the fact that he sent an e-mail to foo and bar earlier will increase the likelihood of his email to baz going through.
Conversely, if foo and bar received more spam than average, an e-mail sent to baz would be scored as more likely to be spam, even if baz received a record low 10% spam.
Yes, in a way, it's receiver based, because it builds the score based on the receivers' ratio of spam to valid e-mails. But the score is applied to the sender, and they state this in clear text on the web site itself. You only have to read past the sales pitch and down to the technical details.
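As a rough sketch of the scheme described in this comment: keep the spam rates of the last 25 recipients for each sender and compare their average with the overall average. The class name, the plain mean, and the "above average means spam" threshold are all assumptions for illustration; how the real system combines the numbers has not been disclosed.

```python
from collections import defaultdict, deque

WINDOW = 25  # "the most recent 25 emails for each sender", per the quoted site text

class SenderReputation:
    """Hypothetical sender score built from recipients' spam rates."""

    def __init__(self, recipient_spam_rate, overall_rate):
        self.recipient_spam_rate = recipient_spam_rate  # address -> fraction of mail that is spam
        self.overall_rate = overall_rate                # average spam fraction across all recipients
        # One bounded history per sender; old entries fall off after WINDOW messages.
        self.history = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, sender, recipient):
        """Remember the spam rate of a recipient this sender just mailed."""
        rate = self.recipient_spam_rate.get(recipient, self.overall_rate)
        self.history[sender].append(rate)

    def score(self, sender):
        """Mean spam rate of the sender's recent recipients (assumed combination rule)."""
        rates = self.history[sender]
        if not rates:
            return self.overall_rate
        return sum(rates) / len(rates)

    def looks_like_spam(self, sender):
        return self.score(sender) > self.overall_rate

# The numbers from the example above: foo 30%, bar 40%, baz 95%, overall 80%.
rep = SenderReputation(
    {"foo@foo.invalid": 0.30, "bar@bar.invalid": 0.40, "baz@baz.invalid": 0.95},
    overall_rate=0.80,
)
rep.record("sender@example.invalid", "foo@foo.invalid")
rep.record("sender@example.invalid", "bar@bar.invalid")

# The later message to baz is judged on the sender's history, not on baz alone.
score_before_baz = rep.score("sender@example.invalid")
blocked = rep.looks_like_spam("sender@example.invalid")
print(score_before_baz, blocked)
```

Under these assumptions the sender's earlier mail to foo and bar pulls the score well below the 80% average, so the message to baz would go through even though baz gets 95% spam.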
Re:You are also totally wrong (Score:5, Funny)
In summary, I have no idea what I'm talking about because I didn't RTFA. That I am aware of this fact makes me superior to the lot of you who are arguing over the inner workings of this week's spam-filter vaportech -- which was probably written up in an incomprehensible and inconsistent manner such that it will go over the heads of foolish investors, and part them from their money.
Re:You are also totally wrong (Score:4, Informative)
But it's really designed to be a corporate product. So even if each spam email contains only one recipient, they all go through the corporate email server, allowing it to see all the various recipients a given sender is emailing.
And there were even hints that the software stored on your corporate mail server might be sharing some information with a central data store, allowing it to get the score of all the recipients that the sender is sending to on any network that is a customer of this product. (So it doesn't matter so much if your company only has 10 people to average across because it is somehow cross checking against the global dataset which is tens of thousands of recipients.)
Yet another wrong answer... (Score:5, Insightful)
Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.
If you want to stop spam, you need to take away the economic incentive. We've already seen how many spam filtering / blocking programs produced in the past 5 years? But yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.
Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
The solution to spam (Score:3, Funny)
2) Behead those who insult Islam!
3) No more spam. Allah Akbar
Re: (Score:2)
Re:Yet another wrong answer... (Score:5, Insightful)
Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
Re: (Score:2)
you assume that all anti-spam filters are proprietary, open source filters exist and can be modified to your desire- that in its self should force anti-spam companies to adapt otherwise they got replaced by free as in gnu software. it is in their best interests to at least attempt to beat FLOSS and FLOSS has a lot going for it- if someone finds a better way to code for the
Re: (Score:2)
Because if the revenue model involves getting people to buy stuff in spam links, you would *think* the credit card companies would find the spammers within about a day or so...
Someone replied and mentioned the free spam filters, suggesting that the "spam and sell spam f
Re: (Score:3, Insightful)
They're the ones creating the successful antispam systems -- you know, the ones that actually scale up on the gateway. The popular vision of bumbling PHB buffoons everywhere is just another stupid slashdot stereotype, fostered by insecure social retards who have to foist their apparent superiority over everyone by scoffing at everything. Sure, they exist, but lon
Re: (Score:3, Insightful)
Oh, and prostitution, too. And identity theft. And insurance fraud. Yup, it's simple to fix. Just make it unprofitable! Simplicity itself!
Re:Yet another wrong answer... (Score:4, Funny)
Re: (Score:2, Insightful)
Re: (Score:3, Informative)
Do you know how much it costs your ISP to run the mail infrastructure for your legitimate mail?
Triple it. That's the cost of spam.
Re: (Score:2)
I don't think anyone expects prostitution to go away - I think they just hope to make it safer for all involved. Those of us who are realistic know it won't go away.
Re: (Score:3, Interesting)
But legalised free for all of any drug you want is bullshit, imagine a night out when the person beside you can snort some cocaine or smoke some ice?
That's the reality right now. Cocaine is only £20 per gram. I can pick up the phone and have it delivered. Like pizza, only it'll be here faster.
oh and the idea that it's ok to legalise something addictive, so you can tax people to fund their treatment is some of the worst logic i could imagine.
You legalise the addictive substance to reduce the price. Even with the tax it doesn't even come close to the previous illegal street price. This then cuts off black market. The dealers lose their revenue source and all the drug price fueled crime goes away. Basically you allow the addicts to fund their own downward spiral. Eventually the problem solves its
Re: (Score:2)
Everyone knows what spam is, but it's economical because there are idiots out there who ignore the warnings and buy the crap anyway. So it seems that the only ways to make spam uneconomical is to either remove idiots from the Internet (Internet Utopia here we come!), or stop the spam from getting to them.
Re: (Score:2)
Everyone knows what spam is, but it's economical because there are idiots out there who ignore the warnings and buy the crap anyway. So it seems that the only ways to make spam uneconomical is to either remove idiots from the Internet (Internet Utopia here we come!), or stop the spam from getting to them.
Re: (Score:2, Funny)
Make it illegal and fine the people who profit from it.
Easier said than done. First start with a legal definition of spam e-mail, that does not cover things like mailing lists. Personally I am sending out many mass mailings, on an opt-out basis (I harvest interesting mail addresses myself) - and get very few opt-outs and many reactions. I specifically send mails to people that may be interested in buying my goods. This should definitely be legal, it's a great marketing tool and helps my business very well.
What should be illegal (and I suspect is already) are
Re:Yet another wrong answer... (Score:5, Insightful)
No, if you are harvesting email addresses and sending unsolicited commercial messages to them, it is quite simple:
You are a spammer.
Re: (Score:3, Funny)
No, if you are harvesting email addresses and sending unsolicited commercial messages to them, it is quite simple:
You are a spammer.
Most e-mail addresses I get are from business cards and from websites where people post their e-mail with the specific purpose to get offers of the product that I have. Some I get from other sources, but again this is from sources where the e-mail addresses are posted with the specific intent of receiving these offers.
So it is not as black-and-white as most people here try to put it. I have a mailing list containing maybe 500 addresses or so, and get on average 10-20 reactions on the offers sent, and 50-
Re: (Score:2)
Re: (Score:2)
You --- does ---> spam <--- is-a --- opt-out mailing list
^
|
|
would-be-better-off-without
|
|
World
Re:Yet another wrong answer... (Score:4, Interesting)
I can imagine the reactions you get...
There are two reasons for this. First, nobody is receiving your emails because you are blocked nine ways to hell in their spam filters. Second, because most spam (yours included) use the opt-out crap for email verification of their lists. They know they have a live one so most sane people ignore opt-out links in email since they are dangerous.
what needs to be changed *IS* the opt-out crap. It needs to be confirmed-opt-in plain and simple. While they are at it, I wouldn't say no to outlawing email harvesting either. Throw in a $10,000.00 fine for each violation of either provision and call it pretty. Make half the fine go to the organization that hunts down violators and we got a sound business solution.
Re: (Score:3, Funny)
Re: (Score:3, Insightful)
get very few opt-outs
Might this be because nobody with two neurons to rub together actually uses an opt-out link? (After all, if you're scummy enough to send me unsolicited email, you're probably scummy enough to use that "opt out" as a test to determine whether my address is real, and thus to be sold to other scum for more profit.)
You may be a nice person and run a respectable enterprise in all other respects, but if you're sending out unsolicited emails on anything more than an individual basis, you're a spammer.
Furthermore,
Re: (Score:3, Funny)
get very few opt-outs
Might this be because nobody with two neurons to rub together actually uses an opt-out link?
No, I ask them specifically to reply. Or call me - telephone number is in the mails that I send. As is my real, verifiable company name.
You may be a nice person and run a respectable enterprise in all other respects, but if you're sending out unsolicited emails on anything more than an individual basis, you're a spammer.
Which, like most people here also don't get because they can not READ and are completely pre-determined that any commercial mail == spam, is the case. E-mails are not sent out randomly, but only to addresses where there is a reasonable and real chance they are in the same business.
Re:Yet another wrong answer... (Score:4, Insightful)
Where it crosses the line and becomes spam is when it's unsolicited. That's the key. Unsolicited commercial email is the very definition of spam, and no amount of hand-waving about opt-outs or the selectivity of the lists is going to change that.
Businesses that have relied on cold-calling via any medium to drum up sales have always been sleazy in my book, but when you do it via email, you're pushing the cost out onto the recipient and onto uninvolved third parties. That's at best unethical, and at worst flat-out theft.
Re: (Score:2)
Re: (Score:2)
Don't care enough about spam to pay a tax to fund a government agency to make spam history? Then stop complaining about it like it's the end of the freakin' world.
Re: (Score:2)
Re: (Score:3, Insightful)
Wake up, they are already committing fraud, and already breaking the law. The agencies already exist that fight fraud, and yet how many spammers have actually been caught and charged with fraud? How much of this spam has actually been stopped?
Re: (Score:2)
Re:Yet another wrong answer... (Score:5, Insightful)
At least once a week there seems to be another flashy technique to filter or block spam. Great.
It's not "flashy." It's called information theory and statistics. It is an extremely powerful concept that has far more important potential uses than simply filtering spam email. Every new advancement in automated classification and knowledge extraction is VITALLY IMPORTANT to our ability to cope in a world which has suddenly been flooded with SO MUCH information. This power tool is being applied to what some might see as a "silly" problem, but the fact remains that spam is a powerful motivation to researchers to push further limits in the fields of pattern recognition, information and natural language processing.
If you're against the advancement of information processing techniques, then... uh, okay, I guess. If you can't see beyond spam, you are terribly short sighted.
Re:Yet another wrong answer... (Score:4, Informative)
Re: (Score:2)
Re:Yet another wrong answer... (Score:4, Insightful)
Easy enough. Remove the customers. Set up a spam operation selling drugs. Except instead of sending what's advertised, send arsenic. Once all the customers have died, there won't be anyone left to buy spam-stuff. And, as a bonus, you help the genepool.
Simple way to Do That (Score:2)
Re: (Score:3, Funny)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
That's all well and good, but wake me up when you have a viable economic solution based on the premise that spam is an economic problem. And by viable I mean doesn't have a massive downside like e-stamps, trampling on the first amendment, or elevating jail times for spammers beyond those for violent crimes.
In the mean time, you'll have to pardon me if I don't throw
Re: (Score:3, Interesting)
Re:Yet another wrong answer... (Score:5, Interesting)
Re: (Score:2)
Lots. Even in my most ancient hotmail account I see almost no spam. The filters are working and the spam cat and mouse game has reached a point where the sophistication of spam detection is outpacing the spammers. There comes a point where their resources cannot keep up. We've reached that point I think. I don't expect spam to ever leave but now it's a controlled problem. In the future we might even start seeing
Re: (Score:2)
Whatever the brilliant technology he came up with is, this is just the obfuscating, fake description of it.
Re: (Score:3, Insightful)
Way to go, Captain Obvious!
This goes down in history with other sayings of similar caliber, such as
1) "Once we can stop scams from being profitable, we will finally see them go away. But no sooner."
2) "Once we can stop prostitution from being profitable, we will finally see it go away. But no sooner."
3) "Once we can stop theft from being profitable, we will finally see it go away. But no sooner."
Somehow, despite havin
Makes sense (Score:5, Informative)
Dan East
Snake oil (Score:2)
Is it a joke? (Score:3, Insightful)
Chicken-and-egg problem (Score:3, Insightful)
Re: (Score:3, Insightful)
Re: (Score:2)
This chicken and egg problem is not that hard to overcome.
Start off with "traditional" filtering techniques, they are quite accurate and I suspect give a good enough sample size to get you started.
A second option may be to ask users to mark their spam manually for a day or so. That should also be manageable.
Lastly when there is one group up and running, as I understand it new users can be added without any problems. Just keep them out of the statistical pool (only check their incoming mails on spaminess
Re: (Score:2)
However, this sounds more like a technique to augment traditional spam detection engines. Take SpamAssassin output as a precondition to classify the users, and then use that classification as an input to the SpamAssassin engine with a high weight. Tadaa! Increased detection accuracy.
Whether it would actually work or not, I dunno. Seems plausible, but only as a server based approach, such as something to augment Googl
Form letter (Score:5, Funny)
Your post advocates a
(X) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
(X) Mailing lists and other legitimate email uses would be affected
(X) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(X) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
(X) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
(X) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
(X) Armies of worm riddled broadband-connected Windows boxes
(X) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
(X) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
( ) Ideas similar to yours are easy to come up with, yet none have ever
been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
(X) Blacklists suck
(X) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
(X) Countermeasures must work if phased in gradually
( ) Sending email should be free
(X) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your
house down!
Re: (Score:2)
(X) Asshats
(X) Extreme stupidity on the part of people who do business with spammers
10-100x better than what? (Score:2)
Uhh. So this system makes 1 mistake in 100, and claims this is 100x fewer than some other system. Apparently this other system they are comparing against gets it wrong every single time. I guess one way to make your products look good is to compare them against the theoretical worst competitor imaginabl
Re: (Score:3, Insightful)
Re: (Score:2)
10 to 100 times more accurate than existing systems means that for every 10 to 100 mistakes that existing systems make, this system will make just one.
For instance if they say current technology is 80% accurate then out of ten thousand emails coming in, 2000 will be incorrectly classified. 100 times more accurate than that means 20 errors, or 99.8% accuracy.
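The arithmetic in that example can be checked with a few lines. The 80% baseline is the commenter's hypothetical figure, not a measured one, and the helper name `errors` is made up here.

```python
def errors(total, accuracy):
    """Number of misclassified messages at a given accuracy."""
    return round(total * (1 - accuracy))

total = 10_000
baseline_errors = errors(total, 0.80)            # hypothetical 80%-accurate filter
improved_errors = baseline_errors // 100         # "100 times more accurate" read as 100x fewer errors
improved_accuracy = 1 - improved_errors / total

print(baseline_errors, improved_errors, improved_accuracy)
```

Note that the reading matters: "100 times more accurate" only makes sense as a ratio of error counts, since accuracy itself cannot exceed 100%.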
Now, it happens that TFA is peddling snake oil. The top spam blocking programs make one mistake per ten thousand emails processed or 99.99% ac
Re: (Score:2)
I think the site just made a mistake in their numbers, but I found it funny.
That's 200% Accuracy! (Score:2)
They'd better have a helluva lot of revenue (Score:2)
Sidestepping the arms race (Score:2, Interesting)
This is clever: filtering spam by exploiting properties of spam pumps in general, vs. straight content analysis. The competition of ever-more-sophisticated content scanning techniques on one side, and spammers' escalating workarounds and huge botnets on the other side, is an arms race that shows no sign of abating.
Of course, this approach does still depend on something—probably content analysis—to determine which messages are spam and which are not, so that receivers' spam statistics can b
Patented Technology (Score:2)
Human! (Score:2)
Generalization of honeypots (Score:4, Insightful)
Re: (Score:2)
Re: (Score:3, Funny)
Crackpot in denial. Snake oil to sell. (Score:3, Interesting)
"he has started four companies, all based on his frustrations with existing products or services"
Unless they're all still in business that's probably 3 failures on record.
"Along the way he has amassed a personal fortune of about $230 million"
But he got out before the ship sank and with a bundle of cash too. I wonder what his ex-employees got...
"This is harder on my wife than it is on me," he said during a recent interview. "I just look at it as a problem. Here's a problem and you have four years to solve it or you don't get to solve any more problems."
How philosophical...So he's going to cure himself single handedly of a rare disease in 4 years, because medical research is as easy (and cheap) as writing software or tinkering with a home engineering project. I think he's been watching Crusade and sniffing glue.
"His perspective on his disease is also clear. Fourth on his list is "Why human beings will be extinct in 90 years." He writes, "My incurable blood cancer is minor compared to what is happening with the planet. We have somewhat more than 90 years before humanity is virtually extinct.""
Don't even know where to start on this one. I can't be bothered reading about his reasoning, but he's not the first person to predict the end of the world just beyond his own lifetime.
Oh and by the way he has a bridge, I mean some anti-spam software to sell you.
Gimme a break! Nothing to see here.
It's been around (and implemented) for years (Score:2)
This approach is quite similar to that taken by the DCC. Quoting from its home page: "The DCC is based on an idea of Paul Vixie and on fuzzy body matching to reject spam on a corporate firewall operated by Vernon Schryver starting in 1997. The DCC was designed and written at Rhyolite Software starting in 2000. It has been used in production since the winter of 2000/2001."
As is often the case, those who are new to the spam problem frequently believe they are inventing something new, when it's most likely t
What about custom spams? (Score:2)
Spammers already have systems in place to randomly mutate the spam messages, to defeat systems that block spam based on identity. For example, consider Vipul's Razor, where people cooperate to flag messages as spam. Suppose a spammer sends a message with the subject "Panda Obligate Greenspan" to Joe, and Joe
Only 99% good (Score:2)
Did I miss something? (Score:2)
I've had an E-Mail address for.. well, we'll just say "forever" that's so old it was used to post on USENET before using a "real" E-Mail address was a problem. Additionally, it's also been used on some domain registrations, and in general seems to wind up on quite a few spam lists.
Using current filtering, somewhere around 80% of all E-mails this account gets is spam.
On the other hand, I'm also on a number of popular mailing lists with that E-
How insightful redundant funny spam looks... (Score:2)
Yuor psot acvotaeds a
(X) tehnccial ( ) lavsilegtie ( ) mkreat-based ( ) vgntiiale
apprcoah to fgthiing spam. Yuor ieda will not wrok. Here is why it won't work. (One or mroe of the flnwoilog may aplpy to your ptaruicalr idea, and it may have otehr flwas wihch used to vray form satte to satte bfoere a bad freeadl law was passed.)
( ) Smpreams can eislay use it to hrsevat eiaml aerdessds
(X) Milnaig ltiss and other laeititgme eaiml uess
er, Google, Anyone? (Score:2)
While I'll admit I'm ludicrously overgeneralizing his technique, and I have no real knowledge of exactly how Google identifies spam, I'd say his method smells distinctly similar to essentially what Google must be doing (broadly speaking).
If I were him, I'd be seriously researching how close his work is to The Big G, and make sure there's no conflict/overlap; or he'
The inventor responds... (Score:5, Informative)
It would be difficult for me to answer each and every comment, so I'll try to just hit the high points here.
It's quite easy to poke fun at an algorithm which is unknown to you as demonstrated by all the comments.
But what's more relevant is whether really smart people who know the algorithm can find fault with it. There are only two people outside of Abaca who know the algorithm: Stephen Wolfram (author of Mathematica) and University of Waterloo Professor Gordon V. Cormack (a well known figure in the anti-spam community). I picked Wolfram because he's the smartest pure math guy I know. I picked Cormack because I think he is one of the smartest and most respected scientists in the spam field. You could contact either of them and ask them what they think of the approach. I can tell you what they'd say if you did that. They'd tell you it is a simple, elegant algorithm that has no obvious (to them) holes. I know that because the reason I disclosed it to them was to see if I overlooked anything. Neither found any holes. That doesn't prove that there aren't holes. All systems have holes. What this does mean is that a couple of pretty respected experts think it appears to be pretty solid logic.
In fact, Gordon was kind enough to go even further and gave me permission to use the following quote: "This is, by far, the most clever technique I'm aware of for spam filtering." Since Gordon is conference chair for a lot of spam conferences, this is a pretty significant endorsement from someone who KNOWS the full algorithm and who knows the spam space better than just about anyone.
I spent about 4 years studying what others had done in the space. As one commenter pointed out, the recipient reputation system can be thought of as a generalization of the honeypot technique that was first patented by Brightmail.
That's exactly right. My realization is that every email address has statistical value, not just honeypots. So instead of just "black" feedback, the system incorporates "grey" and "white" feedback; every recipient has a priori odds associated with receiving mail. For many years, Brightmail was the "de facto" standard for spam filtering. Brightmail is just a special case of the algorithm I invented. So instead of learning from honeypots, we learn from ALL recipients and incorporate that statistical input in a mathematically rigorous way in order to compute a statistical likelihood that our prediction was correct. That gives us much more input than a honeypot system: it gives us white, black, and grey values. That is critical to avoiding false positives because good sites (like Yahoo and Hotmail) send email to honeypots all the time. And we incorporate that feedback into a statistical framework that is much more accurate than what Brightmail used.
Exactly how we incorporate that input into spam scoring has not been publicly disclosed. It is not obvious.
People who say that this must be snake oil or cannot work ignore the fact that the system has been in use by real customers for more than a year. We have over 100 customers and are just announcing our existence to the world, so that number should increase quite rapidly now that we are starting to market our product. There are customer testimonials on our website. You can contact them directly to verify that these quotes are legitimate.
Here are statistics from one of our rating servers. There were 1,380,140 messages since the last counter reset. 96% were rated spam. There were 176 false positives and 66 false negatives reported. I just grabbed those stats from one of our live servers right now as I was composing this message. Sometimes we're better, sometimes we're worse, but those numbers are pretty typical.
It's not perfect, but I think those are pretty good error rates for where we are now. And the stats always get better as we add more customers since we get more statistical input and this is just a statistical estimation problem. The more data, the more accurate
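For readers who want the quoted counters as rates, here is a rough Python sketch of the arithmetic. It uses only the numbers given in the post. The per-class rates at the end rest on an assumption that is not in the post: that the 96% "rated spam" share is close to the true spam share. The variable names are made up for illustration.

```python
# Figures as quoted in the post; nothing here is measured independently.
total_messages = 1_380_140
false_positives = 176   # reported: good mail rated as spam
false_negatives = 66    # reported: spam rated as good mail
spam_share = 0.96       # "96% were rated spam"

errors = false_positives + false_negatives
overall_error_rate = errors / total_messages
messages_per_error = total_messages / errors

# Assumption (not from the post): the rated-spam share approximates
# the true spam share, so the rest of the traffic is good mail.
estimated_spam = total_messages * spam_share
estimated_ham = total_messages - estimated_spam
fp_rate_among_ham = false_positives / estimated_ham
fn_rate_among_spam = false_negatives / estimated_spam

print(f"reported errors:        {errors}")
print(f"overall error rate:     {overall_error_rate:.4%}")
print(f"messages per error:     {messages_per_error:.0f}")
print(f"FP rate among ham:      {fp_rate_among_ham:.3%}")
print(f"FN rate among spam:     {fn_rate_among_spam:.4%}")
```

Two caveats: these are only the errors that users reported, so the true counts could be higher, and the false positive rate looks quite different when measured against the small share of good mail than against all traffic.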
Re: (Score:3, Informative)
I have to say that that is the dumbest remark about software design I've ever heard. I've worked with lots of really smart people and I've seen them all miss bugs that were obvious to other people. Wolfram recently missed an error in a proof, for example.
It's more useful to have a lot of reasonably smart people look at something than have TWO (2) supposedly "really smart" people.
But, anyway, spam is a so
Re: (Score:3, Interesting)
Also, eventually, the known good address may get so much spam that it becomes a "bad" address, invalidating future good emails.
Many systems to stop spam work on small
Re:x100 improvement in accuracy? (Score:4, Informative)
Over 99 percent spam blocking means fewer than one mistake in every 100 messages processed. That's 10 to 100 times fewer mistakes than any other available systems.
Dan East
Still doesn't make sense & GP is not a troll, (Score:3, Insightful)
That still means that the best other systems make a mistake on 1 out of every 10 messages, and the worst ones make a mistake on every single message. That's still ridiculous hyperbole.
(Personally, I'll take the system that makes 100% mistakes, and I'll use the Spam folder as my Inbox.)
Now if you said that it has 1/10 to 1/100 the error rate of no
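The objection above can be checked with a few lines of Python. This is only the arithmetic implied by the quoted sentence ("fewer than one mistake in every 100" and "10 to 100 times fewer mistakes"), read literally; the names are illustrative.

```python
# Claimed upper bound for the new system: 1 mistake per 100 messages.
claimed_mistakes_per_100 = 1

# "10 to 100 times fewer mistakes" read literally implies the other
# systems make this many mistakes per 100 messages.
implied_range = [claimed_mistakes_per_100 * factor for factor in (10, 100)]

for factor, implied in zip((10, 100), implied_range):
    print(f"{factor}x fewer -> others make {implied} mistakes per 100 messages")
```

Taken literally, the claim puts competing filters at an error rate between 10% and 100%, which is the point the comment is making.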
Re: (Score:3, Informative)
Re: (Score:3, Informative)
If the old way caught 95% and a new way catches 99%, then you could say it's 4.2% better (4/95), or 4 percentage points better, or you could say it's gone from missing 5% to missing 1%, which is 80% better (4/5), or say it's 5 times better (1% missed compared with 5%). Guess which most people choose to use?
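The four ways of stating the same change can be worked out side by side. A small Python sketch using the 95% and 99% figures from the comment (the variable names are illustrative):

```python
old_caught = 95  # percent of spam caught by the old filter
new_caught = 99  # percent of spam caught by the new filter

old_missed = 100 - old_caught  # 5
new_missed = 100 - new_caught  # 1

# Relative gain in catch rate: 4/95
relative_gain = (new_caught - old_caught) / old_caught
# Absolute gain in percentage points: 4
point_gain = new_caught - old_caught
# Relative reduction in missed spam: 4/5
miss_reduction = (old_missed - new_missed) / old_missed
# Ratio of missed spam, old to new: 5/1
miss_ratio = old_missed / new_missed

print(f"relative gain in catch rate: {relative_gain:.1%}")
print(f"percentage points gained:    {point_gain}")
print(f"reduction in missed spam:    {miss_reduction:.0%}")
print(f"times fewer misses:          {miss_ratio:.0f}x")
```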
Re:Kinda flawed (Score:5, Informative)
So, if I understood the article correctly, this technology will classify more email as spam the more spam you have received.
No, that's not how it works at all. Let me try putting it as a concrete example. You have a friend, Jane, who likes to swap stupid chain emails, subscribes to all kinds of "voluntary spam," and generally receives 1000 spam mails a day. Jane's a great lady, don't get me wrong, but you know the type of person I mean. You talk to her in real life, but over email she is incredibly annoying, as most of her messages are essentially meaningless.
Now, let's say that BOTH YOU AND JANE receive the same message M. Now, you know Jane, and you know the kind of messages she typically receives (mindless, at least in YOUR eyes). What are the chances that this message M is something that YOU will be interested in? Probably very low. The vast majority of email Jane receives is "crap," at least according to your definition, and so the very fact that Jane received message M greatly increases the likelihood that it is "crap."
Does that make better sense?
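To make the Jane example concrete, here is a toy Python sketch. It is NOT the vendor's algorithm, which has not been publicly disclosed. It assumes each recipient has a known historical spam fraction, and it combines those fractions as independent evidence in log-odds form (a naive Bayes style shortcut with an even prior). The function name and the 0.05 and 0.99 figures are made up for illustration.

```python
import math

def combined_spam_probability(recipient_spam_fractions):
    """Combine per-recipient spam fractions into one probability.

    Assumption: recipients are independent pieces of evidence and the
    prior is 50/50, so their log-odds simply add. This is a toy model.
    """
    log_odds = 0.0
    for fraction in recipient_spam_fractions:
        # Clamp so a fraction of exactly 0 or 1 does not blow up the log.
        fraction = min(max(fraction, 1e-6), 1 - 1e-6)
        log_odds += math.log(fraction / (1 - fraction))
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical histories: 5% of your mail is spam, 99% of Jane's is.
you_only = combined_spam_probability([0.05])
you_and_jane = combined_spam_probability([0.05, 0.99])

print(f"message seen only by you:      {you_only:.2f}")
print(f"message seen by you and Jane:  {you_and_jane:.2f}")
```

Under these assumptions, the same message goes from looking very likely legitimate to looking probably like spam once Jane is known to have received it too. Nothing about your own mail history changes; only the evidence about the message does.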
Re: (Score:2, Insightful)
Not much.
Two issues: First, how does the system know that Jane's e-mail is mostly spam? Who tells it? Does it use some other filters to identify the spam in order to determine her spam rate?
Second, how does the system know that the message you received and the message Jane received are the same? Spammers have long been randomizing parts of messages in order to block older spam filters.
Re: (Score:3, Interesting)
Second, how does the system know that the message you received and the message Jane received are the same? Spammers have long been randomizing parts of messages in order to block older spam filters.
An interesting thing, as outlined in TFA that you should R, is that the mails do not have to be the same. They may have different check-sums even. However they are checked against the sending IP-address. If more messages from the same IP address arrive (presumably within a certain time frame), they are all considered spam or ham. Spammers tend to send lots of mails from the same IP address at a time, so that should work.
How they handle mailing lists though is not clear to me really. There are quite some
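As a rough illustration of the grouping described in this comment, here is a Python sketch that batches messages by sending IP address within a fixed time window, ignoring message content entirely. This follows the commenter's reading of the article, not a published specification; the function name, the 10-minute window, and the sample addresses are all assumptions.

```python
from collections import defaultdict

def group_by_sender_ip(messages, window_seconds=600):
    """Batch recipients by (sender IP, time bucket).

    messages: iterable of (timestamp_seconds, sender_ip, recipient).
    The window length is an arbitrary choice for illustration.
    """
    batches = defaultdict(list)
    for timestamp, sender_ip, recipient in messages:
        bucket = int(timestamp // window_seconds)
        batches[(sender_ip, bucket)].append(recipient)
    return dict(batches)

# Hypothetical traffic, using documentation-reserved IP ranges.
messages = [
    (10, "192.0.2.1", "you@example.com"),
    (25, "192.0.2.1", "jane@example.com"),
    (40, "198.51.100.7", "you@example.com"),
]
batches = group_by_sender_ip(messages)
for (sender_ip, bucket), recipients in batches.items():
    print(sender_ip, bucket, recipients)
```

Each batch can then be judged as a whole from the mix of recipients in it, even if every message body has a different checksum. A busy mailing list server or shared relay would also land in a single batch under this scheme.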
Re:Kinda flawed (Score:3, Interesting)
If the contents are irrelevant, then how does this system determine that any two messages are the same? Your answer is "by the sender IP" (and, unspoken, by a similar send time).
Which then leads me to ask - what about mail relays, where the same IP address sends thousands of emails every day? Wouldn't every email sent by the relay at ro
Re: (Score:3, Informative)
Only the mail relay IP address can be determined unambiguously - that's the host which is connecting to the host which is checking the mail for spamminess.