More on Bayesian Spam Filtering
michaeld writes "The 'Bayesian' techniques for spam filtering recently publicized in Paul Graham's essay A Plan for Spam don't actually seem to have anything Bayesian about them, according to Gary Robinson (an expert on collaborative filtering). They are based on a non-Bayesian probabilistic approach. They work well enough anyway, because technology often doesn't have to be 100% perfect to do something that really needs to be done. The problem interested Robinson, and he posted his thoughts on fixing the weaknesses in Graham's approach, including adding a genuinely Bayesian element to the calculations."
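To make the distinction concrete, here is a minimal sketch, assuming the commonly cited forms of both ideas: Graham's rule for combining per-word spam probabilities into one message score, and Robinson's "degree of belief" adjustment, which pulls a word's raw probability toward an assumed prior x when the word has been seen in only a few messages. The function names and the defaults s=1.0 and x=0.5 are illustrative assumptions, not code from either author.

```python
from math import prod


def graham_combine(probs):
    """Combine per-word spam probabilities into one message score,
    using the product form popularized by Graham's essay:
    P = prod(p) / (prod(p) + prod(1 - p))."""
    spam_evidence = prod(probs)
    ham_evidence = prod(1.0 - p for p in probs)
    return spam_evidence / (spam_evidence + ham_evidence)


def robinson_adjust(p_w, n, s=1.0, x=0.5):
    """Robinson-style degree-of-belief adjustment for one word.

    p_w: raw spam probability estimated for the word
    n:   number of messages the word has been seen in
    s:   strength given to the prior (illustrative default)
    x:   assumed probability for a word we know nothing about

    With n == 0 this returns x; as n grows it approaches p_w."""
    return (s * x + n * p_w) / (s + n)


if __name__ == "__main__":
    # A word seen once, and only in spam: the raw estimate says 1.0,
    # but the adjusted belief is only 0.75 because the evidence is thin.
    print(robinson_adjust(1.0, n=1))
    print(graham_combine([0.9, 0.9]))
```

The adjustment is what keeps a single rare word from dominating the combined score, which is one of the weaknesses in the unadjusted approach that the story refers to.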
Tutorial on Bayesian Inference (Score:5, Informative)
The timing of this article seems impeccable, since I am trying to learn about Bayesian Statistics myself.
I am a Computer Science student [ime.usp.br] studying Computational Biology [ime.usp.br] (more specifically, Sequence Alignments), and while I have a bit of background in Classical Statistics, I was (and still am) completely ignorant about Bayesian Statistics.
It is only now, as I'm trying to learn about Hidden Markov Models and their applications to Sequence Alignment, that I have finally decided to learn the basic assumptions of Bayesian Statistics and how they differ from those made in Classical Statistics.
While searching for introductory material on Bayesian Statistics, I found this course page [arizona.edu], which has some nice introductory notes, including some on Bayesian Statistics.
I hope that other people find this resource as useful as I did.
Terrible Spam Filters (Score:3, Informative)
It's funny how bad the standard Microsoft spam filter (the one built into Outlook) is. It's simply a word or phrase lookup: if a listed phrase is present, the message is marked as spam. It looks for things like "for free?". You can see the full list here [iirusa.com], near the bottom. The list is a little old but not outdated (I think you can update your spam filters, but I tested these rules, and the ones I tested still work).
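For contrast with the statistical approach, a lookup filter like the one described boils down to a single membership test. This is a minimal sketch, not Microsoft's code: only the phrase "for free?" comes from the post, and the case-insensitive matching over subject plus body is an assumption made for illustration.

```python
# Only "for free?" comes from the post; the real list is much longer.
# Case-insensitive matching is an assumption, not documented Outlook behavior.
SPAM_PHRASES = ("for free?",)


def is_spam_by_lookup(subject: str, body: str, phrases=SPAM_PHRASES) -> bool:
    """Flag a message if any listed phrase appears anywhere in it.

    There is no weighting and no learning: one hit is enough, and a
    message that avoids every listed phrase always gets through."""
    text = f"{subject}\n{body}".lower()
    return any(phrase.lower() in text for phrase in phrases)


if __name__ == "__main__":
    print(is_spam_by_lookup("Hi", "Want a cruise for free?"))
    print(is_spam_by_lookup("Lunch", "See you at noon"))
```

Because every phrase is an absolute rule, a legitimate message that happens to contain one is flagged just as surely as spam, and spammers only need to reword to get past it. Per-word probabilities avoid both problems by weighing evidence instead.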
The adult filter isn't any better.
Well... (Score:2, Informative)
Anyway, I hear that the next version of MSN will have a Bayesian filter, and that it will be introduced in an upcoming version of Outlook Express (no idea about Exchange and Outlook).
BTW, I believe MS uses this technique internally for spam control, and they don't seem to have any spam problems.
Re:I still think passive euthanasia is the best wa (Score:3, Informative)
Perhaps the problem is that the law would gain them fewer votes than a few hundred thousand dollars in campaign financing would. A large portion of the population isn't online, and a large portion of those who are don't care about spam, so your politician doesn't care either.
Since this is such a trivial technical problem to solve, it's not really a big deal either way. Every day I reduce 800 spam messages to the five or six that make it through to my inbox just by using procmail scoring, and I haven't had a false positive in years. I spend five minutes updating my procmailsc every six months to keep it effective. I suppose I could use an automated system, similar to what Paul Graham described, to generate my score file, but when I only spend ten minutes a year updating my rules, it would be a lot of years before writing all that code paid off. No need for sweeping legislation.
Re:How do you pronounce "Bayesian" anyways? (Score:2, Informative)
Re:Why just spam? (Score:3, Informative)
Re:Why just spam? (Score:2, Informative)
microsofts trademark (Score:3, Informative)
Re:microsofts trademark (Score:3, Informative)