More on Bayesian Spam Filtering 251

Posted by michael
from the snake-eyes dept.
michaeld writes "The "Bayesian" techniques for spam filtering recently publicized in Paul Graham's essay A Plan for Spam don't actually seem to have anything Bayesian about them, according to Gary Robinson (an expert on collaborative filtering). They are based on a non-Bayesian probabilistic approach. They work well enough, because technology frequently doesn't have to be 100% perfect in order to do something that really needs to be done. The problem interested Robinson, and he posted his thoughts about trying to fix the problems in the Graham approach, including adding an actual Bayesian element to the calculations."

  • by Anonymous Coward on Tuesday September 17, 2002 @04:27PM (#4276105)
    kill 'em. might = right
  • by saskboy (600063) on Tuesday September 17, 2002 @04:41PM (#4276225) Homepage Journal
    I have some tricks for Hotmail users who cannot benefit from the technique above:
    Filter any message without the @ in the address.
    Filter Britney, Boobs, Penis, Inches, WIN, ___ ..... and your own email address userid.
    Now you only have about 40 spams a day to deal with instead of 100.
    Uncheck your information from being in the MSN directory too.

    Enjoy :-)
    John
  • Let's see (Score:5, Funny)

    by sam_handelman (519767) <skh2003@@@columbia...edu> on Tuesday September 17, 2002 @04:49PM (#4276287) Homepage Journal
    P (This is spam) = P (This is Spam | It will enlarge my penis) * P (It will enlarge my penis)

    Now, given that I have prior knowledge that:
    P (It will enlarge my penis)

    is very low,

    and given that, having never encountered anything which enlarges my penis in any permanent way, I have no knowledge of
    P (This is Spam | It will enlarge my penis)

    and we have the product of one probability which I know is low, and another of which I have no posterior knowledge, so we conclude that P (It is Spam) is also low, and that I must have requested more information on their new penile enlargement technique.

    So, that message goes into the keepers.

    Meanwhile,

    P (It is Spam) = P (It is Spam | Frank is getting married) * P (Frank is getting married)

    So, I know Frank is getting married, since he sent me this e-mail I'm considering filtering as Spam, and whether or not it is spam is pretty much independent of whether or not Frank is getting married, so.... it's Spam. Away it goes.

    P.S. I've deliberately made a hash of this for a joke. The actual rule is:

    P (A & B) = P (A | B) * P (B)
  • by operagost (62405) on Tuesday September 17, 2002 @05:01PM (#4276397) Homepage Journal
    Note to statisticians: the product of the probabilities is monotonic with the Fisher inverse chi-square combined probability technique from meta-analysis. The null hypothesis is that the probabilities are independent and uniformly distributed.
    Ouch! My brain is hurting, Doc!
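
For anyone whose brain hurts at the quoted note, here is a rough sketch of what "monotonic with the Fisher inverse chi-square combined probability" can mean in practice. It is not taken from Robinson's or Graham's code; the function names chi2_sf_even and fisher_combine are made up for illustration. It assumes the standard form of Fisher's method: take -2 times the sum of the logs of the individual probabilities and compare it against a chi-square distribution with 2n degrees of freedom, which has a closed-form tail when the degrees of freedom are even.

```python
import math

def chi2_sf_even(x, df):
    """P(X >= x) for a chi-square variable with an even number of
    degrees of freedom, using the closed-form series
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)      # i = 0 term of the series
    total = term
    for i in range(1, df // 2):
        term *= m / i        # builds (x/2)^i / i! incrementally
        total += term
    return min(total, 1.0)   # guard against rounding slightly above 1

def fisher_combine(probs):
    """Combine probabilities (each in (0, 1]) with Fisher's method.
    The statistic depends on the inputs only through their product."""
    stat = -2.0 * sum(math.log(p) for p in probs)
    return chi2_sf_even(stat, 2 * len(probs))

if __name__ == "__main__":
    # Same product (0.1), same combined value, regardless of the factors.
    print(fisher_combine([0.2, 0.5]), fisher_combine([0.1, 1.0]))
    # A smaller product gives a smaller combined value (same count of inputs).
    print(fisher_combine([0.1, 0.1]), fisher_combine([0.5, 0.5]))
```

Because the statistic is just -2 times the log of the product, ranking messages by the raw product or by the Fisher combination gives the same order for a fixed number of inputs; the chi-square step only rescales the product into something that behaves like a probability under the stated null hypothesis.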
