
Bayesian Filtering For Dummies (281 comments)

Posted by timothy on Monday May 26, 2003 @05:27PM from the to-increase-the-peace dept.

Dynamoo writes "Bayesian filtering for spam is awfully clever stuff, touched on by Slashdot several times before. There's a very accessible article at BBC News explaining in fairly simple terms the drawbacks of current keyword-based filtering. It's slightly ironic that the BBC, through the commissioning of Monty Python, also gave 'spam' its name. Those Vikings have a lot to answer for."

This discussion has been archived. No new comments can be posted.


Yes, we must filter out the dummies (Score:5, Funny)

by Anonymous Coward writes: on Monday May 26, 2003 @05:30PM (#6042079)

I suggest Slashdot immediately implement this "Bayesian Filter for Dummies" to remove most of the trolls, etc.

- Re:Yes, we must filter out the dummies (Score:3, Funny)
  
  by Anonymous Coward writes:
  
  Wiping out all of the comments just because of the trolls seems a bit extreme.
- Re:Yes, we must filter out the dummies (Score:5, Interesting)
  
  by zoikes ( 182347 ) writes: on Monday May 26, 2003 @05:42PM (#6042165)
  
  The moderation system (esp. in its current form - moderation by +karma /.ers) will always be better than automated filtering.
  
  The key problem is adaptation. Bayesian filtering is better than simple keyword filtering, but its performance will degrade over time unless its rules are continuously updated (via analysis of new data). And there's the problem that a troll in one story context may be an insightful comment in another.
  
  Moderation by humans adapts rapidly, accommodates a variety of contexts, and will reflect (and grow with) the overall /. "culture".
  
  - Re:Yes, we must filter out the dummies (Score:2, Insightful)
    
    by Drakin ( 415182 ) writes:
    
    Unfortunately, there's also the problem of some uneducated people with mod points who can't tell the difference between a truly insightful post and a well-written troll, or who confuse a troll with humor that's on topic for a given discussion.
    
    So while it works, there's still some holes in the system.
  - Human filtering (Score:3, Funny)
    
    by stile ( 54877 ) writes:
    
    Great. Wanna filter my email for me? ;)
- Re:Yes, we must filter out the dummies (Score:5, Interesting)
  
  by dJCL ( 183345 ) writes: on Monday May 26, 2003 @05:47PM (#6042194) Homepage
  
  I've been using a Bayesian spam filter for months now and I understand how they work... Even though people find the comment funny, a Bayesian troll filter on Slashdot would work...
  
  If you were to run every Slashdot post through my mail filter as an e-mail message and properly mark the trolls and others you don't want, and the ones you do want, suddenly you would only get the actual good posts, and trolling would die quickly... And because of the user classification system currently in place, Slashdot has a huge db to build up the word stats, so it could happen immediately or faster...
  
  Seriously, I ask that the Slashdot admins consider adding this to slashcode... even if Slashdot does not use it, others would... there are too many trolls out there as it is on the net, and many people put them only a few rungs higher than spammers on the evolutionary ladder (but still lower than an amoeba).
  
  The logic behind this can actually be extended to allow a user to start filtering stories so that they only get ones that interest them, or even to filtering submissions to get rid of the cruft. How often do you think the trolls post troll story submissions? It would save work for the site admins...
  
  I'm curious if an extension of this idea is how Google News works... anyone know?
  
  Enjoy.
  
  - Re:Yes, we must filter out the dummies (Score:3, Funny)
    
    by dark-br ( 473115 ) writes:
    
    Would it work for editors too? If so *please* implement it!
  - Re:Yes, we must filter out the dummies (Score:2)
    
    by jericho4.0 ( 565125 ) writes:
    
    A Bayesian troll filter would not work, for the simple fact that an email is spam or not spam, but a post is many more things to most people.
  - Re:Yes, we must filter out the dummies (Score:5, Insightful)
    
    by bluelan ( 534976 ) writes: on Monday May 26, 2003 @07:46PM (#6042863)
    
    This wouldn't work.
    Bayesian filters for spam work because spam has a significantly different vocabulary distribution than useful e-mail. This is true because spam must deliver a commercial message and play on people's uncertainties.
    Good trolls, on the other hand, look ALMOST like insightful, well-written articles. The vocabulary distribution in good trolls is not significantly different from the vocabulary distribution of useful posts. So Bayesian filters would be useless, unless you come up with some smarter characteristics on which to train the filter.
    You could easily develop a filter for ascii-art porno. But those are offtopic or flamebait, not trolls.
    
    - Re:Yes, we must filter out the dummies (Score:3, Interesting)
      
      by bluGill ( 862 ) writes:
      
      Ahh, but a troll that looks genuine at first and appears on topic is worth a reading for the laugh. It needs to be marked funny, and depending on how good it is it might need some explanation in a followup post to keep those not in the know from thinking the wrong thing.
      OTOH, first post is always useless and a waste of time. So are a few other posts. ASCII art might be easy to filter, but can you filter the porn ascii-art without blocking the guy trying to make a diagram of some sort so we can better unde
  - Re:Yes, we must filter out the dummies (Score:5, Interesting)
    
    by DeadSea ( 69598 ) * writes: on Monday May 26, 2003 @08:22PM (#6043099) Homepage Journal
    
    Bayesian filters for email really only work because spammers can't see which messages you classify as spam. If you implemented a Bayesian filter for trolls on Slashdot, the trolls would see which words constitute a troll and stop using those words. They would stuff their messages with non-troll words to avoid the Bayesian filter.
    The same thing would happen to your mail if the words your Bayesian filter keyed on were the same as the words in everybody else's. Spammers would be able to see what makes an email seem spammy and they wouldn't do that. Bayesian filtering works for email right now because everybody's filters are a bit different. There is currently no magic bullet to get through everybody's spam filters. Also, spammers cannot see your filter, so they don't know if their message was filtered. If you opened your archive to me, I could quite easily craft a spam that would land square in your inbox.
    
- Re:Yes, we must filter out the dummies (Score:3, Redundant)
  
  by Fembot ( 442827 ) writes:
  
  Actually, you might joke about this, but I think that having an optional bonus (as with the current karma bonus) for messages that aren't likely to be classified as Troll, Offtopic or Redundant might be quite helpful for casual readers.
  
  It would learn from the moderations of comments, and of course the bonus would be totally user-definable, so you could set it to 0 and it would have no effect.
  
  Just my thoughts on the issue
  - - Re:Yes, we must filter out the dummies (Score:2)
      
      by Fembot ( 442827 ) writes:
      
      yeah but a) it's Bayesian, so it learns from moderations
      and
      b) it's optional, so rather than not letting you post at all it lets you post, but some users get the post flagged lower.
- Re:Yes, we must filter out the dummies (Score:3, Funny)
  
  by milkmandan9 ( 190569 ) writes:
  
  I suggest Slashdot immediately implement this "Bayesian Filter for Dummies" to remove most of the trolls, etc.
  So, tell me...would anyone be left?
  
  Didn't think so.
A bit of info on Bayesian filtering (Score:5, Informative)

by jat850 ( 589750 ) writes: on Monday May 26, 2003 @05:32PM (#6042093)

The BBC article mentions Paul Graham, and I found his page (and some more information on Bayesian networks for spam filtering) here:

Paul Graham's spam page [paulgraham.com]

He talks a little bit more about the technical aspects there.

- Re:A bit of info on Bayesian filtering (Score:3, Informative)
  
  by Rosco P. Coltrane ( 209368 ) writes:
  
  From Paul Graham's page :
  
  A probability can of course be mistaken, but there is little ambiguity about what it means, or how evidence should be combined to calculate it. Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.
  
  Tough filter for users of dating
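
The 99.97% figure in the quoted passage can be checked by hand. The sketch below assumes the two words are independent pieces of evidence and that spam and non-spam are equally likely beforehand; the function name `combine` is made up for this illustration.

```python
def combine(p_a: float, p_b: float) -> float:
    """Combine two per-word spam probabilities with Bayes' Rule.

    Assumes the words are independent and that spam and non-spam
    are equally likely a priori.
    """
    spam = p_a * p_b                  # both words seen, mail is spam
    ham = (1.0 - p_a) * (1.0 - p_b)   # both words seen, mail is legitimate
    return spam / (spam + ham)


# "sex" = .97 and "sexy" = .99, as in the quoted passage
combined = combine(0.97, 0.99)
print(f"{combined:.2%}")
```

With those inputs the ratio is 0.9603 / 0.9606, which rounds to the 99.97% quoted above.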
  - Re:A bit of info on Bayesian filtering (Score:5, Insightful)
    
    by letxa2000 ( 215841 ) writes: on Monday May 26, 2003 @06:23PM (#6042385)
    
    A gynecologist probably wouldn't have a corpus that indicates that "sex" is a .97 spam probability. That's the great thing about Bayesian: the spam probability for each word depends on the mail and spam YOU receive. It works dang well, just as Paul Graham claims. I'm averaging 99.7% accuracy this week, and the one spam that got through was written in German.
    
  - Re:A bit of info on Bayesian filtering (Score:5, Insightful)
    
    by GnuVince ( 623231 ) writes: on Monday May 26, 2003 @06:32PM (#6042448)
    
    No, because if they have a lot of legitimate mails with words like "sex", "sexy", "penis", "vagina", "viagra", etc., the filter will adapt. That's the whole point. For PG, "sexy" is a sure sign of spam, but for a sexologist, it is not. You train the filter to recognize your spam. So if "sex" appears as much in your legitimate mail as in your spam, "sex" will not be considered a trace of spam.
    Bayesian filters adapt, that's why they work so well.
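
A rough sketch of why the same word scores differently for different users. This is a simplified per-word estimate, not the exact formula from any particular filter: the counts are invented, `word_spam_probability` is a hypothetical helper, and the neutral 0.5 for unseen words and the clamping to 0.01..0.99 are assumptions for the illustration.

```python
def word_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate P(spam | word) from one user's own mail.

    spam_hits / ham_hits: how many of the user's spam / legitimate
    messages contain the word; n_spam / n_ham: corpus sizes.
    """
    spam_rate = min(1.0, spam_hits / n_spam)
    ham_rate = min(1.0, ham_hits / n_ham)
    if spam_rate + ham_rate == 0:
        return 0.5  # word never seen: treat as neutral (assumption)
    p = spam_rate / (spam_rate + ham_rate)
    return max(0.01, min(0.99, p))  # never be fully certain


# Invented counts: the word shows up in 300 of 1000 spams for both users.
typical_user = word_spam_probability(300, 1, 1000, 1000)    # almost never in legit mail
sexologist = word_spam_probability(300, 300, 1000, 1000)    # just as common in legit mail
print(typical_user, sexologist)
```

For the first user the word is strong evidence of spam; for the second it carries no information either way, which is the adaptation described above.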
    
- Brief Tech Notes on Bayesian Filtering (Score:5, Informative)
  
  by robbyjo ( 315601 ) writes: on Monday May 26, 2003 @06:06PM (#6042298) Homepage
  
  Well, the type of Bayesian learning used in this spam filtering is called "Naive Bayesian" and the engine is trained using a "supervised learning" technique. Naive Bayes has been proven very successful for text categorization. Spam filtering is even more successful because we essentially categorize e-mails to two labels: "spam" or "not spam".
  
  Supervised learning basically works like this. Feed the engine with multiple examples (in this case, e-mails) with labels (in this case, "spam" or "not spam"). The training usually takes thousands of examples to get good enough accuracy. And take note that we need both "spam" and "not spam" examples to enable the learning engine to distinguish them.
  
  How does Naive Bayes work? Well, think of the full Bayesian Network. A Bayes net is basically a cause-and-effect graph with an annotated Conditional Probability Table (CPT) on each node denoting the probabilities of possible values. A full Bayes net takes the form of a Directed Acyclic Graph (DAG), but Naive Bayes takes the form of a tree instead due to some "naive" assumptions. (Okay, I handwaved a whole lot of details here.) And in learning Naive Bayes, we basically try to construct the tree out of the examples.
  
  Let P(spam) be the percentage of training e-mails that are labelled as "spam" and P(not spam) be the percentage of "not spam" e-mails.
  
  First, let the filter read all e-mails and collect the words out of them. Weed out duplicates and stop words (common words like "I", "you", "the", etc). Let NumVocab be the number of words after weeding.
  
  Second, process the e-mails one by one. Do the weeding phase as above. Let "n" be the number of words in that particular e-mail after the weeding. Scan the words one by one. Let "w" be the current word scanned and "nw" be the number of times word "w" occurs in that e-mail. Imagine you have a big two dimensional array to store the result (let's call the array "P"). If the e-mail is labeled "spam", then store (nw+1)/(n+NumVocab) to P[w][spam]; if it is labeled "not spam", store it to P[w][nospam] instead.
  
  Repeat until all training e-mails are read.
  
  And here comes the testing phase...
  
  When you encounter an e-mail and want to classify whether it's spam or not, you'll need to look up the array P you created earlier. First, you do the weeding phase and scan the words one by one. The algo is like this:
  
  pspam = P(spam);
  pnospam = P(not spam);
  foreach unique words w in e-mail do
    pspam = pspam * P[w][spam];
    pnospam = pnospam * P[w][nospam];
  endfor
  if (pspam > pnospam) then return IS_SPAM;
  else return IS_NO_SPAM;
  
  Hope this helps.
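
The training and testing phases described in this comment can be sketched in a few lines. This is a minimal illustration, not the poster's exact procedure: it pools word counts per label (the usual multinomial Naive Bayes with add-one smoothing) instead of storing one value per e-mail, and the stop-word list, the four training messages, and all function names are invented for the example.

```python
from collections import Counter

STOP_WORDS = {"i", "you", "the", "a", "to"}  # tiny illustrative list


def tokenize(text):
    """The 'weeding' phase: lowercase, split, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def train(examples):
    """examples: iterable of (text, label), label is 'spam' or 'nospam'."""
    word_counts = {"spam": Counter(), "nospam": Counter()}
    mail_counts = Counter()
    for text, label in examples:
        mail_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["nospam"])
    return word_counts, mail_counts, vocab


def classify(text, model):
    word_counts, mail_counts, vocab = model
    total_mails = sum(mail_counts.values())
    scores = {}
    for label in ("spam", "nospam"):
        n = sum(word_counts[label].values())       # words seen under this label
        score = mail_counts[label] / total_mails   # P(spam) or P(not spam)
        for w in set(tokenize(text)):              # unique words, as in the pseudocode
            if w in vocab:                         # skip words never seen in training
                score *= (word_counts[label][w] + 1) / (n + len(vocab))
        scores[label] = score
    return "IS_SPAM" if scores["spam"] > scores["nospam"] else "IS_NO_SPAM"


EXAMPLES = [
    ("cheap viagra offer click now", "spam"),
    ("free offer click here now", "spam"),
    ("meeting notes for monday project", "nospam"),
    ("lunch on monday with project team", "nospam"),
]
model = train(EXAMPLES)
print(classify("click for cheap offer", model))
print(classify("project meeting on monday", model))
```

Multiplying raw probabilities is fine for short messages like these; longer ones need the logarithm form to avoid underflow.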
  
  - Re:Brief Tech Notes on Bayesian Filtering (Score:2, Insightful)
    
    by DuSTman31 ( 578936 ) writes:
    
    Spam filtering is even more successful because we essentially categorize e-mails to two labels: "spam" or "not spam"
    
    True. You could simply have a spam and a not spam category. I don't think that'll necessarily lead to the highest accuracies though.
    Spam naturally seems to come in several categories - porn, penis enlargements, mortgages etc. However, it's unlikely that any one spam will simultaneously advertise porn and mortgages. Simply having a "spam" and a "not" category will not take advantage of
    - Re:Brief Tech Notes on Bayesian Filtering (Score:4, Insightful)
      
      by Ian Bicking ( 980 ) writes: <ianb&colorstudy,com> on Monday May 26, 2003 @08:04PM (#6042970) Homepage
      
      Spam naturally seems to come in several categories - porn, penis enlargements, mortgages etc. However, it's unlikely that any one spam will simultaneously advertise porn and mortgages. Simply having a "spam" and a "not" category will not take advantage of distinctions such as that.
      Why does it matter what category? To the user they don't care what kind of spam, merely that it's spam. And this isn't just a UI issue -- the filter is not meant to indicate authoritatively what is spam and what is not. Instead it learns what the particular user considers spam. You're only going to introduce inaccuracy if you create more categories, because the user is sometimes going to miscategorize spams (e.g., porn in penis enlargement). The user is not invested in the result of that subcategorization, so it's not a good goal for training.
      Certainly there are other categorizations that are useful, e.g., work vs. private mail. Bayesian techniques can be used for further categorization, but they should only be used to categorize as far as the user cares to have their mail categorized.
      Bayesian techniques for non-spam wouldn't be that useful, anyway, because non-statistical rules generally work well for everything but spam -- it's only because spammers are specifically trying to defeat non-statistical rules that we need statistical analysis. The only other place for Bayesian techniques, IMHO, is where the user can't articulate the basis of the categorization they desire (but that's probably quite common).
      
  - Re:Brief Tech Notes on Bayesian Filtering (Score:2)
    
    by dave_mcmillen ( 250780 ) writes:
    
    Supervised learning basically works like this. Feed the engine with multiple examples (in this case, e-mails) with labels (in this case, "spam" or "not spam"). The training usually takes thousands of examples to get good enough accuracy. And take note that we need both "spam" and "not spam" examples to enable the learning engine to distinguish them.
    
    Thank you for the additional detail! What I wonder is this: would it be useful for people to be able to somehow pool their examples? The number of spam messa
    - Re:Brief Tech Notes on Bayesian Filtering (Score:2, Insightful)
      
      by nackrm ( 571581 ) writes:
      
      Pooling spam to teach isn't such a good idea. The problem you might run into is that some people, like say a plastic surgeon, might get many emails that have words like penis, vagina, sex, larger, etc. So their filter info might allow some spam to get through. This is also the reason that mozilla's mail client wouldn't be "pretrained" for you. Instead the email probably had some key qualities to it that were dead giveaways to being spam. One of those is the really long strings of characters used by spam
      - Re:Brief Tech Notes on Bayesian Filtering (Score:2)
        
        by Blkdeath ( 530393 ) writes:
        
        Pooling spam to teach isn't such a good idea. The problem you might run into is that some people, like say a plastic surgeon, might get many emails that have words like penis, vagina, sex, larger, etc.
        
        While I do agree with you, for the most part, it would be plausible to include such e-mails as "wet horny teen sluts want to cum for you" et al.
        I have to say that I did notice Mozilla picked up some mail as SPAM right out of the gate. The unfortunate part was; it picked up several false positives. It was
        
        You asked for it ;) (Score:2)
        
        by brad-x ( 566807 ) writes:
        
        How to Increase Your Penis
        
        And Stop Premature Ejaculation
        
        FREE Bottle Offer 100% Guaranteed to work.
        
        Take Advantage of Our FREE Bottle Offer As Seen On TV !!!
        
        Click here to learn more. [slashdot.org]
        
        NB: Amusingly my first revision of this was smacked down by slashdot's inbuilt junk filtering mechanisms. :P
  - DON'T implement it like the parent (Score:3, Informative)
    
    by NoOneInParticular ( 221808 ) writes:
    
    If you do it like the parent:
    
    pspam = P(spam);
    pnospam = P(not spam);
    foreach unique words w in e-mail do
      pspam = pspam * P[w][spam];
      pnospam = pnospam * P[w][nospam];
    endfor
    if (pspam > pnospam) then return IS_SPAM;
    else return IS_NO_SPAM;
    
    You'll soon run out of bits to store the floating-point results: the running product underflows to zero. Implement it by adding the logarithms of the probabilities instead of multiplying the probabilities, thus:
    lpspam = log(P(spam));
    lpnospam = log(P(not spam));
    foreach unique words w in e-mail do
      lpspam = lpspam + log(
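
The underflow is easy to reproduce. The sketch below uses made-up per-word probabilities (200 words at 0.01 for one class and 0.001 for the other) and hypothetical helper names; it is not a completion of the truncated pseudocode above, only a demonstration of the problem and of the logarithm fix.

```python
import math


def product_score(probs):
    """Multiply probabilities directly, as in the parent's pseudocode."""
    score = 1.0
    for p in probs:
        score *= p
    return score


def log_score(probs):
    """Add logarithms instead; every probability must be greater than zero."""
    return sum(math.log(p) for p in probs)


# Invented numbers: a 200-word message whose words are ten times
# more likely under "spam" than under "not spam".
spam_probs = [0.01] * 200
nospam_probs = [0.001] * 200

print(product_score(spam_probs), product_score(nospam_probs))
print(log_score(spam_probs), log_score(nospam_probs))
```

Both products collapse to 0.0 in double precision, so the comparison `pspam > pnospam` can no longer tell the classes apart, while the log sums stay finite and keep the correct order.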
It's not bad... (Score:3, Interesting)

by Sheetrock ( 152993 ) writes: on Monday May 26, 2003 @05:32PM (#6042094) Homepage Journal

I've been using it for a bit on my own e-mail, and it seems to work out. But it's not at the point where I'd be happy to see ISPs implementing it for their customers -- even ignoring the Freedom of Speech issue, it still has the occasional false positive.

- Re:It's not bad... (Score:2, Insightful)
  
  by letxa2000 ( 215841 ) writes:
  
  The question is, which produces more false positives: The occasional Bayesian false positive, or the occasional (or not so occasional) good mail that you'll accidentally delete when you're deleting 150 spams per day? If I'm getting 150 spams per day that's 1050 spams per week which is an awful lot of "deletes." You don't think you're going to accidentally throw out a good message now and then when manually deleting that much spam? I'd venture to say that you'll probably accidentally delete more yourself b
- Re:It's not bad... (Score:2)
  
  by Nogami_Saeko ( 466595 ) writes:
  
  Likewise, I've also been using it via POPFile [sourceforge.net]. I'm _extremely_ happy with the results. Within a week of implementing it, my incoming spam was cut by over 90% and the classification error rate is exceptionally low.
  
  I've been running POPFile since January, and on over 1,200 messages (until my HD blew up), the accuracy was over 98%.
  
  Good deal!
  
  N.
- - Re:It's not bad... (Score:3, Informative)
    
    by pyr0 ( 120990 ) writes:
    
    That's exactly how spamassassin works. My school has it set up on their exchange servers. The problem with that is...you still get the spam. So even if it does go in another folder, which in turn still wastes your bandwidth and time in deleting it, as far as the spammer knows it went through. They still get paid for having sent out a successful spam. The only real advantage is being able to read your legit messages without sorting through the spam, and even that is starting to become an issue because i
Origin of SPAM (Score:3, Interesting)

by brejc8 ( 223089 ) writes: on Monday May 26, 2003 @05:32PM (#6042095) Homepage Journal

It's slightly ironic that the BBC, through the commissioning of Monty Python, also gave 'spam' its name.
Does anyone have proof that's where the name comes from?

- Re:Origin of SPAM (Score:5, Informative)
  
  by jat850 ( 589750 ) writes: on Monday May 26, 2003 @05:36PM (#6042123)
  
  Good question ... through Google Groups I found this page [templetons.com].
  
- Re:Origin of SPAM (Score:5, Informative)
  
  by Anne Thwacks ( 531696 ) writes: on Monday May 26, 2003 @06:26PM (#6042410)
  
  While the Monty Python sketch may have inspired the use of the term, the Monty Python usage was in fact a rehash of a sketch by Peter Sellers, dating back to the 1950s, which referred to the wartime situation where cafes often had fancy things on the menu, but when you came to order, the item in question was not available.
  The sketch is to be found on the album "The Best of Sellers" - probably released in about 1958, and which also features the nursery rhyme
  "Up on the chair behind the door,
  hey diddle, diddle,
  Here comes Poppa
  so up with the chopper
  and split 'im down the middle"
  and "Balham, gateway to the South", a spoof of the travelogue films that often appeared in the cinema at the time.
  
Dialect or typo? (Score:3, Funny)

by isomeme ( 177414 ) writes: <cdberry@gmail.com> on Monday May 26, 2003 @05:32PM (#6042101) Journal

From the article's subhead:

just as paper junk mail buried many a front door map

Is that yet another weird British idiom, or simply a typo for "mat"?

Vikings? (Score:2, Funny)

by cperciva ( 102828 ) writes:

I'd say that the BBC has more in common with the Normans, actually.
- Re:Vikings? (Score:2)
  
  by tupps ( 43964 ) writes:
  
  The viking reference is to the Vikings that were in the Monty Python Sketch. From memory there were 2 American tourists trying to order something and Vikings kept popping up singing the Spam song!
- - Re:The Normans WERE Vikings (was: Vikings?) (Score:2)
    
    by cyril3 ( 522783 ) writes:
    
    The Normans were Danes about as much as the Americans are English.
More Spam! (Score:3, Insightful)

by James Littiebrant ( 622596 ) writes: on Monday May 26, 2003 @05:33PM (#6042104)

I have used a Bayesian filter for some time now, and while it is the BEST filter type I have ever used, nothing is 100% reliable. While this is the best technology for the average user, it is most certainly not perfect. Instead I use a combination of moderate Bayesian filtering and good old-fashioned "block sender" filtering.

Speaking of dummies... (Score:5, Informative)

by Anonymous Coward writes: on Monday May 26, 2003 @05:34PM (#6042107)

Someone needs to learn the meaning of "ironic". (Hint: it doesn't mean "weird coincidence".)

Paul

- "Alanis irony" (Score:3, Funny)
  
  by Danny Rathjens ( 8471 ) writes:
  
  That's what we refer to as "Alanis irony", 8^)
  - Re:"Alanis irony" (Score:3, Interesting)
    
    by joeytsai ( 49613 ) writes:
    
    Actually it is ironic when you write a song called "ironic" and there are no ironies in it.
- Re:Speaking of dummies... (Score:2, Funny)
  
  by MechCow ( 561875 ) writes:
  
  I thought words were defined by how they are used. It would be ironic if you work for webster.
- Re:Speaking of dummies... (Score:2)
  
  by happystink ( 204158 ) writes:
  
  Right, the example given isn't irony at all. A better example would be a thousand spoons when all you want is a fork.
Ironic? (Score:5, Funny)

by popeydotcom ( 114724 ) writes: on Monday May 26, 2003 @05:36PM (#6042127) Homepage

Interesting yes, ironic, no.

What's your name, Alanis Morissette ?

- Re:Ironic? (Score:2)
  
  by JahToasted ( 517101 ) writes:
  
  That song was ironic if you think about it. You have a song called "Ironic" that doesn't contain one example of irony. That in itself is irony...
- Re:Ironic? (Score:4, Insightful)
  
  by DavyByrne ( 30170 ) writes: on Monday May 26, 2003 @07:18PM (#6042724) Homepage
  
  Actually, I've long wondered whether Alanis was quite clever in choosing a title for that song.
  
  You see, none of the events she describes in the song is an example of irony, making the choice of the title "Ironic," well, ironic.
  
Reminds me of a story (Score:3, Funny)

by joelt49 ( 637701 ) writes: <joelt49@yaSLACKWAREhoo.com minus distro> on Monday May 26, 2003 @05:37PM (#6042132) Homepage

This whole spam thing reminds me of a story I read while in 7th grade. In it, the postage for sending junk mail was decreased to practically nothing. Then, junk mail buried America. Hundreds of years later, archeologists came back and investigated the remains. Their conclusions about our society are kind of humorous. However, the idea of junk mail burying us when the postage goes way down has kind of been proved with spam. Maybe a small tax for spam wouldn't be a bad idea.

- 0.0001% response rates (Score:3, Insightful)
  
  by rippie78 ( 661796 ) writes:
  
  The sheer number of spam mail sent means that even tiny response rates, reportedly 0.0001%, means junk mailers turn a profit.
  Are we missing a critical factor of the end user who actually responds to SPAM?
  If spammers survive on 0.0001% response rate, how many people are actually clicking/buying? Are these people who provide the customers for spammers going to stop or use any sort of filters?
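
To put the quoted rate in perspective, 0.0001% is one response per million messages. The mailing size below is a made-up figure for illustration only, and `expected_responses` is a hypothetical helper.

```python
RESPONSE_RATE = 0.0001 / 100  # 0.0001% expressed as a fraction: one in a million


def expected_responses(messages_sent: int, rate: float = RESPONSE_RATE) -> float:
    """Expected number of people who respond to a mailing of the given size."""
    return messages_sent * rate


# Hypothetical mailing of ten million messages
print(expected_responses(10_000_000))
```

At that rate a ten-million-message run would draw roughly ten responses, so the pool of actual buyers can be very small and still keep the spammer going.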
No no no, Bayesian Filtering *OF* Dummies please (Score:4, Funny)

by corebreech ( 469871 ) writes: on Monday May 26, 2003 @05:38PM (#6042142) Journal

Thank you for your support.

Do spammer's techniques work on slashdot ? (Score:5, Funny)

by Rosco P. Coltrane ( 209368 ) writes: on Monday May 26, 2003 @05:38PM (#6042143)

Viagra often spelled V-l-a-g-r-a online

I-f I t-r-o-l-l l-i-k-e t-h-i-s, w-i-l-l i-t p-a-s-s S-l-a-s-h-d-o-t.'s t-r-o-l-l f-i-l-t-e-r ?

- Re:Do spammer's techniques work on slashdot ? (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  1. refresh slashdot page once a minute
  2. wait 15 seconds
  3. ???
  f-r-i-s-t p-o-s-t!!!
  
  i-m-a-g-i-n-e a b-e-o-w-u-l-f c-l-u-s-t-e-r o-f B-a-y-e-s-i-a-n F-i-l-t-e-r-s!
  - Re:Do spammer's techniques work on slashdot ? (Score:4, Funny)
    
    by MyHair ( 589485 ) writes: on Monday May 26, 2003 @06:27PM (#6042412) Journal
    
    S-t-e-p-h-e-n K-i-n-g i-s d-e-a-d a-t 5-2 !
    
    B-S-D i-s d-y-i-n-g ! N-e-t-c-r-a-f-t c-o-n-f-i-r-m-s i-t !
    
Wrong pic... (Score:4, Informative)

by Mondoz ( 672060 ) writes: on Monday May 26, 2003 @05:39PM (#6042150)

It's slightly ironic that the BBC, through the commissioning of Monty Python, also gave 'spam' its name.
Why then, does the article show a pic from a Monty Python animation about the black spot who goes to seek his fortune...
You'd think they'd use the actual pic of the skit with the Vikings in the cafe...

Hmmm (Score:2, Insightful)

by Anonymous Coward writes:

So this filter works on analysis of previously filtered mail?

I can see the casual (mis)use of this technique by your average user rapidly becoming a problem - putting just one email from a legit e-mail sender into the Bayesian filter could conceivably snowball into a block on a lot of legit traffic under certain circumstances.

Above and Below knows I have enough hassle with users and their e-mail already
- Re:Hmmm (Score:3, Insightful)
  
  by letxa2000 ( 215841 ) writes:
  
  I can see the casual (mis)use of this technique by your average user rapidly becoming a problem - putting just one email from a legit e-mail sender into the Bayesian filter could conceivably snowball into a block on a lot of legit traffic under certain circumstances.
  It's natural to think that is the case, but in reality it isn't. Accidentally putting one email in the wrong corpus ("good" or "spam") will not be enough to kill you. If you consistently fail to put them in the right corpus then over time,
Required Reading by E-mail Users (Score:4, Interesting)

by Shackleford ( 623553 ) writes: on Monday May 26, 2003 @05:46PM (#6042189) Journal

This "Bayesian Filtering for Dummies" article, titled "How to spot and stop spam" on the BBC web site, explains both the problem of spam and the filtering method used to get around it. It is quite comprehensible, as you certainly don't need to know the probability theory behind Bayesian filtering to understand it. I'd say that this sort of article is required reading for all those who use e-mail. Why? Because it states this fact:
"The sheer number of spam mail sent means that even tiny response rates, reportedly 0.0001%, means junk mailers turn a profit. "
And this is why I say that educating users is just about as important as implementing spam filtering technology. If people know that they are perpetuating a serious problem by replying to spam, then that's bad news for spammers.
About another fact mentioned in the article: It said Paul Graham's filter extracts "the top 15 features that define them as spam." 15? I thought that most Bayesian filters use many more spam-defining features. Because I'd say that there are quite a few more. Just think of the many features that spam tends to have. But he says his filter works well. Interesting.

- Re:Required Reading by E-mail Users (Score:3, Insightful)
  
  by dJCL ( 183345 ) writes:
  
  From my understanding of his full explanation (I read it a while ago, can't remember where; dig around some), each e-mail has every word examined and given a rating from 0.01 (good) to 0.99 (spam). Then the 15 words farthest from 0.50 are selected, some averaging is done, and if the score is over some threshold (say 0.90) it is called spam and trashed. I use Spamunition for my Outlook e-mail (working on moving my e-mail over to Linux, hopefully soon, so I can del my Windows boxen) and it can give the stats fo
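
The selection step described here can be sketched as follows. The combining rule used is the product form of Bayes' Rule (the same one behind the ".97 and .99 give 99.97%" example quoted elsewhere in this thread) standing in for "some averaging"; the function names, the sample ratings, and the 0.9 threshold are assumptions for the illustration, and the ratings are assumed to be already clamped to the 0.01..0.99 range.

```python
from math import prod


def most_interesting(ratings, n=15):
    """Pick the n per-word ratings farthest from the neutral 0.50."""
    return sorted(ratings, key=lambda p: abs(p - 0.5), reverse=True)[:n]


def combined_probability(ratings):
    """Combine ratings with Bayes' Rule, treating words as independent."""
    spam = prod(ratings)
    ham = prod(1.0 - p for p in ratings)
    return spam / (spam + ham)


def is_spam(ratings, threshold=0.9, n=15):
    return combined_probability(most_interesting(ratings, n)) > threshold


# Invented per-word ratings for two messages
spammy_mail = [0.99, 0.99, 0.98, 0.5, 0.5, 0.4, 0.6, 0.2]
normal_mail = [0.01, 0.02, 0.5, 0.3]
print(is_spam(spammy_mail), is_spam(normal_mail))
```

Capping the count at 15 keeps a long message full of neutral words from diluting the few tokens that carry real evidence.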
  - Re:Required Reading by E-mail Users (Score:3, Interesting)
    
    by kindbud ( 90044 ) writes:
    
    I have 5200 spam e-mails saved and about 1000 legit mail saved and my accuracy level is about 99.9...
    
    Yes, but you haven't reduced your exposure to spam. In fact, it looks like now you have to track your spam intake assiduously so as to keep the filter trained. Not many people would consider this an improvement. :)
- Re:Required Reading by E-mail Users (Score:2)
  
  by WolfWithoutAClause ( 162946 ) writes:
  
  And this is why I say that educating users is just about as important as implementing spam filtering technology. If people know that they are perpetuating a serious problem by replying to spam, then that's bad news for spammers.
  The reason that spam works so well is that the proportion of idiots out there who either didn't attend the lesson, didn't believe the lesson, or didn't listen in the lesson is small, but significant. There's one born every minute as they say.
  About another fact mentioned in the ar
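
dJCL's summary earlier in this thread (rate every word from 0.01 to 0.99, keep the 15 farthest from 0.50, combine, compare against a threshold) can be sketched in a few lines of Python. This is only a sketch under assumptions: the function names are made up for illustration, and the "averaging" step is taken here to be the naive Bayes product combination, which may differ from what any particular filter does.

```python
from math import prod


def spam_score(token_probs, n_interesting=15):
    """Combine per-token spam probabilities into one message score.

    token_probs: one probability per token, where 0.0 means "always ham"
    and 1.0 means "always spam". Hypothetical helper, not a real filter's API.
    """
    # Clamp to [0.01, 0.99] so no single token forces the score to 0 or 1.
    clamped = [min(0.99, max(0.01, p)) for p in token_probs]
    # Keep only the tokens farthest from the neutral value 0.5.
    interesting = sorted(clamped, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:n_interesting]
    # Naive Bayes combination (assumed here): treat tokens as independent.
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


def is_spam(token_probs, threshold=0.9):
    """Flag a message as spam when its combined score exceeds the threshold."""
    return spam_score(token_probs) > threshold
```

Under these assumptions a neutral token (0.5) contributes nothing either way, so a message is decided by its handful of most extreme words, which may be why 15 features turn out to be enough.
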
I don't receive spam (Score:5, Interesting)

by Rosco P. Coltrane ( 209368 ) writes: on Monday May 26, 2003 @05:48PM (#6042201)

In my home mailbox, I don't receive spam. And I only got two 419 Nigerian investment frauds on my professional address in a whole year, despite the fact that my corporate email address is widely publicized and easy to find on Google. And amazingly, I never receive spam in my "special bogus registration" Hotmail account (useful for programs like RealPlayer, or nytimes.com).

So existing mail filters work for me, more or less. The few unwanted mails that pass through are easily taken care of by my trusted delete button. This leads me to ask:

- Do other people really receive that much spam, or am I an isolated case?

- Do people who receive spam purchase things online, or register software and other services with their real names and email?

- Re:I don't receive spam (Score:2)
  
  by dmeranda ( 120061 ) writes:
  
  Maybe how you count, but most of us don't need v1agra or naked ch33r1eaders, and therefore consider all those messages to be spam! .. actually, on second thought ;-)
  
  The obvious question is how much mail do you receive in total, how much non-spam, and how many false-positives go completely unnoticed by you? I've had my email account since the late 1980's and I get over 200 per day. I also run a mail gateway for a medium sized company, and we get over 30,000 per day.
  There are in fact two big problems wi
  - Re:I don't receive spam (Score:2)
    
    by Rosco P. Coltrane ( 209368 ) writes:
    
    The obvious question is how much mail do you receive in total, how much non-spam, and how many false-positives go completely unnoticed by you?
    
    Well, this is what I can tell you : I've had my corporate email (the one that I really use publicly) for maybe 5 years, and I get maybe 15 mails/day not counting mailing lists, and possibly 300 total, the LKML taking a lot of that extra traffic. During the first year I've worked for my company, I was a support engineer, and I've never had (or heard of) a customer wh
  - Re:I don't receive spam (Score:4, Insightful)
    
    by letxa2000 ( 215841 ) writes: on Monday May 26, 2003 @06:44PM (#6042517)
    
    >There are in fact two big problems with Bayesian filtering (or any content-based filtering) from the perspective of an ISP or company... 1) one person's spam is another person's necessity
    But that's why the Bayesian approach calls for every user to have their own statistics. It's not "one size fits all" for the entire ISP or company, as is the case with most keyword filters. Every user has a different set of Bayesian statistics, which is why it is very difficult for spammers to get around this filter: they have no way of knowing what words are in each user's statistics.
    >2) you still have to waste your bandwidth and CPU before you reject it.
    It's better to waste your bandwidth and your CPU than to waste the time of those receiving the spam. IMHO...
    >So Bayesian filters are a good tool of last resort, but there are many other tools that should be used too.
    The sooner everyone uses Bayesian filters (as opposed to waiting until all the other filters are incapable of keeping up with spam), the sooner the spammers will be in trouble. I personally use a Bayesian filter together with an up-to-date blacklist of known spamvertised domains, etc. I find that, quite simply, the simple keyword filters catch spam from known spam sites and the Bayesian filter catches the rest. But if I turned off my normal filters, the Bayesian filter would have caught it all, since those spams are always assigned a high Bayesian score, too. It almost makes sense to turn off the other filters, but they can be useful if a spammer comes up with a truly unique spam and someone else has already identified the domain name. It's rare, but it can happen. So a combination of technologies is probably the best... but a combination that lacks Bayesian is a combination that could be better.
    
- Re:I don't receive spam (Score:2)
  
  by Jeremi ( 14640 ) writes:
  
  >Do other people really receive that much spam, or am I an isolated case?
  
  Yes, they do... I probably get 50-60 spams a day
  
  >Do people who receive spam purchase things online, or register software and other services with their real names and email?
  
  I made the mistake of putting my unobfuscated email address on my web page... bad idea :^P
  - Re:I don't receive spam (Score:2)
    
    by Rosco P. Coltrane ( 209368 ) writes:
    
    Looks like I'm a troll now, geez ...
    
    It was a real question, though. I post on Usenet too and on various mailing lists that get indexed on Google somehow, I maintain several open-source projects, I have a homepage with my email in plaintext at the bottom, etc... but I almost never get spam. I just wondered why :-)
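
letxa2000's point above, that every user keeps their own statistics, might look roughly like this. A minimal sketch, assuming whitespace tokenization and a simple frequency ratio; `UserStats`, `stats_by_user` and the neutral 0.5 for unseen words are illustrative choices, not taken from any real filter.

```python
from collections import Counter, defaultdict


class UserStats:
    """Per-user token statistics (hypothetical structure for illustration)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of legit messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_msgs += 1

    def token_probability(self, token):
        """Estimate how spammy a token is for this user, clamped to [0.01, 0.99]."""
        if self.spam_msgs == 0 or self.ham_msgs == 0:
            return 0.5  # not enough training data yet (assumed neutral)
        spam_rate = self.spam_tokens[token] / self.spam_msgs
        ham_rate = self.ham_tokens[token] / self.ham_msgs
        if spam_rate + ham_rate == 0:
            return 0.5  # token never seen by this user
        return min(0.99, max(0.01, spam_rate / (spam_rate + ham_rate)))


# One independent set of statistics per mailbox.
stats_by_user = defaultdict(UserStats)
```

Two users who train on the same words in opposite ways end up with opposite scores for those words, which is the "one person's spam is another person's necessity" case.
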
Apple's Mail app... (Score:4, Interesting)

by useruser ( 638080 ) writes: on Monday May 26, 2003 @05:49PM (#6042205)

...supposedly uses some form of Bayesian reasoning [apple.com]. I've been using it for a year now. I trained it for a couple of weeks, turned it on "automatic filtering" mode, and now I can count the number of times it's misclassified a message on my two hands. I used to get more spam than legit mail; now I can't help but wonder why spam is a problem for people. Until I remember that most people don't use a Mac. Every once in a while, I flip it back into training mode so that I can see the lovely sea of brown-colored spam messages that flood my inbox. I flip it back to automatic mode, Mail automatically moves them to my junk folder, and I can forget about them.

- Re:Apple's Mail app... (Score:5, Informative)
  
  by Anonymous Coward writes: on Monday May 26, 2003 @06:05PM (#6042293)
  
  Actually, the latent semantic analysis [colorado.edu] (LSA) that Apple uses is not a form of Bayesian reasoning; it uses a singular value decomposition (SVD) to perform generalized factor analysis [olisweb.com]. However, there is a probabilistic version of LSA [nec.com] out there.
  
Evolution and (Score:3, Insightful)

by Gyorg_Lavode ( 520114 ) writes: on Monday May 26, 2003 @05:53PM (#6042222)

I have a simple question: is there a way to implement a Bayesian filter for Evolution without having to add an extra stop for the email (i.e. a mail server on my computer from which Evolution picks mail up locally)?

- Re:Evolution and (Score:3, Informative)
  
  by C3ntaur ( 642283 ) writes:
  
  Yes, I've done it and here's how:
  
  1. Get and install bogofilter.
  
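  2. Make a shell wrapper script that runs bogofilter in passthrough mode, redirecting stdout and stderr to files in /tmp for debugging and training bogofilter. Here's mine:
  
  #!/bin/bash
  # Run bogofilter in passthrough (-p) mode and update its word lists (-u);
  # keep stdout and stderr in /tmp for debugging and training.
  /usr/bin/bogofilter -p -u > /tmp/bogo.out 2> /tmp/bogo.err
  status=$?
  exit $status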
  
  3. Make a new local mail folder in evolution to collect spam.
  
  4. Make a filter in evolution that runs the wrapper script. Tools->Filters, choose Incoming, choose Add. Add a c
Here's one I've used (Score:4, Insightful)

by wiggys ( 621350 ) writes: on Monday May 26, 2003 @05:55PM (#6042232)

I set up Popfile [sourceforge.net] a few weeks ago at work to stop the deluge of spam one of our POP3 accounts was getting. I've never used a spam filter before (other than the usual basic keyword-based ones) and I must say that Bayesian filtering is very impressive!
I find in our case it stops 98-99% of spam dead in its tracks. There have been a few false positives, and you do need to check from time to time just in case any genuine emails are misclassified, but it's surprising just how quickly the filter sorts the wheat from the chaff.
Don't expect miracles but they can save you a lot of time... what I find cool is that it learns so quickly, almost like a complicated neural net should, but it's such a simple idea. I wonder if there are any other uses for this kind of thing?

- Re:Here's one I've used (Score:2)
  
  by FFFish ( 7567 ) writes:
  
  Reverend is a general-purpose Bayesian classifier, named after Rev. Thomas Bayes. Use the Reverend to quickly add Bayesian smarts to your app. To use it in your own application, you either subclass Bayes or pass it a tokenizing function. Bayesian fun has never been so quick and easy. Many thanks to Christophe Delord for his well-written PopF. Orange also looks good.
  
  Stuff you can do with the Reverend:
  - classify recipes by cuisine
  - who do you write like? Shakespeare, Dickens, Austen, Aesop
  - detect the language o
  - Re:Here's one I've used (Score:3, Funny)
    
    by swillden ( 191260 ) writes:
    
    >I wonder if there are any other uses for this kind of thing?
    Yes, there is:
    http://groups.google.com/groups?q=venue+group:comp
    .lang.python&hl=en&lr=&ie=UTF-8&safe=off&selm=mail
    man.1048821167.17118.python-list%40python.org&rnum =1
    
    You mean like automatically deleting unusable links so we don't have to try to figure out how to get them to work?
Crude but effective (Score:5, Insightful)

by MrWorf ( 216691 ) writes: on Monday May 26, 2003 @05:59PM (#6042261)

I simply got to the point that I could count the number of real emails on my hands. So I reversed my previous filter. Instead of filtering spam to my spam folder, I made it default *ALL* mail to the spam folder except from certain known addresses (such as work, friends and my own domain). So far, it has only made one wrong decision, and that was because I hadn't written the email address of a friend correctly.

This is waaaaay better than any other filtermethod I've tried and requires no learning period at all :)

- Re:Crude but effective (Score:3, Funny)
  
  by wiggys ( 621350 ) writes:
  
  You know, I think you're on to something there. I sent you an email offering you money so I can sell the idea... but I've a feeling it's been classified as spam. Shame!
- Re:Crude but effective (Score:2, Funny)
  
  by marko123 ( 131635 ) writes:
  
  I used a similar "Crude but effective" technique at work. I had a job where most days were bad, but some were good. So I told my boss to go fuck himself and now I don't have any bad days any more. Of course, my false positives (the good days) are also gone.
- Mozilla does this (Score:3, Informative)
  
  by Anonymous Coward writes:
  
  Mozilla incorporates a two-step filter:
  
  1. Is the sender in the address book? If yes, is not spam, otherwise:
  2. Does the message have a probability of 90% that it is spam based on the Bayes filter? If so, flag as spam, otherwise not spam.
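
The two-step check described above can be written down directly. A sketch only: `classify`, `address_book` and the way the Bayes probability is passed in are hypothetical names for illustration, not Mozilla's actual code.

```python
def classify(sender, spam_probability, address_book, threshold=0.9):
    """Two-step decision: address book first, Bayes score second.

    sender: the message's From address.
    spam_probability: score in [0, 1] from some Bayesian filter (assumed given).
    address_book: collection of known-good addresses, stored in lowercase.
    """
    # Step 1: anything from a known correspondent is never flagged.
    if sender.lower() in address_book:
        return "not spam"
    # Step 2: otherwise flag only when the Bayes score reaches the threshold.
    if spam_probability >= threshold:
        return "spam"
    return "not spam"
```

Putting the address-book check first means a known sender is never scored at all, so a spammy-looking message from a friend still gets through.
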
Slight modification: white-list+Bayesian is useful (Score:5, Interesting)

by Jeremi ( 14640 ) writes: on Monday May 26, 2003 @06:13PM (#6042342)

I've found that if you add a small tweak to the Bayesian Filter, it becomes even more useful. The tweak is this: Any time you tell the Bayesian filter that an email is "non-spam", it auto-adds the From address of that email to a white-list, so that from then on any emails from that address are automatically marked as "non-spam" by the filter, no matter what they contain. (conversely, any time you mark an email as "spam", the source address of that email is removed from the white-list, if it is present)

This allows your single spam/non-spam feedback to the system to do double duty, so that once the program knows that you consider an email source to be "trusted", it will allow even spammy-looking stuff (read: mailing list digests, plane schedules, bank statements, etc) through to your non-spam folder.

Of course, if spammers start constructing google-style databases of who your friends are and impersonating their accounts, then this won't work anymore... but if they start that, all hell is going to break loose anyway.
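
In case it helps, here is roughly what that tweak looks like in Python. It is a minimal sketch, assuming some existing Bayesian classifier is available as a callable; `WhitelistingFilter` and its method names are invented for illustration, and a real filter would also update its word statistics in `train`.

```python
class WhitelistingFilter:
    """Wrap a Bayesian classifier with an auto-maintained sender white-list."""

    def __init__(self, bayes_is_spam):
        # bayes_is_spam: callable taking the message text and returning
        # True for spam (stands in for the real Bayesian filter).
        self.bayes_is_spam = bayes_is_spam
        self.whitelist = set()

    def train(self, sender, is_spam):
        """Record the user's spam/non-spam verdict for a message."""
        if is_spam:
            # Marking a message as spam revokes any trust in its sender.
            self.whitelist.discard(sender)
        else:
            # Marking a message as non-spam trusts its sender from now on.
            self.whitelist.add(sender)

    def is_spam(self, sender, text):
        """White-listed senders always pass; everyone else goes to Bayes."""
        if sender in self.whitelist:
            return False
        return bool(self.bayes_is_spam(text))
```

The single piece of user feedback drives both the white-list and (in a full version) the Bayesian training, which is the "double duty" mentioned above.
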

- Re:Slight modification: white-list+Bayesian is use (Score:2)
  
  by Plug ( 14127 ) writes:
  
  A slightly different idea that I was considering today works as follows.
  
  Take the Tagged Message Delivery Agent [tmda.net], a system that will send a challenge message to anyone it doesn't know (isn't in the whitelist), which you have to reply to.
  
  Then change it so anything allowed through on the whitelist is added to the "Not Spam" category, and anything that is challenged is passed through the filter. If it passes, it doesn't get challenged (but also doesn't get added automatically to Not Spam), and if it _doesn't_
Browser ad-blocking the same way? (Score:2, Insightful)

by DrJAKing ( 94556 ) writes:

I wonder if a Bayesian classifier could sort out banner ads? I currently use Guidescope to block them, but it would be far better not to rely on a third party to decide what's an ad URL. I think it would work, but training it might be hard.

(And before anyone says "Don't do that, websites will die" my response would be "Good, let most of them die." I hate ads.)
  - Re:Browser ad-blocking the same way? (Score:2, Informative)
    
    by bhtooefr ( 649901 ) writes:
    
    Did you have Windows? If you did, it was probably WebWasher [webwasher.com]. It is free for home use. The download link is buried in the front page, so here's a direct link to the WebWasher Classic site: http://www.webwasher.com/client/home/index.html?lang=de_EN [webwasher.com]
Nostalgia (Score:2)

by SirDaShadow ( 603846 ) writes:

Reading about the history of email and instant messaging, it reminded me of how easy it was to echo "Hi there" > /dev/tty01 to send a message to another college acquaintance...ahh the memories...
I don't even try to filter spam out. (Score:3, Insightful)

by belroth ( 103586 ) writes: on Monday May 26, 2003 @06:47PM (#6042542)

Instead I filter all of my mail for wanted/expected mail into a (large) tree of input folders, mailing lists, company mailings etc.
Most of what's left is spam, so a quick scan of the inbox (and creation of new rules) weeds out the uncaught desirables and the rest gets dropped in the bitbucket.
The point being that legitimate mail doesn't try to spoof my filters. I haven't (yet) had any spam arriving where it shouldn't. I'd rather my ISP dumped all the crud in the bin for me, but my marginal cost is low as I'm on ADSL. I now also use a distinct email for each purpose, making it easy to spot where spammers got it from and to create new rules as needed. It's a shame I didn't do this at the start as I have a couple of early ones that are spammed but I can't dump.

naive implementation of naive Bayesian (Score:4, Informative)

by g4dget ( 579145 ) writes: on Monday May 26, 2003 @06:58PM (#6042608)

Graham's method is called "naive Bayesian", and it's called "naive" for a reason. It works surprisingly well, but it barely scratches the surface of what people are doing with statistical models of text.

The lack of references on Graham's web site to prior work on text classification makes one wonder whether he just is unfamiliar with a huge body of literature going back decades or whether he just deliberately ignores them. Either way, Graham didn't invent any of the techniques and they are far from state-of-the-art. (Incidentally, you'll probably find Octave or Perl/PDL a more convenient language for implementing this stuff than Lisp.)

Anybody seriously interested in text filtering should at least do a little bit of background reading. "Readings in Information Retrieval" by Jones and Willett covers some of the basic papers.

My solution. (Score:3, Interesting)

by Lord Kholdan ( 670731 ) writes: on Monday May 26, 2003 @07:37PM (#6042810)

I don't use email. Yes, I have a few addresses, but I haven't checked them in months. Email is kind of a dead way of communicating anyway, beaten by things such as mobile phones and instant messaging.

No Junk Mail please.... (Score:3, Funny)

by TomMajor ( 580331 ) writes: on Monday May 26, 2003 @07:46PM (#6042861)

On my mailbox outside my apartment I have a "No Junk Mail please" sticker... This actually works. I tried to put the same sticker on my PC, but the junk mail just keeps on coming... I don't understand....

The best email filter (Score:3, Interesting)

by Spud the Ninja ( 174866 ) writes: on Monday May 26, 2003 @07:46PM (#6042862)

Why go through all the work of training some software to read your email and decide if you might want to read it when most email programs have white list capabilities?

If I don't know you, that means I don't want to talk to you. Your email goes straight to a junk folder, which I can quickly scan once every few days for From names I recognize. I can add these names to my white list if I so choose.

Granted, my job does not involve me soliciting contacts from the public at large, so this wouldn't work for everyone. I use it on my personal Hotmail account though, and I get to not even consider lots of crap every day.

Bayesian for windows? (Score:2)

by NeoSkandranon ( 515696 ) writes:

As soon as there's a well-written app that works with or on top of programs like Outlook and Netscape, I will be excited. Until then, a huge (and likely most targeted) sector of people remains relatively un-filtered.
  - Re:Bayesian for windows? (Score:2)
    
    by NeoSkandranon ( 515696 ) writes:
    
    They say on the website it is explicitly NOT for Outlook Express, the most popular version of Outlook. Is there any way at all to get it working there?
      - Re:Bayesian for windows? (Score:2)
        
        by NeoSkandranon ( 515696 ) writes:
        
        POPFile has no IMAP/HTTP support, and my primary email addresses (the ones in most need of spam protection) are not POP3.
Bogofilter 0.9 NOT 0.11 (Score:2)

by LinuxHam ( 52232 ) writes:

I just had to backlevel bogofilter from 0.11 to 0.9. I don't know WTF happened between those two revs, but the filtering algorithm went straight to hell. I had forgotten that I normally get over 100 spams a day until I went to 0.11. Then it all came back and I started losing half an hour a day to sorting out my email.

I gave it a chance for over two weeks, and it never got even close to the success rate of 0.9. Not that I'm complaining, there was nothing left to improve upon in 0.9 AFAIC. (And yes, I did se
- Re:who're the vikings? (Score:5, Informative)
  
  by Evil-G ( 529075 ) writes: <g_r_a_m_2000@AAA ... inus threevowels> on Monday May 26, 2003 @05:39PM (#6042151)
  
  A group of Vikings in a Monty Python sketch drowned out normal conversation by shouting the word "spam" louder and louder. The word was then adopted for all the crap drowning out normal conversation on Usenet.
  
- Re:who're the vikings? (Score:3, Insightful)
  
  by RobotRunAmok ( 595286 ) * writes:
  
  The Monty Python Comedy troupe did a rather famous (in some Geek circles) skit in which the virtues of canned Spiced Ham are literally sung. Inexplicably, a group of Vikings join in the song.
  
  The poster, obviously better schooled in British farce than luncheon meats, is under the impression that the widely accepted nickname for unsolicited e-mail is derived from the comedy sketch and not from Spam(tm), the food.
  
  I don't know for certain if he's wrong, but I have a hunch he is. I'm guessing a lot more peop
  - Re:who're the vikings? (Score:2, Informative)
    
    by mlk ( 18543 ) writes:
    
    Not many people know where the term "spam" comes from, but everyone[1] knows what it means (email-wise), and it does come from the Monty Python sketch.
    
    However, it did not originally go with bulk email, but instead with some wanker posting the same post over and over again on a newsgroup or IRC.
    
    [1] Including "normal" users.
- Re:who're the vikings? (Score:2)
  
  by geeber ( 520231 ) writes:
  
  Go to http://erik.selwerd.nl/monthy-python.html and all shall be revealed.
- Re:Spam = /dev/null (Score:5, Informative)
  
  by GammaTau ( 636807 ) writes: <jni@iki.fi> on Monday May 26, 2003 @05:45PM (#6042180)
  
  Bayesian filtering could stop all the spam that easily? This is great! Where can I download a filter like this?
  
  You can try bogofilter [sourceforge.net], ifile [nongnu.org], SpamBayes [sourceforge.net], or POPFile [sourceforge.net]. The newer versions of SpamAssassin [spamassassin.org] also implement some kind of Bayesian filtering.
  
  - Re:Spam = /dev/null (Score:2, Insightful)
    
    by mnemonic_ ( 164550 ) writes:
    
    I like SpamBayes for its ability to be trained on past spam. You can point it to a folder full of past spam and it scores them all, which is much faster than gradually teaching the software to recognize spam through individual email updates.
    
    POPFile does not have this convenient ability (yet), though it does do general purpose sorting (i.e. not just differentiate between spam and non-spam, but stuff like work, school, linux or whatever you want). It does take a while to train though.
  - Re:Spam = /dev/null (Score:2, Informative)
    
    by sabaco ( 92171 ) writes:
    
    Don't forget SpamProbe [sourceforge.net] as well. I've been using it for a couple weeks, and it has been working very well for me. I've gotten around 1400 messages, and so far 1 false positive and 6 false negatives. I don't know how well the other filters work, but that seems pretty good to me. It's sure a hell of a lot better than the DNS blacklists I use. (I'm still using those. After all, they filter out the first 70% of my incoming mail and are probably faster anyway.)
- Re:Spam = /dev/null (Score:2)
  
  by Roadmaster ( 96317 ) writes:
  
  I use SpamProbe [sourceforge.net]; it's quite mature, actively maintained, has good performance and plenty of features. Of course this depends on your platform; for Windows I've heard good things about POPFile.
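
For a rough sense of sabaco's numbers above (about 1400 messages, 1 false positive, 6 false negatives), the overall accuracy works out as follows. The post doesn't say how many of the 1400 were spam, so per-class rates can't be computed from it; the variable names are just for illustration.

```python
# Figures as reported in the comment; "around 1400" is treated as exact here.
total_messages = 1400
false_positives = 1   # legit mail wrongly flagged as spam
false_negatives = 6   # spam that slipped through

errors = false_positives + false_negatives
accuracy = (total_messages - errors) / total_messages

print(f"{errors} errors out of {total_messages}, accuracy {accuracy:.1%}")
```

That lands in the same range as the 98-99% and 99.9% figures other posters report, though each number comes from a different mailbox and a different filter.
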
