DSPAM v3.6 Released

Nuclear Elephant writes "After six months of development, DSPAM v3.6 has been released. The most notable change is the series of new features added to make an anti-spam gateway appliance possible (Knoppix anyone?). Version 3.6 also includes a highly accurate alternative to Bayesian filtering known as Markovian discrimination, based on Bill Yerazunis' research. Other significant enhancements include trusted sender whitelisting, integrated Clam Antivirus and LDAP support, a centralized spam training alias, and a new dependency-free storage driver. Much of the documentation has also been rewritten to make installation easier. A change log and release notes are also available. Slashdot has recently featured a review of the author's book, Ending Spam and an interview as well."
This discussion has been archived. No new comments can be posted.

  • It would be interesting to compare this version to other spam filters and see how it measures up.
    • TREC [nist.gov]'s Spam Track [uwaterloo.ca] will evaluate several spam filters. There's also a toolkit for do-it-yourself comparison.

      Although DSPAM is not an official participant at TREC, three configurations will be evaluated for comparison - with tum, toe, and teft training modes. Zdziarski reported some of the preliminary results in his interview, but complete and comparative results won't be available until TREC in November.

    • While it's great that it learns and makes decisions about the "spamminess" of various incoming items, the most reliable method I've found so far is Greylisting.

      The moment I installed and started GLD (gasmi.net), the spam simply stopped. It was like flipping the "nospam" switch on. The spam just stopped. No false positives, no missed spam, nothing.

      Every now and then I get unwanted email, but at least now it's from an actual, identifiable SMTP server, not a spam-bot.

      It's an amazing improvement from i
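For readers unfamiliar with greylisting, here is a minimal sketch of the idea. It is not GLD's code: the class name, the 5-minute delay, and the in-memory dictionary are assumptions made for illustration. The server temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet and accepts a retry made after the delay. Real mail servers queue and retry; many spam-bots do not.

```python
from typing import Dict, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]


class Greylist:
    """Toy greylist; the names and the default delay are illustrative assumptions."""

    def __init__(self, delay_seconds: float = 300.0) -> None:
        self.delay_seconds = delay_seconds
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, triplet: Triplet, now: float) -> str:
        """Return an SMTP-style verdict for a delivery attempt at time `now`."""
        first = self.first_seen.get(triplet)
        if first is None:
            # Unknown triplet: remember it and ask the sender to retry later.
            self.first_seen[triplet] = now
            return "450 try again later"
        if now - first < self.delay_seconds:
            # Retried too quickly: keep deferring.
            return "450 try again later"
        # A retry after the delay suggests a real, queueing mail server.
        return "250 ok"
```

A production greylist would also expire old entries and persist them across restarts; both are left out here to keep the sketch short.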
  • Finally a decent anti-spamming utility. There's been a lot of hype around this product and it is not out of place. I like the way it's (at least partially) integrated with clam(win?). I still feel it won't be long before spammers find ways around this tool... but for now, great, I'm definitely using it.
  • by Jaruzel ( 804522 ) on Monday October 17, 2005 @08:15AM (#13808337) Homepage Journal
    I know I'm going to get mauled over this question... but has anyone compiled it on Windows 2003 Server?

    For practical reasons I don't have Linux in my test lab, and I'd like to have DSPAM on my webserver, which is running IIS6 and Windows 2003 Server.

    I can see I need to run it in SMTP mode with a relay to my Exchange box, but I don't want to waste my time trying to compile it (using Visual Studio) if someone already knows it won't work.

    -Jar.
    • That was how earlier versions worked. I don't know of anyone who actually got them to work natively under Windows.
    • by myspys ( 204685 ) on Monday October 17, 2005 @08:40AM (#13808400) Homepage
      from the FAQ (http://dspam.nuclearelephant.com/faq.shtml#1.15 [nuclearelephant.com])

      Q. Does it work with Windows?
      A. v3.2 is the first to include a Windows build supplement, which includes the necessary Visual C++ project files and portage to compile the agent and tools under Windows. Check out the win32/ directory in the source tree for more information. Win32 support is still unofficial, but seems to work well. Of course getting it compiled is one thing, getting it integrated is another. It's probably best to build it under Cygwin using the general distribution.
      • A. v3.2 is the first to include a Windows build supplement
        I downloaded version 3.6.0, but there seems to be nada :) support for Visual C. No win32 directory to be found. However on the download page, in the unsupported section, there was also DSPAM v. 3.2.8 [nuclearelephant.com], which indeed does contain the Windows stuff.
    • Linux Router (Score:4, Interesting)

      by Stavr0 ( 35032 ) on Monday October 17, 2005 @08:46AM (#13808421) Homepage Journal
      I know I'm going to get mauled over this question... but has anyone compiled it on Windows 2003 Server? (Release the hounds!)

      How about getting it compiled into a Linksys WRT54G router firmware, e.g. Sveasoft firmware?

      • Re:Linux Router (Score:4, Informative)

        by op00to ( 219949 ) on Monday October 17, 2005 @08:57AM (#13808474)
        DSPAM, as it's running in my cluster, is using way more ram than the WRT54G physically has. Probably not a good idea to run it on that little box.
      • My understanding is this sort of filtering isn't practical on any of the consumer routers due to their limited memory. The applications load the email messages to scan them, and between the OS code, the scanning package, and the email being scanned there simply isn't enough memory to hold it all, even on the larger WRT54GS units. My own hope is that Cisco's Linksys subsidiary eventually 'gets smart' and releases a combination WRT54GS / NSLU2 / PAP2 appliance, with more RAM, that is Linux-based and hackable
    • Version 3.4 has win32 support, but nobody wanted to maintain the build kit. It stopped working with 3.6 and was removed. You can build 3.4 natively in Windows, or you can build 3.6 under Cygwin.
  • Aren't there any trademark problems with DSPAM?
    SPAM [spam.com] is a registered trademark of Hormel Foods Corporation, and the DSPAM developers aren't Monty Python [montypythonsspamalot.com].
  • by Anonymous Coward
    DSPAM is also noted for their trademark spat with Hormel, who tend to be nice about "spam" as a term until it's spelled in all-caps. (Previous Slashdot coverage.) [slashdot.org]
  • Too late (Score:3, Funny)

    by mordors9 ( 665662 ) on Monday October 17, 2005 @08:30AM (#13808381)
    But the great news is this product is no longer needed. After all, the FBI has put a stop to all of that: http://www.detnews.com/2005/technology/0510/16/B01-349738.htm [detnews.com] (For those that are easily confused, the comment was tongue in cheek)
  • Try DSPAM (Score:4, Informative)

    by ajs ( 35943 ) <{ajs} {at} {ajs.com}> on Monday October 17, 2005 @08:41AM (#13808407) Homepage Journal
    I'm a long-time proponent of and rare contributor to SpamAssassin, and I'll continue to be, but fighting spam is much like fighting disease: you have to diversify your defenses. DSPAM is a nice package, and is very well designed. I've spoken to the author in the past, and he has an excellent understanding of the complexities of the issue (as opposed to the legions of people who seem to think that spam filtering should be easy, given the right algorithm).

    As far as I'm concerned there are two tools for spam filtering: DSPAM and SpamAssassin. Try them both. See what fits your needs. My impression is that SpamAssassin provides more knobs and buttons and is more easily extended by the casual user, but DSPAM can be lighter weight. Both are highly accurate, with very low false positive rates.
    • There are lots of alternatives. Bogofilter, spamprobe, spambayes, popfile, dbacl, are all quite effective.
      • From what I know of those projects, they're all Bayesian filters and little more. Maybe a white/black list. That's what the GP post was referring to when he wrote "as opposed to the legions of people who seem to think that spam filtering should be easy, given the right algorithm". I don't know much about this DSPAM, but SpamAssassin covers a whole bunch of tests. It started off as a list of common-sense patterns looking for the usual penis/breast enlargement etc spam in the email body and suspicious info in
        • Re:Try DSPAM (Score:3, Informative)

          by gvc ( 167165 )
          I use Spamassassin with a special user configuration file [uwaterloo.ca] and I train it systematically. In this configuration it works pretty well (much, much, better than out-of-the box). But Bogofilter and Popfile work about as well. As does just the Bayesian component of Spamassassin, ignoring all the other cruft. DSPAM, on the other hand, doesn't work at all well for me.
          • For the most part you seem to be:
            • Shutting off auto-learn (mistake, see below)
            • Upping BAYES scores (good plan, I do too)
            • enabling a few knobs that are generally useful (though I've had too many false positives with RCVD_IN_DSBL).

            The only thing I would criticize is shutting off auto-learn. If you want to be conservative, just lower the ham threshold and raise the spam threshold a bit. I tried to manually train for a while, and what I found was that I was actually lying to SA. auto-learn means that a view of yo

            • Auto-learn in spamassassin is broken. In fact my mail script automatically calls sa-learn for every message, with ham or spam depending on what Spamassassin claims. Then if I want to correct it I call sa-learn over again with the correct classification. That's why the user-prefs file has it turned off.

              I should make this more clear in my notes. Thanks for pointing it out.
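The train-on-everything-then-correct workflow described above can be sketched as follows. This is not the poster's mail script: the function names and the file path are hypothetical, and the sketch only builds command lines rather than running them. The `--spam` and `--ham` options are the standard `sa-learn` training switches.

```python
from typing import List


def sa_learn_command(message_path: str, is_spam: bool) -> List[str]:
    """Build an sa-learn command line for one message file."""
    flag = "--spam" if is_spam else "--ham"
    return ["sa-learn", flag, message_path]


def initial_training(message_path: str, spamassassin_says_spam: bool) -> List[str]:
    # Step 1: train on whatever verdict SpamAssassin produced for the message.
    return sa_learn_command(message_path, spamassassin_says_spam)


def correction(message_path: str, actually_spam: bool) -> List[str]:
    # Step 2 (only when the verdict was wrong): train again with the right class.
    return sa_learn_command(message_path, actually_spam)
```

The returned list can be handed to `subprocess.run` in a delivery script; keeping command construction separate from execution makes the logic easy to check without SpamAssassin installed.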
              • Explain "broken". Works great for me.....

                Training on everything is probably a mistake. Catching all of the edge conditions where that fails is going to be a very laborious task. Do all of your users do the same, or do you force their auto-learning off and have them use your bayes tokens? That has its own problems (you're not training on their mail), but at least would not leave an inattentive user in the horrible situation where they are constantly training incorrectly. That quickly leads to a broken classi
                • Explain "broken". Works great for me.....

                  Some explanation appears here [uwaterloo.ca].

                  In summary, auto-learn re-evaluates the message using only the static rules - not the bayes rules. Then, if the static rules give an extreme score that differs from the bayes score, and a couple of extra ad hoc conditions hold (number of "hits" exceeds some threshold) the bayes filter is trained.

                  You can adjust the "extremeness" of the score under which Bayes is trained but training will not be on what Spamassassin reports; only on

                  • "In summary, auto-learn re-evaluates the message using only the static rules - not the bayes rules. Then, if the static rules give an extreme score that differs from the bayes score, and a couple of extra ad hoc conditions hold (number of "hits" exceeds some threshold) the bayes filter is trained."

                    Hrm... well, no.

                    First off "number of hits" is not an "extra ad hoc condition". Number of "hits" is exactly "score". There's no difference, just two pieces of terminology for the same thing. "Level" is another thin
                    • Hrm... well, no.

                      All that said, you seem uncomfortable with static rules of any kind, so if you don't buy into what I've said above, then I suggest that you stop using SA. Static rules are a giant advantage, but if you are going to defeat most of their value, then you might as well not suffer their overhead.

                      For further reading, I suggest: http://plg.uwaterloo.ca/~gvcormac/spamcormack.html [uwaterloo.ca]

                      I wrote that paper, and the configuration I posted here is what was used in the best-scoring run.

                      For your conven

    • The problem with SPAMD, SpamAssassin etc. is that they rely too much on training and user interaction. If a user has to go into the SPAM box and double check that no mistakes have been made then the system is worse than not having any SPAM checking at all as most users will not check the SPAM box. This is especially true for larger deployments, where it is much harder to train users; these environments usually cannot afford such mistakes.

      I've found greylisting to be the best solution
      • Re:Try DSPAM (Score:3, Insightful)

        by gvc ( 167165 )

        If a user has to go into the SPAM box and double check that no mistakes have been made then the system is worse than not having any SPAM checking at all.

        Not true. First, if the user's mailbox is cluttered with spam, the user is more likely to overlook good mail. More likely than a good spam filter. Second, it is way easier to scan a list of predominantly spam for occasional good mails (and vice versa) than to have everything jumbled together. Third, spam filters are good enough that one does not need n

        • I've found that nearly all of my users actually prefer an interactive system like dspam over a fully-automatic system. Both systems make mistakes, but the interactive system gives the user a feeling of empowerment to fix mistakes and improve their accuracy over time.

          It's better for the admin, too... When a non-interactive system makes a mistake, I find that the users complain -- either to the admin or to each other. But with dspam, they reclassify the missed message and continue working, happy to know th
          • Absolutely. It is cathartic to punish spam by reporting it to your spam filter. And, of course, fully automatic systems aren't nearly as good as claimed. (Neither are learning filters - 99.9...% accuracy? pshaw! - but they're better than non-learning ones.)

            • I used SpamAssassin quite happily for many years but found its effectiveness started dropping. There are some messages that just can't be caught; usually these are the worst kinds of messages (e.g. a face full of spunk), almost always received by the people most likely to be offended (e.g. 55-year-old female administrative staff).

              False positives seem to be more of a problem with mail written in languages other than English. Pretty much all of our e-mail in Welsh language we receive through AOL has been tagged by AOL a
      • If a user has to go into the SPAM box and double check that no mistakes have been made then the system is worse than not having any SPAM checking at all as most users will not check the SPAM box

        I use a three-outcome approach with SpamAssassin. Messages scored below 5 are delivered to the user's INBOX. Messages scored 5 or higher, but less than 10 go into the spam box. Messages scored 10 or higher are rejected during the SMTP session, with instructions on how to proceed.

        I did this because, in practice,
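The three-outcome policy above can be sketched as a small routing function. The thresholds 5 and 10 come from the post; the function name and the returned labels are assumptions for illustration, not SpamAssassin or MTA configuration.

```python
def route(score: float, spam_threshold: float = 5.0, reject_threshold: float = 10.0) -> str:
    """Map a SpamAssassin score to one of three delivery outcomes."""
    if score >= reject_threshold:
        # Rejected during the SMTP session, so the sender learns about it.
        return "reject"
    if score >= spam_threshold:
        # Delivered, but to the user's spam box.
        return "spam-box"
    # Anything below the spam threshold goes to the INBOX.
    return "inbox"
```

Rejecting at SMTP time, rather than silently dropping, means a legitimate sender who trips the upper threshold receives an error instead of assuming the mail arrived.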

    • This is just one admin's viewpoint... it may not reflect anyone else's experiences. It's just what I've found over the years, using both systems.

      Accuracy... SpamAssassin generally offers higher accuracy with less effort, at first, but the accuracy degrades over time. DSPAM takes more effort initially, but offers higher, sustained accuracy over the long term. I see an average of about 99.5% long-term accuracy with dspam. I can't tell what the accuracy was with spamassassin, since it doesn't include a wa
      • Were you using procmail and individual spamassassins, or using spamd/spamc for mail checking? I wonder if that's the reason people see such super-high CPU loads with SA. I was delivering around 10K-15K messages/day (roughly 50 users), with SA identifying around 85% as spam. The backup MX ran spamd with user prefs and bayesian keys stored in MySQL, and the primary MX delivered through procmail using spamc. The backup MX/spamd machine was a P3/800 with 512M RAM and the primary MX was an Athlon 1000 with 1
        • I saw the SA load problem happen both with and without using the daemon setup. However, the systems were slower than what you described, and did a lot more than just handle email. They were dual-500MHz boxes, but couldn't keep up with the incoming mail. Mail arrived faster than SA could process it, even though it was just a few dozen accounts. It would tend to catch up at night, but email during the day was pretty lagged.

          I haven't tried dspam as a daemon yet, but intend to try it soon to see how it work
          • That backup MX was also the primary DNS and syslog server, but that's not much of a load. The primary MX was also the pop/imap/web server, for what it's worth. My home setup is about 5 users with around 5-7K messages/day, and I run spamd and MySQL on the same box - which is a dual Celeron 400 machine. Messages come in on an AMD 5x86-133 gateway which does the DNS lookups and then forwards to a PPro233 which calls spamc (that one's also the web server). All three machines combined have less computing pow
  • I use Gmail. :)
    • I use Gmail. :)

      "So I let Google spam me in a targeted and personal manner via HTML rather than random people spamming me through SMTP."

      I can understand why you're so proud.
      • I configured my gMail account to Moz Thunderbird. No targeted ads, and the benefit of the greatness that is the gMail spam filter. I would say that it is quite possible the GP poster does as well.
        • I configured my gMail account to Moz Thunderbird. No targeted ads, and the benefit of the greatness that is the gMail spam filter. I would say that it is quite possible the GP poster does as well.

          Yeah, I bet at least 99% of gMail users know how to do that.
          • If that was meant to be sarcastic, it should not be. Gmail is invite-only and the first invitations went to an all tech-savvy crowd. Although Gmail has spread far and wide, I think the audience is still primarily tech-oriented.
    • I get an incredible amount of spam bounces in my GMail account -- from somebody sending lots of spam using my GMail address as the From: or the Return-to: address.

      I really, really want an option for GMail to record the message-id of all messages I ever send through their server, and bounce any which are returned to me but which they haven't got on record as being sent by me.

      I requested this ages ago, and it should be relatively straightforward. Does anyone else have this problem?
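The request above can be sketched as follows. This is only an illustration of the idea, not anything Gmail offers: the class, its methods, and the sample Message-IDs are hypothetical, and it assumes the bounce quotes the original message headers, which not every bounce does.

```python
import re
from typing import Set

# Matches a Message-ID header line anywhere in the bounce text.
MESSAGE_ID_RE = re.compile(r"^Message-ID:\s*(<[^>\s]+>)", re.IGNORECASE | re.MULTILINE)


class SentLog:
    """Remembers the Message-ID of every message actually sent."""

    def __init__(self) -> None:
        self.sent_ids: Set[str] = set()

    def record_sent(self, message_id: str) -> None:
        self.sent_ids.add(message_id)

    def is_genuine_bounce(self, bounce_text: str) -> bool:
        """True if the bounce quotes a Message-ID that we really sent."""
        quoted_ids = MESSAGE_ID_RE.findall(bounce_text)
        return any(mid in self.sent_ids for mid in quoted_ids)
```

A bounce for forged mail quotes a Message-ID that was never recorded, so it fails the check and could be discarded or filed away.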
  • This is one of those things that makes me wonder...which "side" is pushing the technological envelope further and faster, the {spammers | malware slimers | virus breeders} or those who develop to defeat them?

    Since it's generally agreed that history is written by the winners of a given conflict, I guess we won't have an answer to that until the war's over.

    This comment generously brought to you by a severe lack of caffeine.

    • This isn't really a chicken-and-egg situation. What's the answer to 99 out of 100 questions? Money.

      Spammers used email to sell things whilst at the same time pissing everybody off. Eventually people hated spam so much that they were willing to pay for services that try to eliminate spam.

      It may not always be so, but spammers have always been one step ahead; they have more incentive.
  • How well does "Markovian discrimination" work in practice? It sounds fascinating, but what is the false-positive rate that can be expected on average??
    Geez from dealing with spammers to working with the crap DiamondTouch, Yerazunis is a real glutton for punishment :)
    • The CRM114 classifier/filter has used markovian and derivatives thereof for quite some time and claims 99.984% accuracy.
      A downside is that markovian is quite a lot more resource intensive than simple bayesian.

      I used bogofilter (a fast bayesian filter) before CRM114. Even though CRM114 was harder to set up than bogofilter and used more resources, it was totally worth it.
    • You apparently missed Iglassware, Bill's contribution to measured drinking, and his role in the JunkYard Wars, at http://www.tms.org/pubs/journals/JOM/0310/Byko/Byko-0310.html [tms.org]
    • Below are some tests I ran with a pre-release version of DSPAM on a test corpus. As you can see, Markovian discrimination is significantly more efficient than any Bayesian methods and Chi-Square. Markovian showed slightly more (4 more than the top contender) false positives, but it also caught 100 more spam... some additional tuning, tweaking, and most importantly, training, can easily get this down to a very low error rate.

      Bayesian (burton)
      TP: 785 TN: 1003 FN: 218 FP: 4 SC: 4 IC: 0
      SR:
      • So in your opinion, are 4 more false positives worth the increase in true positives?

        This is one thing I'm struggling with: how to compare the results of two filters on the same corpus.
        We know FPs are substantially worse than any spam that gets through, but how much worse?
        • 4FPs for 100-something more TPs? Heck yeah. At least for me.. But keep in mind these are just preliminary training numbers with 1000 messages in each corpus. After real-world training, any of these approaches will be much more accurate.
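One hedged way to frame the "how much worse?" question is to compute rates from the reported counts and then ask what relative cost of a false positive would make the trade break even. The sketch below uses only the "Bayesian (burton)" counts quoted above and the approximate "4 more false positives for 100 more spam caught" figure from the thread; the function names are hypothetical, and the break-even framing is an assumption, not the poster's method.

```python
def summarize(tp: int, tn: int, fn: int, fp: int) -> dict:
    """Basic rates from a filter's confusion counts."""
    spam_total = tp + fn  # all spam in the corpus
    ham_total = tn + fp   # all legitimate mail in the corpus
    return {
        "accuracy": (tp + tn) / (spam_total + ham_total),
        "spam_caught": tp / spam_total,
        "false_positive_rate": fp / ham_total,
    }


def break_even_fp_cost(extra_spam_caught: int, extra_false_positives: int) -> float:
    """How many missed spam one false positive may cost before the trade stops paying off."""
    return extra_spam_caught / extra_false_positives


# Counts quoted in the thread for the "Bayesian (burton)" run.
burton = summarize(tp=785, tn=1003, fn=218, fp=4)

# Roughly 100 more spam caught for 4 more false positives, as stated above.
threshold = break_even_fp_cost(100, 4)
```

For the burton counts this gives roughly 89% accuracy, about 78% of spam caught, and a false positive rate near 0.4%. The break-even value is 25, so under this framing the trade is worthwhile only for someone who regards one lost legitimate message as less costly than 25 spam in the inbox.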
    • Yerazunis is a real glutton for punishment


      He leavened it with appearances on Junkyard Wars [the-nerds.org].

  • OpenBSD port (Score:3, Informative)

    by chrysalis ( 50680 ) on Monday October 17, 2005 @09:43AM (#13808727) Homepage
    The OpenBSD port can be downloaded from ftp://ftp.00f.net/misc/port-dspam-3.6.0.tar.gz [00f.net]
  • ...significant enhancements include trusted sender whitelisting...

    I thought that whitelisting had been a feature of every email reader/server since spam filtering began.

    • I thought that whitelisting had been a feature of every email reader/server since spam filtering began.

      DSPAM's trusted sender whitelisting is automatic, based on who you converse with. It's not quite social networking, but is very useful, and requires no effort on the end-users part.
  • by pabl0 ( 228298 ) on Monday October 17, 2005 @01:34PM (#13810333)
    ... but it'll sound like one: I recently converted from a rather involved anti-spam defense utilizing SpamAssassin with Razor, Pyzor, and several RBL checks. I spent a fair amount of time selecting RBLs that worked the best and tweaking SA test scores whenever I got false positive/negative messages. I even had all sorts of validity checks turned on in the MTA to block out badly formed messages and the like.

    I replaced all those defenses with: DSPAM. And I'm seeing better results out of the box than I ever did with a multi-layered SA-based solution, even after a lot of time tweaking.

    A quick anecdote: When I converted, I opened up a bunch of previously blocked spamtrap addresses, just to get some good training material for the filter. I've long since passed my initial training threshold but haven't even bothered to block the spamtraps again because I never see the spam. At the risk of sounding like I'm bragging, I literally don't have a spam problem anymore, and DSPAM is entirely responsible for that.

    Now, I'm not necessarily advocating that you give up all your custom defenses and switch to DSPAM. (I've turned off all my other filters, but I haven't removed them completely.) There's always a chance that an ingenious spammer will find a weakness in DSPAM setups, but I can testify to the fact that DSPAM is "scary good" as of right now. Training the filter is a simple matter of dropping misclassified messages (and there aren't many) into an IMAP folder.

    If what you have is working for you, stick with it. But if you're looking for a low-maintenance, high accuracy filter, you should definitely give DSPAM a shot.
  • Why is it not included in Debian?
    Spamassassin is.
    Bogofilter is.
    Popfile is.

    I thought it was the license, but it seems that DSPAM is GPL.
    So, can anyone comment? I'm not installing it
    on my server if I cannot apt-get it and have Debian
    security support for it.
    • Why is it not included in Debian?

      There's been a lot of interest in this area but nobody's felt like taking it upon themselves to make a Debian distro AFAIK. Part of it may have had to do with the storage driver backend, which supports several different approaches, but required a recompile to switch from say Postgres to MySQL. In 3.6, the storage backend can be built dynamically making packaging much easier. Perhaps someone will pick 3.6 up now.
