Live spam-catching contest at CEAS
noodleburglar writes "The 2007 Conference on Email and Anti-Spam (CEAS) will feature a live spam-catching contest. Entrants will be treated to a torrent of spam and must use their spam filtering technique to filter out as much as possible, while also letting legitimate messages through. My money's on SpamAssassin." This ought to be a sweeps week television spectacular.
CRM114 (Score:4, Informative)
Re: (Score:1, Funny)
This is either:
1) "automatic" white-listing?
2) Not healthy and you should eat more fibre.
Agile and evolutionary versus ergodic spam (Score:3, Insightful)
To see why this matters, consider two hypothetical spam filters. One blocks 99% of the test set spam but lets a
Re: (Score:2)
No it isn't. Hence the name Live Spam Challenge.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
In the past, filters have been tested on spam data collected over literally a year or more, which captures the natural variation of the spam stream. Note that in these tests, filters aren't given the full dataset immediately, they have to learn the new spam patterns as the test progresses. That's what you're talking about, and it's been done
Re: (Score:2)
This contest is testing filters on a live short window of time. What you want has already been done many times in the past (look up the work done by NIST [nist.gov] for example).
I'm sorry but you have utterly misunderstood what I was saying or you don't understand the reference you linked to. The reference you link to is an on-line tracking filter for spam. The spam itself can vary or not, but it is not co-evolving in response to the filter itself which is what real spam does.
In the past, filters have been tested on spam data collected over literally a year or more, which captures the natural variation of the spam stream.
Now I'm certain you don't understand the difference between spam varying and spam co-evolving. In simple terms the first is game theory when your opponent does not c
Re: (Score:2)
There is no such thing as realtime coevolving spam in response to the filter. A filter doesn't give feedback to a spammer. There is no direct information path from the decision taken by a filter and the subsequent decisions taken by spammers on future spam campaigns. To believe there is is like believing in the tooth fairy.
Re: (Score:2)
Not flaming, just an observation based on my own experiences.
Re: (Score:2)
I cannot possibly come up with any viable justification for this. I can think of plenty of excuses and all of them rely on idiotic fallacies.
Exploiting This (Score:2)
I use a routine that can quickly determine the origin country of an IP address and just insert that origin country into the headers of the message in an X- header. Then, it's just one more thing for the Bayesian classifier to decide what to do with. It realizes that I don't get much ham from Latvia, so when it sees X-Origin-Country: Latvia, that spa
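A minimal sketch of the header-tagging idea described above, in Python. The poster's actual routine is not shown, so everything here is an assumption for illustration: the lookup table `COUNTRY_BY_PREFIX` stands in for a real GeoIP database, `lookup_country` and `tag_origin_country` are hypothetical names, and the sender IP is taken from the topmost `Received` header.

```python
import re
from email import message_from_string

# Hypothetical stand-in for a real GeoIP database (documentation-only IP ranges).
COUNTRY_BY_PREFIX = {"192.0.2.": "Latvia", "198.51.100.": "Canada"}

# Matches an IPv4 address in square brackets, as commonly written in Received headers.
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def lookup_country(ip):
    """Return the country for an IP, or "Unknown" if no prefix matches."""
    for prefix, country in COUNTRY_BY_PREFIX.items():
        if ip.startswith(prefix):
            return country
    return "Unknown"


def tag_origin_country(raw_message):
    """Add an X-Origin-Country header so a downstream classifier can use it as a token."""
    msg = message_from_string(raw_message)
    match = IP_RE.search(msg.get("Received", ""))
    country = lookup_country(match.group(1)) if match else "Unknown"
    del msg["X-Origin-Country"]  # drop any forged copy; no-op if the header is absent
    msg["X-Origin-Country"] = country
    return msg.as_string()
```

The tagged message is then handed to the Bayesian classifier unchanged, which simply sees the new header as one more token to weigh.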
My money (Score:2)
Re: (Score:3, Funny)
Re: (Score:3, Funny)
Group spam detection (Score:5, Informative)
Gmail, like SpamCop, has a group spam filter system. It looks at mail sent to a large number of recipients. The defining characteristic of spam is that it's sent to a large number of recipients, after all. If you're in a position to watch the incoming mail of a few million mailboxes, detecting spam is easy.
Re: (Score:2)
Re: (Score:2)
Yeah- I'm waiting to see algorithmically generated spam where no two messages are alike.
We've had that for years. The latest variant is in those Viagra spams with a faint pattern of background noise in the images, different for each spam.
Re: (Score:2)
Re:Group spam detection (Score:5, Interesting)
I have received a spam to my gmail account exactly once. And when I did, shocked, I clicked the "mark as spam" button. The point is that this spam was probably sent to millions of Gmail users, and the algorithm wasn't sure how to categorize it. But because I clicked "spam" (and probably a few other people did, too), it was marked as spam for everyone. So most users never saw it in their inbox. Thus only a dozen out of the million recipients were ever bothered by the spam. Conversely, an email list would receive no (or very few) "mark as spam" clicks, and would be allowed to pass. So basically the Gmail userbase acts as the workforce to continually train the spam filter, and moreover to detect new spam within minutes of it being sent.
It's hard to beat a system like that. But the point is that it relies on the large number of users who are all (effectively) sharing their spam training sets with each other in realtime.
This is not to say that the baseline algorithm that Gmail implements isn't quite effective, but the point is that Gmail can use the users to resolve those tricky false-positive and false-negative situations.
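The shared-report mechanism guessed at above can be sketched in a few lines of Python. This is not Gmail's actual algorithm, which is not public; the class name `SharedSpamReports`, the exact-match fingerprint, and the report threshold are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


class SharedSpamReports:
    """Toy model: a message becomes spam for everyone once enough distinct users report it."""

    def __init__(self, threshold=3):
        self.threshold = threshold          # hypothetical number of distinct reporters required
        self.reporters = defaultdict(set)   # fingerprint -> set of users who reported it

    @staticmethod
    def fingerprint(body):
        # Normalise case and whitespace, then hash; real systems would need fuzzier matching.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, user, body):
        self.reporters[self.fingerprint(body)].add(user)

    def is_spam(self, body):
        return len(self.reporters.get(self.fingerprint(body), ())) >= self.threshold
```

An exact-hash fingerprint like this would be defeated by per-message randomisation, so it only illustrates the pooling of user reports, not a robust matcher.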
Re: (Score:1)
I wonder how they deal with pseudo-spam (Score:2)
These aren't really spam, they are companies that I did business with once and can't be bothered to find my username and password to change my email subscription settings. But gmail seems to happily block everything else from that sender without my interaction.
Surely other users do want these particular emails so there must be some kind of per user dynamic as well.
Re: (Score:2)
This probably plays a role, but it will not be the only thing GMail relies on (and probably not even the most important factor), and it will likely require more than
Re: (Score:2)
I wish my Gmail account was like that. Maybe you're new to Gmail. I get several spams in my inbox per week. Mostly these are spam messages in Russian and Chinese but I still get a lot of spam in English as well. I always use the button to mark them as spam, but Gmail doesn't seem to get the message that I don't want anything written in Russian. It's also disappointing that I can't create a filter to mark messages as spam. The best I can do is cat
Re: (Score:3, Informative)
Every website I have gets its own e-mail account, eg. slashdot@myhost.com.
One day I started getting spam to site@myhost.com. So I set up Dreamhost to bounce everything sent to that e-mail address.
Then I started getting flooded with:
otehoenut-site@myhost.com
cgjwbmkh-site@myhost.com
Google has, thankfully, let me delete *site@myhost.com, but for a
Gmail's filtering is not that great (Score:2)
Re: (Score:2)
I do see perhaps three spams a day that actually make it into the inbox, and about 300 or so that are shunted to the spam folder.
There may be false positives in there, but with 300 per day I'm not going to find out. I've never noticed one in there, or had a friend tel
Re: (Score:2)
Re: (Score:2)
You'll be much better off with a personal filter, that learns what you like, not what the majority of Gmail users like.
Re: (Score:2)
is on whatever Gmail uses. I've not yet seen a spam message in my inbox, nor have I missed any mail, even from auto-mailing scripts at websites I'm building...
I will agree that it's great for spam; but when it comes to 419 emails, it sucks. Badly. I'm not sure how I got on the 419ers lists, but I get at least 10-12 of them a day, none of which are caught by gmail filters. On the other hand, the 50-60 regular spam emails are correctly filtered. If only I could perform regex filtering in gmail, I could catch the 419 emails myself very easily, as they all have very common attributes.
Re: (Score:2)
Re: (Score:2)
If I could tell it to junk everything except text in certain languages it would work even better. It seems to miss a lot of Korean and Russian spam.
Sweeps (Score:3, Funny)
I think I've seen people catching spam on tv, just not the kind you're talkin' 'bout. http://www.spam.com/ [spam.com]
Re: (Score:1)
My money (Score:2)
Re: (Score:2)
Because you'd really want thousands of random people reading your emails looking for spam?
Damn. (Score:1)
Curious:When urologists email each other... (Score:5, Interesting)
Re:Curious:When urologists email each other... (Score:4, Informative)
"Ted, I just read the news about Viagra in the New England Journal of Medicine. Very interesting results, though the error bars are a bit large to draw any major conclusions just yet. What do you think?"
Whereas a doctor rarely writes email like:
"NoW ava ilable is generic V1AGRA at low price! Generic, quality, all low price now!"
The point is that modern spam filters don't just look for "bad words" but consider relative word frequencies, the sender and receiver fields, word correlations, formatting elements, URLs, etc. Spam filters in your email client will be trained against email you typically send/receive, and so can be even more precise. Spammers of course try to make their emails include words so that they end up looking like real email, but if the filter is good enough, then the only way to get past it is to send an email that now lacks those critical spam elements (like the link you're supposed to click to buy the generic drug or whatever)...
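As a rough illustration of scoring by relative word frequencies, here is a tiny naive Bayes sketch in Python. It is an assumption-laden toy, not any particular filter's implementation: the class name `TinyBayes`, the whitespace tokenizer, and the add-one smoothing are choices made for brevity, and real filters also weigh headers, URLs and formatting as described above.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class TinyBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Positive means the text looks more like the spam seen so far, negative more like ham."""
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        totals = {label: sum(c.values()) for label, c in self.counts.items()}
        # Prior log-odds from document counts, with add-one smoothing.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for word in tokenize(text):
            p_spam = (self.counts["spam"][word] + 1) / (totals["spam"] + vocab_size)
            p_ham = (self.counts["ham"][word] + 1) / (totals["ham"] + vocab_size)
            score += math.log(p_spam / p_ham)
        return score
```

Trained on the two example messages above, the word "viagra" alone does not decide the outcome; the surrounding vocabulary does.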
That depends upon the method used. (Score:2)
Other approaches use multiple tests such as checking whether the sending server's IP address is on a blacklist or whether any of the links in the message (should it contain links) were on blacklists.
Re: (Score:2)
Hey, I just pre sc ribed V.1.4.G.R.A to a patient today.
The monk said to the fox, why don't the squirrels to be or not to be, that is my answer. The fog was as thick as umbrellas in the wind thought the old maid.
Re: (Score:1)
Re: (Score:2)
Happened with a lame spam filter my company used to have. This was a year or so ago.
I emailed my wife "can you stop by and pick up the Strattera and Effexor from the pharmacy?" once. Her reply, containing my message, got plonked by the filters.
Re: (Score:2)
Spamassassin scored -1.3 (Score:2)
Subject: Interesting phenomenon related to Viagra use
Hi, Dr. Smith-
I just wanted to write you to let you know that I really enjoyed the article you wrote in the New England Journal of Medicine about the side effects of Cialis, Viagra, and Levitra. It turns out a patient of mine experienced debilitating nausea while on Levitra, so I prescribed Viagra in its place, as you recommend.
In addition, I thought you might be interested to know
I wish the contest was.... (Score:2, Interesting)
The First Annual Greased Spammer Contest! (Score:5, Funny)
Re: (Score:2)
You forgot to mention that it's being held on
SUNDAY! SUNDAY! SUNDAY!
Be There!
Re: (Score:2)
Will SMTP server settings count as well? (Score:2)
It'd be interesting to see a solid setup that handles a combination of the two, then publish the results (yes, spammers can read those results/settings to try to foil the setup, but many settings would make it patently unprofitable for them to do so).
I can't tell from the write up. (Score:2)
A big part of the system I use at work is based upon IP addresses and rDNS. I block a HUGE amount of spam just by rejecting all connections from Comcast that aren't from their SMTP servers.
I know, some people want to run SMTP servers at home. But so far none of them have attempted to send email to my system.
So it really depends upon how they configure the test spam servers. Personally, I don't see this as being
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
The prize list :) (Score:5, Funny)
2nd prize: Lifetime supply of Hormel meat products
3rd prize: Commemorative tin of SPAM meat product
Last place: Inheritance from Nigerian Prince
Re: (Score:2)
Which is about 4 1/2 days if that's all you eat.
that's easy. Yahoo mail! (Score:3, Funny)
Just open a yahoo mail account, and start posting with the e-mail address all over the internet.
You'll catch more spam than anyone else!
Oh, you want me to filter out spam, not just get spam, nevermind.
Still, it might be the fastest way to build a database of spam.
Re: (Score:2)
Re: (Score:2)
The poor bastard who actually does have CrazyTalk@verizon.net is really, really pissed about now.
Professional spammers in attendance? (Score:5, Interesting)
SpamAssassin? (Score:4, Interesting)
We use both SpamAssassin and OpenBSD's spamd, to great effect. spamd does most of the work, though. Daniel Hartmeier [benzedrine.cx] (site down ATM, unfortunately) has an example of how to tie SA scores back into spamd for blacklisting, which is just awesome. I'd implement it here, but our current setup is effective enough as to not make it worth my time.
Greylisting no longer works (Score:1)
The spammers just kept trying until they got through.
Spamming has evolved past greylisting and it is now worthless.
Bayesian keyword filtering is decent, but is constantly attacked by images or hiding the spa
Re: (Score:2)
It's certainly not perfect, but it reduces the load on my spam-filter. A *lot*. More than 90+% of smtp connections don't make it through spamd here. I hardly call that worthless.
Last year it was more like 99+%. Here's some stats from someone else last year: http://undeadly.org/cgi?action=article&sid=200602
Re: (Score:3, Interesting)
I'm certain that there are differences in implementation between different greylisters. I've never tried Postfix's, for example, because OpenBSD's works fine for me. A small point wrt to OpenBSD's spamd: you actually need to try thrice. The first time you're rejected. The second time you're marked as OK, but still rejected. The third time you get through. Maybe it's the third time, or some of the time limits, or some other thing
Re: (Score:2)
Mine is time-based, not rejection count based. In other words, if your IP isn't whitelisted, I do some tests on your IP to see how long you have to wait to get through.
First, I try to do a reverse DNS lookup on your IP. No result means I don't like your IP.
Then, I look to see if I can find your IP address anywhere in the reverse-DNS result (indicating a dynamic IP). If I find it forwards or backwards, I do
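The two checks described above (missing reverse DNS, and the IP showing up inside its own reverse-DNS name) can be sketched as follows. What the poster does with the result is cut off, so this only classifies the sender; the function names, the separators tried, and the return labels are assumptions for illustration.

```python
import socket


def ip_embedded_in_hostname(ip, hostname):
    """True if the IPv4 octets appear in the hostname in forward or reverse order."""
    octets = ip.split(".")
    host = hostname.lower()
    for order in (octets, octets[::-1]):
        for sep in (".", "-"):
            if sep.join(order) in host:
                return True
    return False


def classify_sender(ip, resolve=None):
    """Return "no-rdns", "dynamic-looking" or "ok" for a connecting IP.

    `resolve` is injectable so the logic can be exercised without network access;
    by default it performs a real reverse lookup.
    """
    resolve = resolve or (lambda addr: socket.gethostbyaddr(addr)[0])
    try:
        hostname = resolve(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return "no-rdns"
    if ip_embedded_in_hostname(ip, hostname):
        return "dynamic-looking"
    return "ok"
```

A time-based greylister could then map each label to a different waiting period before accepting mail from that IP.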
Flawed (Score:3, Informative)
This ought to be ignored as the contest is flawed.
"Ha ha, silly admin. My money's on greylisting."
They're sending a stream of spam from where? Sounds like a real mail server...
From TFA: "Live email stream, delivered by standard protocols (SMTP, IMAP, POP)"
[One wonders how else they would deliver e-mail if it was not from standard protocols. I also wonder how they plan on delivering e-mail using POP... The mind boggles...]
In any case if I read this
Re:Flawed (Score:4, Interesting)
I'd be very interested to hear of a design that would allow greylisting to be tested. The best I can come up with is to fail the message after transmission, then to try to simulate the behavior of the sender in response to this failure. But that would be catering to one very specific method of perturbing the protocol. And it would be necessary to do a fair amount of work to spoof the IP address presented to the participant filters.
For this reason, we chose to exclude all SMTP interactions, and simulate a second-in-the-chain filter appliance application. The reasons are practical, not policy.
Re: (Score:2)
Thanks for your response. I just sent your counterpart at IBM a lengthy probing e-mail about this which I can summarize as:
1. Real stream or fake stream?
2. Points for cost effectiveness?
3. Points for scalable/redundant architecture?
I applaud what you are doing and I wish you the best success (contests like this are good at stimulating inventiveness). I've been racking my brain trying to figure out how you could do this in a way that wouldn't be discriminatory. The best I could come up with woul
Re: (Score:2)
Yes, the spammer will always win, since his CPU cycles and bandwidth are free. But those costs don't matter at all.
Bayesian and other resource-inten
Re: (Score:2)
Fair comparative testing of spam control technologies is extremely difficult -- by some measures, it's impossible. Because some promising filter techniques rely on examining the real-time behaviour of the sending machine, it proves tricky to provide the exact same stream of email to all the filters at the same time.
For example, some filters attempt to fingerprint the sending machine's operating system -- the idea being that, say, a Windows 98 PC has no business submitting email direct-to-MX.
M
Re: (Score:2)
Here's why greylisting will continue to work in the real world:
1. If a spammer adopts RFC-compliant mailers, greylisting will prevent them from pumping out huge numbers of mails. They will have to burn CPU cycles on their end in order to push mail through. This increases the cost of sending mail, and reduces their margins since they will be hitting few
Re: (Score:2)
First, the contest will establish a baseline against which greylisting may be compared. It is much more difficult to measure false positive and false negative rates for intrusive techniques like greylisting and challenge-response. Too difficult to be done in an open competition. But the open competition can show what other techniques can do, and then there will be some onus on the greylisters and challenge-responders to show that their techniques really are a va
Re: (Score:2)
Re: (Score:2)
I'm not that impressed with SpamAssassin. Too much overhead in trying to keep all the static filtering rules up to date. Eventually, it gets dumb
The best spam filters I've seen in terms of effectiveness are bogofilter and dspam. Both of these are extensions of the Bayes statistical filtering.
bogofilter is awesome but it can't manage tokens from a database. Hence you can't have multiple machines very easily and users cannot share a database. Virtual hosting makes it harder and eventually you kind of
Why Not Use Both? (Score:2)
I use both, and I have to say that greylisting catches a metric boatload of spam. On the other hand, spammers have wised up and many are now retrying.
Sure does take a lot of load off of spamassassin, though.
West Virginia (Score:1)
Re: (Score:3, Funny)
Don't lie. You and your buddies got drunk and would go spam tipping. There was no hunting involved.
My entry: Human computers (Score:1)
To pay for it I'll be spamming the world with my stock pump-and-dump scheme.
This just in: DAVI (OTC) NOW $0.02 TARGET $0.25!
New packaging? (Score:3, Funny)
The cans were so much easier to catch, too.
Spam Rage Rampage (Score:2)
A couple of years ago, I wrote a prototype for a video game called "Spam Rage Rampage" -- a first-person shooter where you roamed a Tron-like world, killing spam zombies and rescuing real people (== legitimate mail) while you searched for clues to the location of the nefarious spam kingpin, Ospama Bin Sendin. Each zombie represented a different class of spam... prostitute zombies for porn, business-suited zombies for stocks, pharmacist zombies for pill ads, etc.
Upon seeing a demo, one of my friends commen
Greylisting? (Score:3, Insightful)
Re: (Score:1)
after a few minutes their email servers should reach critical mass.
Re: (Score:2)
Just pretend you're an admi
Boring. (Score:3, Funny)
To quote Bill Mattocks...
"My sense of personal integrity is none of your concern."
-thus spake Walt "Pickle Jar" Rines
"I'm going to pound your balls flat with a wooden mallet."
-thus respondeth Bill Mattocks
Re: (Score:2)
Kobayashi Maru (Score:3, Funny)
Find a creative and unique solution (cheat):
CEAS Call for Participation (Score:2)
Or the overview talk [youtube.com] that Rich Segal gave at the MIT Spam Conference.
The guidelines are scheduled to be finalized May 1.
Re: (Score:2)
Good job guys. The results will be interesting to read.
On ESPN... (Score:2)
Is there an ESPN 6 or 7 cable channel? I'm thinking this is below Cheerleading and Dog Agility, but perhaps above Lumberjack competitions.
Isn't this already on TV? (Score:3, Funny)
"This ought to be a sweeps week television spectacular."
I think that it already is, but it's only on in Japan and uses real SPAM.
Visions of tennis ball machine gone.... (Score:1)
I got a better idea (Score:2)
Evidence of wasted spammers can be in the form of complete heads, or ears.
From the not-from-a-dept dept. (Score:2)
Anyone want to try their hand at making up their own?
How to test against spam that isn't REAL spam? (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Just dump unsolicited email with URLs in them. (Score:2)
Now get people and free email services like Hotmail and Gmail to turn off their URL signatures in the bottom of their outgoing emails and you will stamp the spam email menace out in one bold stroke.
That moves the spam back to USENET, which is already spammed-out....
If people you don't know want to start a meaningful email conversation with you, they WON'T try to get you to visit the URL of some 'paysite' contained in their email.
Then something has to be done about spammers bouncing their
Error rate (false positives) isn't the whole story (Score:3, Insightful)
From TFCFP (call for participation):
Filters will be evaluated based on a weighted combination of the percentage of spam blocked and its false positive percentage.
From a theoretical standpoint, a low false positive average over an entire set (like <1%) might seem okay, but that doesn't take into account what's important to users.
Take, for example, a message from a long-lost friend, whose current address isn't yet in your whitelist, and who would have no other way of contacting you should the message get spamboxed. Here's an example of a message that's important to a user but gets lost among the everyday messages when simply talking about the percentage of false positives.
There are lots of other examples, too -- if you run your own domain, your messages are likely to be spamboxed, etc. Furthermore, the lower the false-positive rate, the less likely a user is to actually *check* their spambox, thus making a single false-positive even worse.
Microsoft's own Hotmail, of course, is notorious for spamboxing messages like that. And yet the conference is being held at Microsoft, and Microsoft's own spam researchers proudly touted their system in the February 2007 Communications of the ACM [acm.org].
Something tells me the leaders in the field are sort of missing the point. Simply bringing down the aggregate false positive rate is *not* enough. The measure needs to take into account how often the user actually misses information that's important to them.
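For reference, the "weighted combination" quoted from the call for participation could look something like the sketch below. The actual weights and formula are not given in the quote, so `fp_weight` and the linear form are purely hypothetical; the point is only that such a score is built from two aggregate percentages and carries no notion of how important any individual lost message was.

```python
def rates(results):
    """results: list of (is_spam, flagged_as_spam) pairs.

    Returns (percentage of spam blocked, percentage of ham wrongly flagged).
    """
    spam = [flagged for is_spam, flagged in results if is_spam]
    ham = [flagged for is_spam, flagged in results if not is_spam]
    blocked_pct = 100.0 * sum(spam) / len(spam) if spam else 0.0
    false_positive_pct = 100.0 * sum(ham) / len(ham) if ham else 0.0
    return blocked_pct, false_positive_pct


def contest_score(blocked_pct, false_positive_pct, fp_weight=10.0):
    # Hypothetical weighting: each point of false positives costs fp_weight points of blocked spam.
    return blocked_pct - fp_weight * false_positive_pct
```

Under this kind of scoring, a filter that spamboxes one newsletter and a filter that spamboxes one message from a long-lost friend receive exactly the same penalty.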
Re:Error rate (false positives) isn't the whole st (Score:2)
A 1% false positive rate is not OK. The good systems will misclassify at most a couple of good emails per thousand, and the vast majority of those will lie in the grey area between ham and spam. A few will be internet transactions -- sign-up messages, receipts, and the like -- and a vanishingly small number will be personal communications.