Security Predictions of 2004
scubacuda writes "Computerworld's security predictions for 2004: R.a..n,d,o.,m p,u,,n,c.t,,u_a.t.1..0.n evading spam filters, Internet access filtering, better desktop management, enterprise personal firewall deployment, tools that securely scrub metadata, corporate policies against USB flash drives, Wi-Fi break-ins, Bluetooth abuses, cell phone hacking, centralized control over IM, publicized break-ins at public utilities, government defense against cybercriminals, organized cybercrime, and a shorter time to exploitation."
Random punctuation (Score:4, Informative)
bayesian filters aren't fooled so easily (Score:5, Informative)
Re:What I encountered yesterday (Score:4, Informative)
Re:Nearly impossible? (Score:2, Informative)
My solution to punctuation and l33t-speak spam is simply to run each incoming message through a spell checker.
Whilst lots of people make typos and use words that aren't in my dictionary, when the ratio of misspelled to correctly spelled words is high it becomes obvious that the message is likely spam.
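A rough Python sketch of the spell-check ratio idea. It assumes a plain in-memory word set stands in for a real spell checker's dictionary, and the 0.3 threshold is a made-up starting point, not a tuned value. Punctuation-split spam falls apart into lone letters, which mostly miss the dictionary and push the ratio up.

```python
import re
from typing import Set

# Tokenizer assumption: a "word" is a run of letters, optionally with
# internal apostrophes. Digits and punctuation are treated as separators.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def misspelling_ratio(text: str, dictionary: Set[str]) -> float:
    """Fraction of word tokens not found in `dictionary` (case-insensitive)."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w.lower() not in dictionary)
    return unknown / len(words)


def looks_like_spam(text: str, dictionary: Set[str], threshold: float = 0.3) -> bool:
    # The default threshold is illustrative only; tune it on your own mail.
    return misspelling_ratio(text, dictionary) >= threshold
```

With a dictionary containing "random", "punctuation", "evading", "spam", "filters" and "a", the obfuscated phrase from the story summary is flagged, while the plain-spelled version is not.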
Anti-Obfuscation script (Score:4, Informative)
- cnb
Re:Spam Spam Defeatable Spam (Score:5, Informative)
- Message text disguised using base64 encoding
- Uses a numeric IP address in URL
- Uses a dotted-decimal IP address in URL
- HTML has over 9 kilopixels of images
- HTML: images with 0-200 bytes of words
- HTML has a low ratio of text to image area
- The score from a Bayesian filter, which would probably climb quickly for messages stuffed with punctuation while leaving legitimate mail alone, since people don't normally use that much punctuation.
Spam operators might get more creative, but I still think spam removal tools are several steps ahead.
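SpamAssassin's own rule code is not reproduced here; this is a minimal Python sketch of the kind of check the two IP-address descriptions suggest. Treating a "numeric" host as a bare 32-bit integer (so http://3232235777/ points at 192.168.1.1) is an assumption about what distinguishes it from the dotted-decimal form.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse


def ip_host_kind(url: str) -> Optional[str]:
    """Classify the host part of a URL.

    Returns "dotted" for a dotted-quad IPv4 host (http://192.168.1.1/),
    "numeric" for a bare 32-bit integer host (http://3232235777/),
    or None for anything else, e.g. an ordinary domain name.
    """
    host = urlparse(url).hostname
    if not host:
        return None
    try:
        # Accepts only the dotted-quad string form.
        ipaddress.IPv4Address(host)
        return "dotted"
    except ValueError:
        pass
    if host.isdigit():
        try:
            # An integer in 0..2**32-1 is a valid IPv4 address.
            ipaddress.IPv4Address(int(host))
            return "numeric"
        except ValueError:
            pass
    return None
```

A filter would run this over every link target in the message and add points for each hit, alongside the other rules in the list.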
Re:Spam Spam Defeatable Spam (Score:5, Informative)
I get 30-120 spam messages a day (it's an old account).
Checking my SpamAssassin setup, I see that its Bayesian filter has learned from 1,868,996 pieces of spam and 386 pieces of ham (the good stuff I want to keep).
Maybe one spam a month gets through to my normal inbox, which I happily feed to sa-learn (SpamAssassin's Bayesian learning tool).
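sa-learn's --spam and --ham options tell it which way to learn from the messages it is given. A small Python wrapper, assuming SpamAssassin is installed and on PATH; the file name missed-spam.eml used below is hypothetical.

```python
import subprocess
from typing import List


def sa_learn_command(path: str, is_spam: bool) -> List[str]:
    """Build the sa-learn argument list for a message file or directory."""
    return ["sa-learn", "--spam" if is_spam else "--ham", path]


def feed_sa_learn(path: str, is_spam: bool = True) -> str:
    """Run sa-learn on `path` and return its report (raises if it fails)."""
    result = subprocess.run(
        sa_learn_command(path, is_spam),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, feed_sa_learn("missed-spam.eml") runs sa-learn --spam missed-spam.eml. Keeping the command builder separate makes it easy to check what would be run without touching the Bayes database.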
I don't need any wacky products installed in my email client (which I change often).
I access my email via IMAP over SSL.
I mostly use Mozilla Mail, but I've also used Mutt, Outlook, Pine, Outlook Express, KMail, and plenty of others I've since forgotten, all with SpamAssassin running happily on the mail server, churning through all incoming mail.
Our mail server handles 4,000-10,000 messages a day across all our accounts, and SpamAssassin barely registers as a blip on our CPU usage radar.
It's really sweet.
Oh yeah, I've had only one false positive, and it was a wise-ass friend who decided to send a conversational email disguised as spam from a new address.
Re:On random punctuation (Score:3, Informative)
It handles this case correctly. There is actually some extra code I added to handle cases like this (specifically the word "scrape").
Basically the regexp is modified so it only matches at either the beginning or the end of a word, using word boundary matching. Not completely ideal, but good enough.
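Rich's actual regexp isn't shown, so here is a hedged Python sketch of the word-boundary idea. The word list is hypothetical; "crap" is used only because it happens to sit in the middle of "scrape", which is the kind of false positive described.

```python
import re
from typing import Iterable


def edge_pattern(words: Iterable[str]) -> re.Pattern:
    """Match a listed word only where it touches a word boundary at its
    start or its end, so a hit buried inside a longer word is ignored."""
    alternatives = []
    for w in words:
        e = re.escape(w)
        # \bword: word starts a token; word\b: word ends a token.
        alternatives.append(rf"\b{e}|{e}\b")
    return re.compile("|".join(alternatives), re.IGNORECASE)


# Hypothetical blocklist for illustration only.
PATTERN = edge_pattern(["crap"])
```

As the comment says, this is not ideal: a bad word glued between two innocent fragments on both sides slips through, but middle-of-word matches like "scrape" stop firing.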
Rich.
Re:Nearly impossible? (Score:3, Informative)
Regards,
Steve
Re:Nearly impossible? (Score:1, Informative)
Personally, I've been using SpamBayes (spambayes.sourceforge.net) and it's been working beautifully.
I used SpamNet (cloudmark.com) when it was free and was blown away by its accuracy. It's a P2P spam-tracking network (so a community of humans decides what's spam, not filtering rules). Of course, now they charge you to be part of the community, but it's worth a look...
Re:Nearly impossible? (Score:3, Informative)
gives you a list of the misspelled words. You could fiddle with the capitalization rules for things like DNS, DHCP, TCP/IP, etc. to lower your false positives.
We could wrap that into spamd and generate a weighted score. The problem, of course, would be speed, since ispell would have to start up for every email checked (is there a daemon mode for ispell or aspell?).
Anyway, I ran it on a bunch of the aforementioned spam and it gives convincing results.
Of course, Slashdotters would probably generate a lot of false positives, so maybe we shouldn't push this until we improve our spelling.
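A hedged Python sketch of the weighted-score idea using aspell's list mode, which reads text on stdin and prints the words it considers misspelled. The max_points weight is hypothetical, not a SpamAssassin default, and the checker is injectable so the scoring can be tested without aspell installed.

```python
import re
import subprocess
from typing import Callable, List

WORD_RE = re.compile(r"[A-Za-z]+")


def aspell_list(text: str) -> List[str]:
    """Misspelled words as reported by `aspell list` (needs aspell on PATH).

    This starts a fresh aspell process per call, i.e. the per-message
    start-up cost raised in the comment.
    """
    result = subprocess.run(
        ["aspell", "list"], input=text, capture_output=True, text=True, check=True
    )
    return result.stdout.split()


def spelling_points(
    text: str,
    checker: Callable[[str], List[str]] = aspell_list,
    max_points: float = 3.0,
) -> float:
    """Scale the misspelled-word ratio into points a spamd-style filter could add."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    ratio = min(len(checker(text)) / len(words), 1.0)
    return max_points * ratio
```

On the daemon question: both ispell and aspell have an -a pipe mode that reads lines on stdin and answers on stdout, so one long-running process could in principle serve many messages. That is not shown here, and its speed hasn't been measured.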
Re:What I encountered yesterday (Score:2, Informative)
So Far, So Good [paulgraham.com]
The more they try to fool the filter, the better the filter becomes at recognizing this sort of "random" word placement. Interesting read.
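For context, a Python sketch of the combination rule from Paul Graham's earlier essay "A Plan for Spam" (not necessarily what "So Far, So Good" describes): clamp each token probability to [0.01, 0.99], keep the 15 tokens farthest from 0.5, and combine them. Because only the most extreme tokens are kept, near-neutral words, such as many randomly inserted dictionary words, tend to be crowded out.

```python
from math import prod
from typing import Iterable


def combine(probs: Iterable[float], keep: int = 15) -> float:
    """Combine per-token spam probabilities into one message probability:
    P = prod(p) / (prod(p) + prod(1 - p)) over the `keep` most extreme tokens."""
    clamped = [min(max(p, 0.01), 0.99) for p in probs]
    interesting = sorted(clamped, key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    if not interesting:
        return 0.5  # no evidence either way
    spamminess = prod(interesting)
    hamminess = prod(1 - p for p in interesting)
    return spamminess / (spamminess + hamminess)
```

Clamping keeps both products strictly positive, so the division is always defined.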
dictionary words in bare mime part (Score:3, Informative)
conduit horse house press lingo technical gelatin overlord brown uniform
In the multimedia portion you'll see spam like never before.
How do you stop these? You can't train a Bayes database on dictionary words, as that would eventually defang the whole method. Your only option, I suppose, would be to compare the contents of the multimedia portion with the 7-bit ASCII portion and see whether they match. The problem here is making the comparison fuzzy enough to allow for multibyte characters and the like.
The worst thing about this type of spam is that at best your Bayes database is circumvented, and at worst it is trained to see good words as bad or bad words as good, and is rendered useless.
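One way to try the comparison suggested above, sketched in Python under assumptions: the "fuzzy" match is just Jaccard overlap of lowercase word sets, HTML is reduced to its text with the standard-library parser, and any cut-off for calling a message suspicious is left to you. Python's \w is Unicode-aware, which gives a little tolerance for multibyte text but does nothing about lookalike characters.

```python
import email
import re
from email.message import Message
from html.parser import HTMLParser
from typing import List, Optional, Set


class _TextExtractor(HTMLParser):
    """Collect the character data (visible text) from an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def _words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def _part_text(part: Message) -> str:
    payload = part.get_payload(decode=True) or b""
    return payload.decode(part.get_content_charset() or "utf-8", errors="replace")


def alternative_overlap(raw: str) -> float:
    """Jaccard overlap between the words of the text/plain and text/html parts.

    Returns 1.0 when either part is missing, since there is nothing to compare.
    """
    msg = email.message_from_string(raw)
    plain: Optional[str] = None
    html: Optional[str] = None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain" and plain is None:
            plain = _part_text(part)
        elif ctype == "text/html" and html is None:
            extractor = _TextExtractor()
            extractor.feed(_part_text(part))
            extractor.close()  # flush any buffered trailing text
            html = " ".join(extractor.chunks)
    if plain is None or html is None:
        return 1.0
    a, b = _words(plain), _words(html)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

A message whose plain part is a list of random dictionary words and whose HTML part is the real pitch scores near 0.0, while a legitimate multipart/alternative message scores near 1.0.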
With SpamAssassin it is easy to control when your Bayes backend is auto-trained and when it isn't. I have my required_hits option set to 4.0, so I would use the following settings:
use_bayes 1
auto_learn 1
auto_learn_threshold_spam 7
auto_learn_threshold_nonspam -5.5
With this I am reasonably confident that I am not training my Bayes database to treat good words as bad unless the message really has been found to be spam empirically, and conversely that I am not teaching it bad words as good unless I am sure it's a good e-mail, typically by means of AWL or whitelist_from.
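A Python model of just the two thresholds in the configuration above, to make the "learn only at the extremes" behaviour concrete. SpamAssassin's actual auto-learn code applies additional conditions that this sketch ignores, and whether each boundary is inclusive is an assumption here.

```python
from typing import Optional

# Values taken from the settings above. required_hits (4.0) decides
# whether a message is tagged as spam, not whether it is learned from.
AUTO_LEARN_THRESHOLD_SPAM = 7.0
AUTO_LEARN_THRESHOLD_NONSPAM = -5.5


def auto_learn_decision(score: float) -> Optional[str]:
    """Return "spam", "ham", or None (don't learn) for a message score."""
    if score >= AUTO_LEARN_THRESHOLD_SPAM:
        return "spam"
    if score <= AUTO_LEARN_THRESHOLD_NONSPAM:
        return "ham"
    return None
```

The gap is the point of the design: a message scoring between 4.0 and 7.0 is still tagged as spam but never trains the database, so borderline mail, including dictionary-word spam that only just crosses the line, can't poison it.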
If anybody has solved this, I would be very grateful to hear what you did and how you did it.