Zebras Get Less Spam Than Aardvarks
MojoKid writes "A recent study (PDF) by Richard Clayton at Cambridge University determined that the first letter of someone's email address directly affects how much spam they receive. As shown in the graph at either link above, email addresses with numbers as their first characters receive even fewer spam emails. The corpus used in the study was 8 weeks' worth of email from the UK ISP Demon Internet, just over half a billion messages, of which 56% was deemed to be spam."
You know what this means (Score:5, Insightful)
Re: (Score:2, Funny)
Hi. I note you don't publish your gmail address on /. Try that and then tell me about the spam you haven't seen.
Re: (Score:2)
No, they are combing webspace in general, which includes
Re:You know what this means (Score:5, Funny)
Why would spammers look for email addresses in their own working directory (./)? I guess I am just not up-to-date on my spamming techniques.
Re: (Score:1)
Why would spammers look for email addresses in their own working directory (./)?
It is not their own working directory, but spammers are indeed scanning directories for address books.
Re: (Score:2)
Re: (Score:3, Informative)
Gmail isn't perfect at filtering spam. I've received 35,214 spam messages in the last month. I estimate that Gmail failed to filter around 100 of them.
Re:You know what this means (Score:5, Informative)
Re: (Score:2)
I agree they catch virtually all spam, but the reason I stopped using gmail was I found their spam-filter was catching a fair few genuine emails also (and not just the ones from my dad talking about his Viagra usage).
Personally, I think spam filtering is an area where not junking genuine emails is far more crucial than catching a perfect 100% of spam emails.
Re: (Score:2)
What I am interested in knowing is what you replaced gmail with. Who is better at this game?
Sorry to break this to you. (Score:3, Informative)
But in the real world there is no such thing as perfection. It is a philosophical construct.
Re: (Score:2)
So, since God has every perfection, he doesn't exist (except in Descartes' mind)?
Re: (Score:2)
Re: (Score:2)
I had Spam Assassin get its first false positive ever, last week. It was an email from Microsoft.
Re:You know what this means (Score:4, Insightful)
Re: (Score:1)
Gmail isn't perfect at filtering spam. I've received 35,214 spam messages in the last month. I estimate that Gmail failed to filter around 100 of them.
Gmail's false positive rate for spam is so low that I don't even check anymore.
I'd much rather have to bother with a spam or two a day making its way to the inbox, than a legit mail or two a week making its way into the spam folder.
- RG>
Re: (Score:2)
Yeah, so that's a... ~0.00284% failure rate. That's amazingly good IMO.
BTW, does anyone know the Alt+Numpad code in Windows for the "about equals" sign? (The one that is a squiggly =, or two ~ stacked on top of one another like some hot font lovin'.)
Re: (Score:1)
Re: (Score:1)
That would be 0.284%, not 0.00284%. Nonetheless that's still great, but remember to move the decimal separator! :-P
Re: (Score:2)
Hm, I did:
100 / 35214 = 0.00283977blahblahblah
Ah, you're right. I did forget to move the decimal point over. Still, a 0.284% failure rate is really good!
Unsurprisingly, my math failure rate is actually much higher than that.
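For anyone who wants to check the arithmetic in this sub-thread, here is a minimal sketch using only the two figures quoted above (35,214 spam messages, roughly 100 missed). The variable names are made up for illustration.

```python
# Figures quoted in the thread: spam received in a month, and the
# poster's estimate of how many the filter missed.
spam_received = 35214
missed = 100

fraction = missed / spam_received  # a plain ratio, not yet a percentage
percent = fraction * 100           # multiply by 100 to get a percentage

print(f"{fraction:.5f}")   # 0.00284
print(f"{percent:.3f}%")   # 0.284%
```

The 0.00284 figure is the raw fraction; it only becomes a percentage after multiplying by 100, which is where the two decimal places went.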
Re: (Score:2)
That depends on whether you work for Verizon or not.
Re: (Score:3, Interesting)
Unfortunately, gmail is my only mail account where I feel I have to scan the spam "folder" every week or so to look for false positives -- of which there are a couple a month. My other accounts, which receive more mail and more spam (both as a percentage and an absolute number), have given so few false positives that I don't bother looking in the spam folders on those accounts.
So, unfortunately, I end up looking at all the spam on gmail and just
Re: (Score:2)
I find gmail almost perfect at classifying spam as such.
My first thought is that the rest of your post disproves this very first statement. But after careful consideration, I'm realizing that you are saying that "if it's spam, gmail will see it as such" but this says nothing about non-spam. To which, I offer the following as perhaps the perfect SPAM filter. It will categorically mark all spam as such, and will even delete it for you. Perfect detection!
Just put the following into a file called ".forward" in your home directory, on a UNIX mail server:
"> /dev/nu
Re: (Score:2)
Gmail isn't perfect at filtering spam. I've received 35,214 spam messages in the last month. I estimate that Gmail failed to filter around 100 of them.
That's a 0.3% failure rate. My wife was good enough to marry, but she's no 9.997- still good enough for me.
Re: (Score:2)
I have half a dozen email addresses that have never received spam as well.
The trick is to never give the email address to anyone; post with your email address on /. and see what happens.
Re: (Score:2)
I have half a dozen email addresses that have never received spam as well.
I have an infinite number of email addresses that have never received spam!
I used to have two infinities, but one of them started getting a significant finite number of addresses receiving spam, so I had to remove the catch-all for that domain. So much spam that my ISP had disabled my procmail filter for using too many system resources filtering my incoming spam.
Re: (Score:2)
So much spam that my ISP had disabled my procmail filter for using too many system resources filtering my incoming spam.
My service provider shut down my catch-all for the excessive levels of spam,
catch-all + Spam Assassin = Fun
Re: (Score:3, Insightful)
but, like the article says, there are fewer people whose e-mail addresses start with z or numbers. so they'd be getting fewer hits by targeting those starting characters. there's already more spam messages being targeted at "zebras" per legitimate target than there are spam messages being targeted at aardvark addresses.
so the smart thing for spammers to do is to stop wasting time with zebra addresses, since they'd have a higher chance of actually reaching a real mailbox by targeting more popular character r
Re: (Score:3, Interesting)
I think this might be a distant relative to Benford's law (the one that shows that about 30% of all counted numbers will start with the digit "1", not 10% as one might think).
Going through some crack and john-the-ripper logs, I saw that there was a good correlation between the position in the alphabet not only for the passwords, but also for the user names.
Based on pure letter frequency, you'd think that there would be a typical E-R-S-T-N ranking, but this doesn't appear to be the case for the initial lette
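For reference, the "about 30%" figure for Benford's law mentioned above comes from the standard formula P(d) = log10(1 + 1/d) for a leading digit d. A minimal sketch follows; the function name is made up for illustration, and it says nothing about whether initial letters of user names follow a similar pattern.

```python
import math


def benford_probability(digit: int) -> float:
    """Probability that a number's leading digit is `digit` under Benford's law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Expected share of each leading digit, 1 through 9.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```

The nine probabilities sum to 1, and the digit 1 gets roughly 30% of the total rather than the 1-in-9 share a uniform guess would give.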
Re: (Score:1)
> Spammers will now alter their programs to start
> with "z" and numbers, so they can get the
> people who aren't as desensitized by spam.
A or Z, just be glad your name isn't Lorenzo.
Damn (Score:1)
I guess that explains why I seem to get more spam than most.
Re: (Score:2)
Re: (Score:2)
Brute force attacks also start at 'a' - always. I have never seen one that starts at the end of the alphabet.
I do all my brute force attacks with the California recall election alphabet: R, W, Q, O, J, M, V, A, H, B, S, G, Z, X, N, T, C, I, E, K, U, P, D, Y, F, L.
"Now I know my R, W, Qs. Next time won't you sing with muse."
Unexpected (Score:2)
I can see why they'd start at the front of the alphabet, and why those folks would tend to get more spam.
But wouldn't numbers sort even in front of the letter A?
Re: (Score:3, Interesting)
I'd guess that addresses with numbers at the beginning are often invalid, so they don't bother with them. I get spam attempts addressed to message IDs, which are generally something like 238947529345user@example.com.
Re: (Score:1)
I commented on this a few days ago - but I too see many many delivery attempts to email-like message IDs.
I guess that means that somebody has a compromised machine which is being crawled, or perhaps a mailing list archive online has them visible.
Highly irritating, but pretty easy to block.
Re: (Score:3, Insightful)
I think most of the spam targeted at a message ID comes from crawling USENET.
On my server, I see lots of e-mail with a "rcpt to:" that matches the regex "(mpg\.)?[a-f0-9]+\@news\.domain\.com". This is the format that inn uses to create message IDs.
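A minimal sketch of how the regex quoted above could be applied to a recipient address, assuming Python's `re` module. The domain `news.domain.com` is the poster's placeholder, and the function name and sample addresses are made up for illustration.

```python
import re

# The pattern quoted in the comment above, unchanged except for Python's
# raw-string syntax. "news.domain.com" is a placeholder domain.
MESSAGE_ID_RCPT = re.compile(r"(mpg\.)?[a-f0-9]+\@news\.domain\.com")


def looks_like_message_id(rcpt: str) -> bool:
    """Return True if the whole recipient address has the message-ID shape."""
    # fullmatch anchors at both ends; an unanchored search would also hit
    # "alice@news.domain.com" via the trailing "e@news.domain.com".
    return MESSAGE_ID_RCPT.fullmatch(rcpt) is not None


print(looks_like_message_id("mpg.4f3a9c@news.domain.com"))  # True
print(looks_like_message_id("alice@news.domain.com"))       # False
```

Anchoring matters here: the hex character class includes the letters a to f, so ordinary local parts that happen to end in one of those letters would otherwise be rejected as message IDs.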
Re: (Score:2)
I think most of the spam targeted at a message ID comes from crawling USENET.
I've had hundreds of spam attempts at my message IDs from Usenet postings, I get more delivery attempts to message IDs than to my real email address I post with.
I bet this guy gets the least amount of spam (Score:2, Interesting)
We had an ex-city manager, who had a son named Zachary Z. Zoul.
Re:I bet this guy gets the least amount of spam (Score:5, Funny)
Did the city manager get fired because every time anyone tried to talk to him about city management, he would say, "There is no city manager, only Zoul"?
I'm so sorry.
Re: (Score:2, Funny)
Re: (Score:2)
Six feet above the covers!
This is silly (Score:4, Funny)
Re: (Score:1)
Are they Jewish?
Re: (Score:2)
Are they Jewish?
The aardvark is. Where do you think the double "a" came from?
(Disclaimer: I am Jewish)
Re: (Score:2)
I looked it up, and yes, I'm right [etymonline.com]. Not sure if it was some amazingly funny ironic Jewish joke that I don't get, but hey, in case it's not, we've all been educated.
Re: (Score:2)
No, it was me posting at an hour that I should have been sleeping. The only other English word that I know that begins with two "a"'s is the name Aaron, which is a Hebrew name.
The wife is right. I'm not funny.
What? (Score:5, Informative)
The conclusion is ridiculous. There's more spam for addresses starting with 'a' than with 'z' because there is more traffic to those addresses. See the graph [hothardware.com]. The line in the graph is the only solid piece of information, and it is just a lot of noise around the mean value of 56%; if anything, it indicates the opposite conclusion.
Re: (Score:3, Insightful)
Re: (Score:2, Funny)
Zed's dead baby, Zed's dead.
Re: (Score:2)
Zed's dead baby, Zed's dead.
Is that why he's not answering the Viagra mails?
Re:What? (Score:5, Insightful)
Re: (Score:2)
Re: (Score:3, Informative)
According to the PDF, this graph is for all email addresses, not for 'real' addresses, which they define, more or less, as those addresses which receive at least one non-spam email every other day. Since they are looking only at Demon's logs, not the contents of actual mailboxes, they have to use this heuristic to filter out the bogus combinations that the spammers are trying.
If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A
Re: (Score:2)
If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A addresses than for Z addresses, as asserted in the summary.
But probably not significantly higher (the difference being greater than the noise). Again, the data tells us nothing interesting.
Re: (Score:2)
You are right.
Additionally, everyone knows that spammers don't sort by the entire e-mail address, they group by the target domain name anyway for one of two reasons:
1. to efficiently send bulk e-mail to one server for a lot of users.
2. when trying to be stealthy, to spread out the e-mail to one mail server over time.
Sure, you could sort e-mail addresses to weed out duplicates but ultimately you have to group them by domain name.
Re: (Score:1)
There being less spam naturally implies there is less traffic. Most traffic is spam; so if there were no spam, there would be very little traffic.
Legitimate messages VS spam messages are completely different and non-spam traffic really has nothing to do with the total spam volume, except as a comparison tool; the traffic numbers that actually have to do with the spam problem are the number of spam messages.
It's spam volume that really matters. If I get 10 messages, and 9 of them are spam, i'm much hap
Re: (Score:2)
There being less spam naturally implies there is less traffic.
Most traffic is spam; so if there were no spam, there would be very little traffic.
Look at the graph. Spam accounts for around 60% of the traffic for all groups of email addresses, regardless of total number of messages. My point is, it's likely that there are far more addresses starting with 'a' than with 'z', and that's all the data tells us. There is no proof of correlation between starting letter of the address and amount of spam.
Re: (Score:2)
Sign up to the LKML, and presto! Most of your mailbox is now non-spam ;)
Re: (Score:1)
It's spam volume that really matters. If I get 10 messages, and 9 of them are spam, i'm much happier than if I get 1000 messages and 500 of them are spam.
On the other hand, if I got 1000 messages, I'd be much happier if 999 of them were spam than if only 500 were spam.
Re: (Score:1)
I think that would suck.. I wouldn't be able to tell for sure that only one (unimportant) message was legit ham without examining and discarding each of the 999 spam messages.
Re: (Score:1)
Re: (Score:2)
There is basically no noise at all. The sample size for "z" is about 5 million emails.
So you think that 5 million is, magically, a "safe" sample size. For anything, using any method of measurement with or without deficiencies. Yeah, sure, why not.
Even if it was, the total volume of spam is not a measure of anything significant. The volume of spam relative to the total volume is, which the line represents. And it shows no decrease as you move to the right, contrary to what they claim.
Very little spam at demon.uk (Score:3, Interesting)
I thought most in the know see a far higher percentage, my ISP records over 95%:
Xs4all statistics [xs4all.nl]
Makes me wonder about the rest.
Re: (Score:3, Insightful)
Indeed, the PDF paper says this is measuring the rate of filtering AFTER using Spamhaus black holes, and the measured rate is their custom "Cloudmark" spam detection tool. Importantly, if their tool sucks enough that people opt out of it entirely, all email is considered "not-spam". But as long as these effects are not influenced by the first letter, that's okay.
Unfortunately, the paper tries very hard to present a very silly notion about 'a' versus 'z'. The important concept here isn't order, it's letter
The f*** article says otherwise (Score:5, Informative)
I know nobody actually bothered to read it, but from the graph it looks like there are many more email addresses starting with an "a" than with a "z". The former get about as much spam as legit emails, while the latter get about 2 or 3 times more spam than legit emails.
Re: (Score:1, Informative)
This is correct. And according to the Article, the Wallaby gets the lowest percentage of spam while the Zebra gets about the highest percentage of spam.
My domains start with a (Score:3, Insightful)
Re: (Score:1)
Re: (Score:1, Troll)
Re: (Score:2)
nobody ever expects the non-alphanumeric character.
only 56%? (Score:2)
I get more like 98% spam.
cat email_list.txt |sort -u | ./sendspam.pl (Score:2)
Signal to Noise ratio (Score:3, Insightful)
Yes, the beginning of the alphabet gets more spam. (Score:5, Informative)
Spammers work from lists of email addresses, and those email addresses are typically sorted by domain and then alphabetically. So, the receiving domain gets a rush of emails for users with addresses beginning with A, B, C etc. But usually (at some point) many mail systems will detect that there is a spam attack in progress and they will block subsequent messages of the same format or from the attacking IP address (depending on the spam filtering setup in place).
So, put simply, the people beginning with "A" get nice new spam that the adaptive filters don't detect. By the time it gets to "Z" a good filter will automatically block the attack.
What's sad is that I watch spam attacks often enough to know this.
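The mechanism described above can be sketched as a toy model. Everything here is an assumption made for illustration: the mailbox names, the function name, and the idea that the filter starts blocking after a fixed number of copies (`learn_after`). Real adaptive filters are far less tidy.

```python
from collections import Counter


def simulate_run(addresses, learn_after):
    """Deliver one spam run in sorted order, then block the rest.

    Toy assumption: the filter recognises the run after it has seen
    `learn_after` copies and rejects every later message.
    """
    delivered = Counter()
    for seen, address in enumerate(sorted(addresses)):
        if seen >= learn_after:
            break  # filter has adapted; the rest of the run is blocked
        delivered[address[0]] += 1  # tally by first letter of the mailbox
    return delivered


# Hypothetical mailboxes on a single receiving domain.
mailboxes = ["alice", "adam", "bob", "carol", "yusuf", "zoe"]
delivered = simulate_run(mailboxes, learn_after=3)
print(dict(delivered))  # {'a': 2, 'b': 1}
```

In this model the mailboxes late in the sort order never see the run at all, which is the effect the comment describes; it makes no claim about how large that effect is in practice.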
Misleading, and wrong (Score:1)
Am I the only one who actually paid attention to the graph? Yes, sure, the "overall" spam amounts for A is much bigger than for Z, but that is because there are more email addresses that start with the letter A than those starting with the letter Z. If not, check how spam/non-spam percentages all stay within the 50%-75% interval.
In other words: more email messages = more spam, "OH GOD!!11ONE let's write a paper!"
Slow news day?
That explains a lot (Score:2)
I rarely get spammed, even on hotmail
56% sounds low (Score:1)
My servers get at least 95% identified (and blocked) spam vs. ham across ~200 domains that have different characteristics. And I know that some of them slip through the filters also.
Spam? (Score:1)
Re: (Score:2)
My sentiments exactly. I have this username as an account name on another system, and I honestly get about one spam a week to that account. Maybe there's something to this z thing.
Zzwyggle (Score:2)
With two z's.
"But my name really is ZZwyggle!"
Oh really? (Score:3, Interesting)
I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)
Now, let's see if that holds true.
Re: (Score:2)
I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)
And posting it on the web is a great idea to avoid spam.
Re: (Score:2)
My post was supposed to be on the humour side. I am surprised it was moderated as interesting. More importantly, if people had some random characters in their e-mail addresses, there would be less chance of getting hit by those using dictionary-based attacks.
As of right now, no mail has come in.
Re: (Score:2)
Me, on bad phone line: My email address is z;45-r_0oisdgf*yihh@hotmail.com ... ;, as in the symbol, 4 5 - r _ ... ....
Person: Wtf??? Can I read that back.
Person: zcolonfortea5
Me, interrupting: No, hang on it's z
Person, repeating as we go: z ; 4 5 dash r underscore
Result: I never get ham because no one can ever transcribe my address.
Re: (Score:2)
That's the whole point: he needs it to be harvested.
Re: (Score:2)
You must be joking or something. I said: zh80lukgwggok4kko0kcbrhjm@hotmail.com
I did register it. I can log into it. I see someone has already sent me spam, but I dare not click it.
Re: (Score:2)
I sent you some spam as well, I guess this disproves the article :)
Even spammers know.. (Score:2, Funny)
Zebras already have big penises!
Ptooey! I SPIT on your spam cowardice (Score:1)
I worship the letter "A" and sneer at these pantywaist spammers!
Aardvarks will NEVER give in to threats of spam!
Nor we aardwolves!
.
Well hung zebras... (Score:1)
soon to be known as zealous zebra (Score:2)
So that explains it!
- arbitrary aardvark.
+3 funny, sad.
Am I the only one... (Score:1)