Slashdot
Zebras Get Less Spam Than Aardvarks
Posted by kdawson on Sun Aug 31, 2008 02:18 PM
from the so-do-johns-and-smiths dept.
MojoKid writes "A recent study (PDF) by Richard Clayton at Cambridge University determined that the first letter of someone's email address directly affects how much spam they receive. As shown in the graph at either link above, email addresses with numbers as their first characters receive even fewer spam emails. The corpus used in the study was 8 weeks' worth of email from the UK ISP Demon Internet, just over half a billion messages, of which 56% was deemed to be spam."
You know what this means (Score:5, Insightful)
Re: (Score:3, Insightful)
but, like the article says, there are fewer people whose e-mail addresses start with z or numbers, so they'd be getting fewer hits by targeting those starting characters. there are already more spam messages being targeted at "zebras" per legitimate target than there are spam messages being targeted at aardvark addresses.
so the smart thing for spammers to do is to stop wasting time with zebra addresses, since they'd have a higher chance of actually reaching a real mailbox by targeting more popular character r
Re: (Score:3, Interesting)
I think this might be a distant relative to Benford's law (the one that shows that about 30% of all counted numbers will start with the digit "1", not 10% as one might think).
Going through some crack and john-the-ripper logs, I saw that there was a good correlation between the position in the alphabet not only for the passwords, but also for the user names.
Based on pure letter frequency, you'd think that there would be a typical E-R-S-T-N ranking, but this doesn't appear to be the case for the initial lette
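For reference, the roughly 30% figure quoted for Benford's law comes from the formula P(d) = log10(1 + 1/d) for a leading digit d. The snippet below is only a minimal sketch of that formula; the function name is made up for illustration, and nothing here tests whether email addresses or user names follow a similar distribution.

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1-9) under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the expected share for each possible leading digit.
for digit in range(1, 10):
    print(digit, f"{benford_probability(digit):.1%}")
```

The nine probabilities sum to 1, and the digit 1 comes out at about 30.1%, which matches the "about 30%" claim in the post.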
Re: (Score:2, Funny)
Hi. I note you don't publish your gmail address on /. Try that and then tell me about the spam you haven't seen.
Re: (Score:2)
Re: (Score:2)
No, they are combing webspace in general, which includes
Re:You know what this means (Score:5, Funny)
Why would spammers look for email addresses in their own working directory (./)? I guess I am just not up-to-date on my spamming techniques.
Re: (Score:3, Informative)
Gmail isn't perfect at filtering spam. I've received 35,214 spam messages in the last month. I estimate that Gmail failed to filter around 100 of them.
Re:You know what this means (Score:5, Informative)
Sorry to break this to you. (Score:3, Informative)
But in the real world there is no such thing as perfection. It is a philosophical construct.
Re:You know what this means (Score:4, Insightful)
Re: (Score:2)
Yeah, so that's a... ~0.284% failure rate (100 / 35,214 ≈ 0.00284). That's amazingly good IMO.
BTW, does anyone know the Alt+Numpad code in Windows for the "about equals" sign? (The one that is a squiggly =, or two ~ stacked on top of one another like some hot font lovin'.)
Re: (Score:3, Interesting)
Unfortunately, gmail is my only mail account where I feel I have to scan the spam "folder" every week or so to look for false positives -- of which there are a couple a month. My other accounts, which receive more mail and more spam (both as a percentage and an absolute number), have given so few false positives that I don't bother looking in the spam folders on those accounts.
So, unfortunately, I end up looking at all the spam on gmail and just
Re: (Score:2)
Gmail isn't perfect at filtering spam. I've received 35,214 spam messages in the last month. I estimate that Gmail failed to filter around 100 of them.
That's a 0.3% failure rate. My wife was good enough to marry, but she's no 9.997- still good enough for me.
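The failure rate being discussed in this subthread can be checked directly from the two numbers in the quoted post (35,214 spam messages received, about 100 estimated as missed). This is just a sketch of the arithmetic; the variable names are made up, and the 100 is the original poster's own estimate.

```python
spam_received = 35_214  # spam messages in the last month, from the quoted post
missed = 100            # poster's estimate of spam that reached the inbox

failure_rate = missed / spam_received

# Show the same value as a plain fraction and as a percentage,
# since the two are easy to mix up.
print(f"fraction:   {failure_rate:.5f}")
print(f"percentage: {failure_rate:.3%}")
```

As a fraction the value is about 0.00284; as a percentage it is about 0.284%, which rounds to the 0.3% quoted above.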
Unexpected (Score:2)
I can see why they'd start at the front of the alphabet, and why those folks would tend to get more spam.
But wouldn't numbers sort even in front of the letter A?
Re: (Score:3, Interesting)
I'd guess that addresses with numbers at the beginning are often invalid, so they don't bother with them. I get spam attempts addressed to message IDs, which are generally something like 238947529345user@example.com.
Re: (Score:3, Insightful)
I think most of the spam targeted at a message ID comes from crawling USENET.
On my server, I see lots of e-mail with a "rcpt to:" that matches the regex "(mpg\.)?[a-f0-9]+\@news\.domain\.com". This is the format that inn uses to create message IDs.
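A quick way to try that pattern out is sketched below in Python. The regex is copied from the post as written; news.domain.com is the poster's placeholder, and the sample recipients are invented for illustration. The helper name is hypothetical.

```python
import re

# Pattern from the post: optional "mpg." prefix, then lowercase hex digits,
# then the (placeholder) news host.
MESSAGE_ID_RCPT = re.compile(r"(mpg\.)?[a-f0-9]+\@news\.domain\.com")


def looks_like_message_id(rcpt: str) -> bool:
    """Return True if the whole recipient address matches the message-ID shape."""
    return MESSAGE_ID_RCPT.fullmatch(rcpt) is not None


# Invented sample recipients, not taken from any real log.
samples = [
    "mpg.1a2b3c4d@news.domain.com",
    "deadbeef42@news.domain.com",
    "alice@news.domain.com",
]
for rcpt in samples:
    print(rcpt, looks_like_message_id(rcpt))
```

Note that a short real local part made only of the characters a-f and 0-9 (for example "abe") would also match, so a check like this is a heuristic rather than a reliable classifier.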
I bet this guy gets the least amount of spam (Score:2, Interesting)
We had an ex-city manager, who had a son named Zachary Z. Zoul.
Re:I bet this guy gets the least amount of spam (Score:5, Funny)
Did the city manager get fired because every time anyone tried to talk to him about city management, he would say, "There is no city manager, only Zoul"?
I'm so sorry.
Re: (Score:2, Funny)
Re: (Score:2)
Six feet above the covers!
This is silly (Score:4, Funny)
Re: (Score:2)
Are they Jewish?
The aardvark is. Where do you think the double "a" came from?
(Disclaimer: I am Jewish)
What? (Score:5, Informative)
The conclusion is ridiculous. There's more spam for addresses starting with 'a' than with 'z' because there is more traffic to those addresses. See the graph [hothardware.com]. The line in the graph is the only solid piece of information, and it is just a lot of noise around the mean value of 56%; if anything, it indicates the opposite conclusion.
Re:What? (Score:5, Insightful)
Re: (Score:3, Informative)
According to the PDF, this graph is for all email addresses, not for 'real' addresses, which they define, more or less, as those addresses which receive at least one non-spam email every other day. Since they are looking only at Demon's logs, not the contents of actual mailboxes, they have to use this heuristic to filter out the bogus combinations that the spammers are trying.
If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A
Re: (Score:2)
If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A addresses than for Z addresses, as asserted in the summary.
But probably not significantly higher (the difference being greater than the noise). Again, the data tells us nothing interesting.
Re: (Score:2)
You are right.
Additionally, everyone knows that spammers don't sort by the entire e-mail address, they group by the target domain name anyway for one of two reasons:
1. to efficiently send bulk e-mail to one server for a lot of users.
2. when trying to be stealthy, to spread out the e-mail to one mail server over time.
Sure, you could sort e-mail addresses to weed out duplicates but ultimately you have to group them by domain name.
Re: (Score:3, Insightful)
Re: (Score:2, Funny)
Zed's dead baby, Zed's dead.
Re: (Score:2)
There being less spam naturally implies there is less traffic.
Most traffic is spam; so if there were no spam, there would be very little traffic.
Look at the graph. Spam accounts for around 60% of the traffic for all groups of email addresses, regardless of total number of messages. My point is, it's likely that there are far more addresses starting with 'a' than with 'z', and that's all the data tells us. There is no proof of correlation between starting letter of the address and amount of spam.
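The distinction being argued here, between the absolute volume of spam and spam as a share of total traffic, can be illustrated with a toy calculation. The counts below are entirely made up for illustration and are not taken from the study; they only show how two groups can differ tenfold in spam volume while having the same spam share.

```python
# HYPOTHETICAL message counts per first letter: (spam, legitimate).
# These numbers are invented and do not come from the paper.
counts = {
    "a": (5_600_000, 4_400_000),
    "z": (560_000, 440_000),
}


def spam_share(spam: int, legitimate: int) -> float:
    """Fraction of all messages to a group that are spam."""
    return spam / (spam + legitimate)


for letter, (spam, legitimate) in counts.items():
    print(letter, f"spam volume={spam:,}", f"spam share={spam_share(spam, legitimate):.0%}")
```

With these invented counts, 'a' receives ten times as much spam as 'z' in absolute terms, yet both groups sit at a 56% spam share.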
Re: (Score:2)
Sign up to the LKML, and presto! Most of your mailbox is now non-spam ;)
Re: (Score:2)
There is basically no noise at all. The sample size for "z" is about 5 million emails.
So you think that 5 million is, magically, a "safe" sample size. For anything, using any method of measurement with or without deficiencies. Yeah, sure, why not.
Even if it was, the total volume of spam is not a measure of anything significant. The volume of spam relative to the total volume is, which the line represents. And it shows no decrease as you move to the right, contrary to what they claim.
Very little spam at demon.uk (Score:3, Interesting)
I thought most in the know see a far higher percentage; my ISP records over 95%:
Xs4all statistics [xs4all.nl]
Makes me wonder about the rest.
The f*** article says otherwise (Score:5, Informative)
I know nobody actually bothered to read it, but from the graph it looks like there are many more email addresses starting with an "a" than with a "z". The former get about as much spam as legit emails, while the latter get about 2 or 3 times more spam than legit emails.
My domains start with a (Score:3, Insightful)
Re: (Score:2)
nobody ever expects the non-alphanumeric character.
only 56%? (Score:2)
I get more like 98% spam.
cat email_list.txt |sort -u | ./sendspam.pl (Score:2)
Signal to Noise ratio (Score:3, Insightful)
Yes, the beginning of the alphabet gets more spam. (Score:5, Informative)
Spammers work from lists of email addresses, and those email addresses are typically sorted by domain and then alphabetically. So, the receiving domain gets a rush of emails for users with addresses beginning with A, B, C etc. But usually (at some point) many mail systems will detect that there is a spam attack in progress and they will block subsequent messages of the same format or from the attacking IP address (depending on the spam filtering setup in place).
So, put simply, the people beginning with "A" get nice new spam that the adaptive filters don't detect. By the time it gets to "Z" a good filter will automatically block the attack.
What's sad is that I watch spam attacks often enough to know this.
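The mechanism described in this comment can be sketched as a toy model: a run goes through one domain's addresses in alphabetical order, and the filter starts blocking once it has seen a fixed number of messages from the run. Everything here is an assumption made for illustration (the function name, the fixed threshold, the example.com addresses); real adaptive filters are far more involved.

```python
def simulate_spam_run(addresses, detect_after):
    """Toy model: messages go out in sorted order, and the filter blocks
    everything after it has seen `detect_after` messages from the run.

    Returns a dict mapping each address to True (delivered) or False (blocked).
    """
    delivered = {}
    for position, address in enumerate(sorted(addresses)):
        delivered[address] = position < detect_after
    return delivered


# Hypothetical recipients at a single domain.
recipients = [
    "zebra@example.com",
    "aardvark@example.com",
    "mongoose@example.com",
    "badger@example.com",
]

result = simulate_spam_run(recipients, detect_after=2)
for address, got_spam in result.items():
    print(address, "delivered" if got_spam else "blocked")
```

In this model the addresses early in the sort order receive the message before the filter reacts, while those near the end are protected, which is the effect the comment describes.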
That explains a lot (Score:2)
I rarely get spammed, even on hotmail
Zzwyggle (Score:2)
With two z's.
"But my name really is ZZwyggle!"
Oh really? (Score:3, Interesting)
I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)
Now, let's see if that holds true.
Re: (Score:2)
I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)
And posting it on the web is a great idea to avoid spam.
Re: (Score:2)
My post was supposed to be on the humour side. I am surprised it was moderated as interesting. More importantly, if people had some random characters in their e-mail addresses, there would be less chance of getting hit by those using dictionary-based attacks.
As of right now, no mail has come in.
Even spammers know.. (Score:2, Funny)
Zebras already have big penises!
Re: (Score:2)
Re: (Score:3, Insightful)
Indeed, the PDF paper says this is measuring the rate of filtering AFTER using Spamhaus black holes, and the measured rate is their custom "Cloudmark" spam detection tool. Importantly, if their tool sucks enough that people opt out of it entirely, all email is considered "not-spam". But as long as these effects are not influenced by the first letter, that's okay.
Unfortunately, the paper tries very hard to present a very silly notion about 'a' versus 'z'. The important concept here isn't order, it's letter
Re: (Score:2)
My sentiments exactly. I have this username as an account name on another system, and I honestly get about one spam a week to that account. Maybe there's something to this z thing.