Zebras Get Less Spam Than Aardvarks

Posted by kdawson on Sun Aug 31, 2008 02:18 PM
from the so-do-johns-and-smiths dept.
MojoKid writes "A recent study (PDF) by Richard Clayton at Cambridge University determined that the first letter of someone's email address directly affects how much spam they receive. As shown in the graph at either link above, email addresses with numbers as their first characters receive even fewer spam emails. The corpus used in the study was 8 weeks' worth of email from the UK ISP Demon Internet, just over half a billion messages, of which 56% was deemed to be spam."
  • by Shajenko42 (627901) on Sunday August 31 2008, @02:20PM (#24821801)
    Spammers will now alter their programs to start with "z" and numbers, so they can get the people who aren't as desensitized by spam.
    • But, like the article says, there are fewer people whose e-mail addresses start with z or numbers, so they'd be getting fewer hits by targeting those starting characters. There are already more spam messages aimed at "zebras" per legitimate target than at aardvark addresses.

      So the smart thing for spammers to do is to stop wasting time with zebra addresses, since they'd have a higher chance of actually reaching a real mailbox by targeting more popular character r

      • Re: (Score:3, Interesting)

        I think this might be a distant relative of Benford's law (the one showing that in many real-world data sets about 30% of numbers start with the digit "1", not roughly 11%, or one in nine, as one might think).

        Going through some Crack and John the Ripper logs, I saw a good correlation with position in the alphabet, not only for the passwords but also for the user names.
        Based on pure letter frequency, you'd think that there would be a typical E-R-S-T-N ranking, but this doesn't appear to be the case for the initial lette
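For reference, Benford's law predicts that leading digit d occurs with probability log10(1 + 1/d). This small Python sketch only shows where the ~30% figure for "1" comes from, compared with a uniform guess of 1/9 (a leading digit can't be zero); it says nothing about whether e-mail addresses actually follow such a law.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1-9) under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Print the Benford distribution next to the naive uniform guess of 1/9.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%} (uniform guess {1/9:.1%})")
```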

      • Re: (Score:2, Funny)

        by Anonymous Coward

        Hi. I note you don't publish your Gmail address on /. Try that and then tell me about the spam you haven't seen.

        • What is this spam you speak of? I'm using Gmail and to date I haven't seen a single instance of this so-called "spam".
          • So, it's not Gmail, but not displaying your email address on ./ does the trick, eh? I am sure all spammers are just combing ./ for email addresses.

            No, they are combing webspace in general, which includes /., as well as dumb dictionary lists.
          • by AngryLlama (611814) on Sunday August 31 2008, @04:05PM (#24822769)

            Why would spammers look for email addresses in their own working directory (./)? I guess I am just not up-to-date on my spamming techniques.

      • Re: (Score:3, Informative)

        Gmail isn't perfect at filtering spam. I've received 35,214 spam messages in the last month. I estimate that Gmail failed to filter around 100 of them.

        • by E IS mC(Square) (721736) on Sunday August 31 2008, @02:55PM (#24822119) Journal
          Nothing is perfect when it comes to this. But they are the best among all 'free' email providers I have used - by miles. Now get in and flag them as spam - next time, you may receive fewer.
        • But in the real world there is no such thing as perfection. It is a philosophical construct.

           

        • by cypherwise (650128) on Sunday August 31 2008, @03:37PM (#24822465) Journal
          I'm (incorrectly?) assuming this comment was facetious. Missing 100 out of 35,214 (a 99.72% catch rate) is a pretty damn good ratio when it comes to this type of thing.
        • Yeah, so that's a... ~0.284% failure rate. That's amazingly good IMO.

          BTW, does anyone know the Alt+Numpad code in Windows for the "about equals" sign? (The one that is a squiggly =, or two ~ stacked on top of one another like some hot font lovin'.)

        • Re: (Score:3, Interesting)

          I find gmail almost perfect at classifying spam as such.

          Unfortunately, gmail is my only mail account where I feel I have to scan the spam "folder" every week or so to look for false positives -- of which there are a couple a month. My other accounts, which receive more mail and more spam (both as a percentage and an absolute number), have given so few false positives that I don't bother looking in the spam folders on those accounts.

          So, unfortunately, I end up looking at all the spam on gmail and just
        • Gmail isn't perfect at filtering spam. I've received 35,214 spam messages in the last month. I estimate that Gmail failed to filter around 100 of them.

          That's a 0.3% failure rate. My wife was good enough to marry, but she's no 9.997, and she's still good enough for me.
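The percentages quoted in these replies are easy to mix up. Using only the thread's numbers (35,214 spam messages, about 100 of which got through), a quick Python check of the miss and catch rates:

```python
# Figures quoted in the thread: 35,214 spam messages, about 100 of which got past the filter.
total_spam = 35_214
missed = 100

miss_rate = missed / total_spam   # fraction of spam that reached the inbox
catch_rate = 1 - miss_rate        # fraction of spam that was filtered

print(f"miss rate:  {miss_rate:.3%}")   # 0.284%
print(f"catch rate: {catch_rate:.2%}")  # 99.72%
```

Note that 100/35,214 is a fraction of about 0.00284, which is roughly 0.28% when written as a percentage.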

  • I can see why they'd start at the front of the alphabet, and why those folks would tend to get more spam.

    But wouldn't numbers sort even in front of the letter A?
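Whether numbers sort first depends on the comparison used. In a plain code-point (ASCII) sort, which is only an assumption about how a spammer's tooling might order its list, digits do come before every letter. A minimal Python illustration with made-up addresses:

```python
# Python compares strings by Unicode code point, which for ASCII means:
# digits (0x30-0x39) < uppercase (0x41-0x5A) < lowercase (0x61-0x7A).
addresses = ["zed@example.com", "albert@example.com", "42fan@example.com", "Bob@example.com"]

print(sorted(addresses))
# ['42fan@example.com', 'Bob@example.com', 'albert@example.com', 'zed@example.com']

# A case-insensitive sort still puts the digit first, since '4' < 'a'.
print(sorted(addresses, key=str.lower))
# ['42fan@example.com', 'albert@example.com', 'Bob@example.com', 'zed@example.com']
```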

    • Re: (Score:3, Interesting)

      I'd guess that addresses with numbers at the beginning are often invalid, so they don't bother with them. I get spam attempts addressed to message IDs, which are generally something like 238947529345user@example.com.

        • Re: (Score:3, Insightful)

          I think most of the spam targeted at a message ID comes from crawling USENET.

          On my server, I see lots of e-mail with a "rcpt to:" that matches the regex "(mpg\.)?[a-f0-9]+\@news\.domain\.com". This is the format that INN uses to create message IDs.
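To see which recipients that pattern catches, here is a small Python sketch. Using `fullmatch` (so the whole address must match) is an assumption, since the comment does not say how the regex is anchored, and `news.domain.com` is the commenter's placeholder domain.

```python
import re

# Pattern quoted in the comment; "news.domain.com" is a placeholder domain.
MESSAGE_ID_RCPT = re.compile(r"(mpg\.)?[a-f0-9]+\@news\.domain\.com")

def looks_like_message_id(rcpt: str) -> bool:
    """True if the whole RCPT TO address matches the message-ID-style pattern."""
    return MESSAGE_ID_RCPT.fullmatch(rcpt) is not None

for rcpt in ["mpg.3fa9c0@news.domain.com", "deadbeef01@news.domain.com", "alice@news.domain.com"]:
    print(rcpt, looks_like_message_id(rcpt))
```

The last address is rejected because "alice" contains letters outside a-f.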

  • by Anonymous Coward

    We had an ex-city manager who had a son named Zachary Z. Zoul.

  • by knappe duivel (914316) on Sunday August 31 2008, @02:32PM (#24821917)
    Zebras and aardvarks don't eat Spam. Or ham.
  • What? (Score:5, Informative)

    by pablomme (1270790) on Sunday August 31 2008, @02:34PM (#24821935)

    The conclusion is ridiculous. There's more spam for addresses starting with 'a' than with 'z' because there is more traffic to those addresses. See the graph [hothardware.com]. The line in the graph is the only solid piece of information, and it is just a lot of noise around the mean value of 56%; if anything, it indicates the opposite conclusion.

    • Re:What? (Score:5, Insightful)

      by Oidhche (1244906) on Sunday August 31 2008, @02:52PM (#24822081)
      Indeed. The conclusion that I'd draw from presented data is that there are more e-mail addresses beginning with 'a' than with 'z' (and that very few addresses begin with a number). Even the percentage of spam is nearly meaningless. To find anything about which addresses receive more spam, you should look at the average amount of spam per e-mail address in a given group, not the total number of messages.
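The difference between total and per-address counts is easy to show with invented numbers. In this Python sketch (toy data, not figures from the study), the 'a' group receives more spam in total, yet each 'z' address receives more:

```python
from collections import defaultdict

def spam_per_address(messages):
    """Average spam messages per distinct recipient, grouped by first character.

    `messages` is an iterable of (recipient_address, is_spam) pairs.
    """
    spam_count = defaultdict(int)
    recipients = defaultdict(set)
    for address, is_spam in messages:
        key = address[0].lower()
        recipients[key].add(address)
        if is_spam:
            spam_count[key] += 1
    return {key: spam_count[key] / len(recipients[key]) for key in recipients}

# Toy data: three 'a' addresses with 2 spam each (6 total), one 'z' address with 3 spam.
toy = (
    [("alice@example.com", True)] * 2
    + [("albert@example.com", True)] * 2
    + [("amy@example.com", True)] * 2
    + [("zed@example.com", True)] * 3
    + [("zed@example.com", False)]
)
print(spam_per_address(toy))  # {'a': 2.0, 'z': 3.0}
```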
    • Re: (Score:3, Informative)

      According to the PDF, this graph is for all email addresses, not for 'real' addresses, which they define, more or less, as those addresses which receive at least one non-spam email every other day. Since they are looking only at Demon's logs, not the contents of actual mailboxes, they have to use this heuristic to filter out the bogus combinations that the spammers are trying.

      If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A

      • If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A addresses than for Z addresses, as asserted in the summary.

        But probably not significantly higher (the difference being greater than the noise). Again, the data tells us nothing interesting.
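A rough Python stand-in for the 'real address' filter described in the PDF summary above. It assumes "at least one non-spam email every other day" means an average over the study's 8-week (56-day) window; the paper's exact rule may differ, for example by requiring a message in every two-day period. The address counts are hypothetical.

```python
def real_addresses(ham_counts, window_days=56):
    """Approximate 'real address' heuristic (an assumed reading, not the paper's code).

    Keeps addresses that averaged at least one non-spam message every other day
    over the observation window.
    """
    threshold = window_days / 2
    return {addr for addr, ham in ham_counts.items() if ham >= threshold}

# Hypothetical counts of non-spam messages per address over 56 days.
counts = {"alice@example.com": 40, "zed@example.com": 28, "xk7q@example.com": 0}
print(sorted(real_addresses(counts)))  # ['alice@example.com', 'zed@example.com']
```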

    • You are right.

      Additionally, everyone knows that spammers don't sort by the entire e-mail address; they group by the target domain name anyway, for one of two reasons:

      1. to efficiently send bulk e-mail to one server for a lot of users.

      2. when trying to be stealthy, to spread out the e-mail to one mail server over time.

      Sure, you could sort e-mail addresses to weed out duplicates, but ultimately you have to group them by domain name.
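A minimal Python sketch of that dedupe-then-group step, using made-up addresses. Lowercasing the whole address is a simplification (strictly, only the domain part is case-insensitive).

```python
from collections import defaultdict

def group_by_domain(addresses):
    """Deduplicate addresses and bucket their local parts by recipient domain."""
    batches = defaultdict(list)
    # Simplification: lowercase everything before deduplicating.
    for address in sorted(set(a.lower() for a in addresses)):
        local, _, domain = address.rpartition("@")
        batches[domain].append(local)
    return dict(batches)

addresses = ["zed@example.org", "albert@example.com", "Albert@example.com", "amy@example.org"]
print(group_by_domain(addresses))
# {'example.com': ['albert'], 'example.org': ['amy', 'zed']}
```

Because the list is sorted before bucketing, each domain's batch also comes out in alphabetical order.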

      • Re: (Score:3, Insightful)

        No. Look at the data. It shows the total amount of messages received by Alberts and Zeds. It's painfully obvious that Alberts receive far more of both spam and genuine messages than Zeds. Not because the average Albert gets more messages than the average Zed, but because there are more Alberts than Zeds.
      • There being less spam naturally implies there is less traffic.
        Most traffic is spam; so if there were no spam, there would be very little traffic.

        Look at the graph. Spam accounts for around 60% of the traffic for all groups of email addresses, regardless of total number of messages. My point is, it's likely that there are far more addresses starting with 'a' than with 'z', and that's all the data tells us. There is no proof of correlation between starting letter of the address and amount of spam.

      • Sign up to the LKML, and presto! Most of your mailbox is now non-spam ;)

      • There is basically no noise at all. The sample size for "z" is about 5 million emails.

        So you think that 5 million is, magically, a "safe" sample size. For anything, using any method of measurement with or without deficiencies. Yeah, sure, why not.

        Even if it were, the total volume of spam is not a measure of anything significant. The volume of spam relative to the total volume is, and that is what the line represents. And it shows no decrease as you move to the right, contrary to what they claim.
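For scale, here is a back-of-envelope Python check of pure sampling noise on the spam share. It treats each message as an independent trial, which is a strong assumption since spam arrives in correlated campaigns, and it addresses only random noise, not the measurement deficiencies raised above.

```python
import math

def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion p from n independent trials."""
    return math.sqrt(p * (1 - p) / n)

# Spam share of ~56% (the study's overall figure) and ~5 million messages (the "z" sample size quoted above).
se = binomial_standard_error(0.56, 5_000_000)
print(f"standard error: {se:.6f} ({se * 100:.3f} percentage points)")
```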

  • by Teun (17872) on Sunday August 31 2008, @02:36PM (#24821953) Homepage
    56% deemed spam?

    I thought most people in the know see a far higher percentage; my ISP records over 95%:

    Xs4all statistics [xs4all.nl]

    Makes me wonder about the rest.

  • by paulatz (744216) on Sunday August 31 2008, @02:54PM (#24822095) Homepage

    I know nobody actually bothered to read it, but from the graph it looks like there are many more email addresses starting with an "a" than with a "z". The former get about as much spam as legit emails, while the latter get about 2 or 3 times more spam than legit emails.

  • by flyingfsck (986395) on Sunday August 31 2008, @03:07PM (#24822201)
    And yes, they get tons of spam: about 99.999% of connection attempts are spam, but a couple of RBLs and SpamAssassin take care of it. If I turn the protection off, I get about 10,000 spams per hour, which seems to be a limitation of the server; if the server were faster, it would probably get more spam. With the filters on, I get about 1 message per hour, which is more acceptable. I don't like the idea of RBLs, but I see no other way to handle the problem: if you are a spammer, then I don't want to talk to you, ever. Stupid idiots. It is also interesting that all brute-force attacks that I have observed start at 'a'. So the best passwords will start with 'z'.
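The "start with z" idea only helps against a naive enumeration that walks the alphabet from 'a' within a fixed length. Under that assumption (lowercase letters only, a-to-z order, the same order `itertools.product` produces), a candidate's position can be computed directly in Python; dictionary and rule-based crackers do not work this way.

```python
import string

# Assumption: the attacker tries lowercase letters only, in a-to-z order.
ALPHABET = string.ascii_lowercase

def guesses_before(candidate: str) -> int:
    """How many same-length strings a naive a-to-z enumeration tries before `candidate`."""
    index = 0
    for ch in candidate:
        index = index * len(ALPHABET) + ALPHABET.index(ch)
    return index

print(guesses_before("abc"))  # 28
print(guesses_before("zab"))  # 16901 (out of 26**3 = 17576 three-letter strings)
```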
  • I get more like 98% spam.

  • Email spammers also probably parse out email addresses that don't start with alpha characters...
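That is speculation, but if a spammer's list-cleaning script did this, it might look something like the following hypothetical Python sketch (made-up addresses; `starts_with_letter` is an illustrative name, not a known tool):

```python
def starts_with_letter(address: str) -> bool:
    """True if the address's first character is an ASCII letter (hypothetical harvesting filter)."""
    first = address[:1]
    return first.isascii() and first.isalpha()

harvested = ["238947529345user@example.com", "alice@example.com", "_bot@example.com", "zed@example.com"]
print([a for a in harvested if starts_with_letter(a)])
# ['alice@example.com', 'zed@example.com']
```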
  • by aembleton (324527) <.aembleton. .at. .gmail.com.> on Sunday August 31 2008, @04:10PM (#24822811) Homepage
    From looking at that graph, it would be more interesting to see the signal-to-noise ratio for each of the letters and numbers. Those names beginning with an 'A' do indeed receive more spam, but also far more non-spam. In fact it looks to be more like 50:51 (non-spam : spam), whereas at first glance those email addresses beginning with a 'P' receive 40:60.
  • by Dynamoo (527749) on Sunday August 31 2008, @04:22PM (#24822917) Homepage
    Yes, the beginning of the alphabet gets more spam, and it's really very simple to explain why.

    Spammers work from lists of email addresses, and those email addresses are typically sorted by domain and then alphabetically. So, the receiving domain gets a rush of emails for users with addresses beginning with A, B, C etc. But usually (at some point) many mail systems will detect that there is a spam attack in progress and they will block subsequent messages of the same format or from the attacking IP address (depending on the spam filtering setup in place).

    So, quite simply, the people beginning with "A" get nice new spam that the adaptive filters don't yet detect. By the time it gets to "Z", a good filter will automatically block the attack.

    What's sad is that I watch spam attacks often enough to know this.
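That explanation can be modelled in a few lines of Python. The model is a deliberate simplification: one domain, one campaign, recipients attacked in sorted order, and a filter that blocks everything after a fixed number of copies (`block_after` is an invented parameter, not a setting of any real filter).

```python
def deliveries_before_block(recipients, block_after):
    """Simulate one spam run against a single domain.

    Recipients are attacked in sorted order; a hypothetical adaptive filter
    blocks the campaign once it has seen `block_after` copies.
    Returns the recipients who still received the spam.
    """
    delivered = []
    for seen, rcpt in enumerate(sorted(recipients)):
        if seen >= block_after:
            break  # filter has recognised the campaign; everyone later is protected
        delivered.append(rcpt)
    return delivered

users = ["zoe", "adam", "mike", "beth", "yuri", "carl"]
print(deliveries_before_block(users, block_after=3))  # ['adam', 'beth', 'carl']
```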

  • I rarely get spammed, even on hotmail

  • With two z's.

    "But my name really is ZZwyggle!"

  • Oh really? (Score:3, Interesting)

    by Antony-Kyre (807195) on Sunday August 31 2008, @07:46PM (#24824703)

    I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)

    Now, let's see if that holds true.

    • I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)

      And posting it on the web is a great idea to avoid spam.

      • My post was supposed to be on the humour side. I am surprised it was moderated as interesting. More importantly, if people had some random characters in their e-mail addresses, they would be less likely to be hit by those using dictionary-based attacks.

        As of right now, no mail has come in.
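Generating a random local part like that is simple; this Python sketch uses the standard `secrets` module, with an arbitrary default length. It only helps against dictionary attacks: as pointed out above, once the address is posted publicly it can be harvested like any other.

```python
import secrets
import string

def random_local_part(length: int = 25) -> str:
    """Random lowercase-plus-digits local part, unlikely to appear in a dictionary list."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(random_local_part() + "@example.com")
```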

  • Zebras already have big penises!

    • Brute-force attacks also start at 'a', always. I have never seen one that starts at the end of the alphabet. So make sure that your user name and password are from opposite ends of the spectrum.
    • Re: (Score:3, Insightful)

      Indeed, the PDF paper says this is measuring the rate of filtering AFTER using Spamhaus black holes, and the measured rate comes from their custom "Cloudmark" spam detection tool. Importantly, if their tool sucks enough that people opt out of it entirely, all their email is considered "not-spam". But as long as these effects are not influenced by the first letter, that's okay.

      Unfortunately, the paper tries very hard to present a very silly notion about 'a' versus 'z'. The important concept here isn't order, it's letter

    • My sentiments exactly. I have this username as an account name on another system, and I honestly get about one spam a week to that account. Maybe there's something to this z thing.