
Zebras Get Less Spam Than Aardvarks 115

MojoKid writes "A recent study (PDF) by Richard Clayton at Cambridge University determined that the first letter of someone's email address directly affects how much spam they receive. As shown in the graph at either link above, email addresses with numbers as their first characters receive even fewer spam emails. The corpus used in the study was 8 weeks' worth of email from the UK ISP Demon Internet, just over half a billion messages, of which 56% was deemed to be spam."
This discussion has been archived. No new comments can be posted.

  • by Shajenko42 ( 627901 ) on Sunday August 31, 2008 @03:20PM (#24821801)
    Spammers will now alter their programs to start with "z" and numbers, so they can get the people who aren't as desensitized by spam.
    • Re: (Score:3, Insightful)

      but, like the article says, there are fewer people whose e-mail addresses start with z or numbers. so they'd be getting fewer hits by targeting those starting characters. there's already more spam messages being targeted at "zebras" per legitimate target than there are spam messages being targeted at aardvark addresses.

      so the smart thing for spammers to do is to stop wasting time with zebra addresses, since they'd have a higher chance of actually reaching a real mailbox by targeting more popular character r

      • Re: (Score:3, Interesting)

        by arth1 ( 260657 )

        I think this might be a distant relative to Benford's law (the one that shows that about 30% of all counted numbers will start with the digit "1", not 10% as one might think).

        Going through some crack and john-the-ripper logs, I saw that there was a good correlation between the position in the alphabet not only for the passwords, but also for the user names.
        Based on pure letter frequency, you'd think that there would be a typical E-R-S-T-N ranking, but this doesn't appear to be the case for the initial lette
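For reference, the roughly 30% figure quoted for Benford's law comes from the formula P(d) = log10(1 + 1/d) for a leading digit d. A minimal sketch follows; the function name is illustrative only, and nothing here tests whether initial letters of usernames follow a similar distribution.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

if __name__ == "__main__":
    # Print the expected share for each possible leading digit.
    for digit in range(1, 10):
        print(digit, f"{benford_probability(digit):.1%}")
```

The nine probabilities sum to 1, and the share for the digit 1 is log10(2), about 30.1%.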

    • > Spammers will now alter their programs to start
      > with "z" and numbers, so they can get the
      > people who aren't as desensitized by spam.

      A or Z, just be glad your name isn't Lorenzo.

  • by AJWM ( 19027 )

    I guess that explains why I seem to get more spam than most.

    • Brute force attacks also start at 'a' - always. I have never seen one that starts at the end of the alphabet. So make sure that your user name and password are from opposite ends of the spectrum.
      • by HTH NE1 ( 675604 )

        Brute force attacks also start at 'a' - always. I have never seen one that starts at the end of the alphabet.

        I do all my brute force attacks with the California recall election alphabet: R, W, Q, O, J, M, V, A, H, B, S, G, Z, X, N, T, C, I, E, K, U, P, D, Y, F, L.

        "Now I know my R, W, Qs. Next time won't you sing with muse."

  • I can see why they'd start at the front of the alphabet, and why those folks would tend to get more spam.

    But wouldn't numbers sort even in front of the letter A?

    • Re: (Score:3, Interesting)

      by xaxa ( 988988 )

      I'd guess that addresses with numbers at the beginning are often invalid, so they don't bother with them. I get spam attempts addressed to message IDs, which are generally something like 238947529345user@example.com.

      • by stevey ( 64018 )

        I commented on this a few days ago - but I too see many many delivery attempts to email-like message IDs.

        I guess that means that somebody has a compromised machine which is being crawled, or perhaps a mailing list archive online has them visible.

        Highly irritating, but pretty easy to block.

        • Re: (Score:3, Insightful)

          by nabsltd ( 1313397 )

          I think most of the spam targeted at a message ID comes from crawling USENET.

          On my server, I see lots of e-mail with a "rcpt to:" that matches the regex "(mpg\.)?[a-f0-9]+\@news\.domain\.com". This is the format that inn uses to create message IDs.
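A rough sketch of how the pattern quoted above could be used to flag such recipients. The host news.domain.com is the poster's placeholder, the function name is made up for illustration, and stripping angle brackets from the RCPT TO value is an assumption about how the address arrives.

```python
import re

# Pattern as quoted in the comment; news.domain.com is a placeholder host.
MESSAGE_ID_RCPT = re.compile(r"(mpg\.)?[a-f0-9]+@news\.domain\.com")

def looks_like_message_id(rcpt: str) -> bool:
    """Return True if a recipient address has the shape of a harvested message ID."""
    # Assumption: the address may arrive wrapped in angle brackets.
    address = rcpt.strip().strip("<>")
    return MESSAGE_ID_RCPT.fullmatch(address) is not None

if __name__ == "__main__":
    for candidate in ("<mpg.4f3a9c@news.domain.com>", "alice@news.domain.com"):
        print(candidate, looks_like_message_id(candidate))
```

Using fullmatch rather than search keeps an ordinary address that merely contains a hex run from being flagged.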

          • by Dan541 ( 1032000 )

            I think most of the spam targeted at a message ID comes from crawling USENET.

            I've had hundreds of spam attempts at my message IDs from Usenet postings, I get more delivery attempts to message IDs than to my real email address I post with.

  • by Anonymous Coward

    We had an ex-city manager, who had a son named Zachary Z. Zoul.

  • by knappe duivel ( 914316 ) on Sunday August 31, 2008 @03:32PM (#24821917)
    Zebras and aardvarks don't eat Spam. Or ham.
    • Are they Jewish?

      • Are they Jewish?

        The aardvark is. Where do you think the double "a" came from?

        (Disclaimer: I am Jewish)

        • by caluml ( 551744 )
          The first thing I thought of was that in German and Dutch, aard = earth.
          I looked it up, and yes, I'm right [etymonline.com]. Not sure if it was some amazingly funny ironic Jewish joke that I don't get, but hey, in case it's not, we've all been educated.
          • No, it was me posting at an hour that I should have been sleeping. The only other English word that I know that begins with two "a"'s is the name Aaron, which is a Hebrew name.

            The wife is right. I'm not funny.

  • What? (Score:5, Informative)

    by pablomme ( 1270790 ) on Sunday August 31, 2008 @03:34PM (#24821935)

    The conclusion is ridiculous. There's more spam for addresses starting with 'a' than with 'z' because there is more traffic to those addresses. See the graph [hothardware.com]. The line in the graph is the only solid piece of information, and it is just a lot of noise around the mean value of 56%; if anything, it indicates the opposite conclusion.

    • Re:What? (Score:5, Insightful)

      by Oidhche ( 1244906 ) on Sunday August 31, 2008 @03:52PM (#24822081)
      Indeed. The conclusion that I'd draw from the presented data is that there are more e-mail addresses beginning with 'a' than with 'z' (and that very few addresses begin with a number). Even the percentage of spam is nearly meaningless. To find anything about which addresses receive more spam, you should look at the average amount of spam per e-mail address in a given group, not the total number of messages.
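To make the per-address idea concrete, here is a minimal sketch that averages spam per address within each first-character group. The log format, a list of (recipient, is_spam) pairs, is an assumption for illustration and not the format of the ISP's logs.

```python
from collections import defaultdict

def spam_per_address(log):
    """Mean spam count per address, grouped by the address's first character.

    log is an iterable of (recipient, is_spam) pairs (hypothetical format).
    """
    spam_counts = defaultdict(int)
    for recipient, is_spam in log:
        if not recipient:
            continue  # skip empty recipients
        # Adding 0 for non-spam still registers the address in its group.
        spam_counts[recipient.lower()] += int(is_spam)

    groups = defaultdict(list)
    for recipient, count in spam_counts.items():
        groups[recipient[0]].append(count)
    return {first: sum(counts) / len(counts) for first, counts in groups.items()}

if __name__ == "__main__":
    sample = [
        ("alice@example.com", True), ("alice@example.com", True),
        ("adam@example.com", True), ("adam@example.com", False),
        ("zed@example.com", True), ("zed@example.com", True),
        ("zed@example.com", True),
    ]
    print(spam_per_address(sample))
```

In the sample, both groups receive three spam messages in total, but the 'a' group spreads them over two addresses while the single 'z' address receives all three, so totals and per-address averages point in different directions.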
      • I don't know, I haven't seemed to get too much spam... Oh wait, I have 3 e-mail addresses... let me check this one.... Yep 400 since last year.
    • Re: (Score:3, Informative)

      by 4thAce ( 456825 )

      According to the PDF, this graph is for all email addresses, not for 'real' addresses, which they define, more or less, as those addresses which receive at least one non-spam email every other day. Since they are looking only at Demon's logs, not the contents of actual mailboxes, they have to use this heuristic to filter out the bogus combinations that the spammers are trying.

      If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A
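The 'real address' heuristic described above can be sketched roughly as follows. The threshold of 0.5 non-spam messages per day is only this comment's paraphrase ("at least one non-spam email every other day"), not the paper's exact definition, and the log format is hypothetical.

```python
from collections import Counter

def real_addresses(log, days_observed, min_ham_per_day=0.5):
    """Return the addresses that received enough non-spam mail to count as 'real'.

    log is an iterable of (recipient, is_spam) pairs (hypothetical format).
    min_ham_per_day=0.5 approximates "one non-spam email every other day".
    """
    ham_counts = Counter(recipient for recipient, is_spam in log if not is_spam)
    threshold = days_observed * min_ham_per_day
    return {recipient for recipient, count in ham_counts.items() if count >= threshold}

if __name__ == "__main__":
    sample = [
        ("alice@example.com", False), ("alice@example.com", False),
        ("alice@example.com", True),
        ("bob@example.com", False), ("bob@example.com", True),
        ("z123guess@example.com", True),
    ]
    print(real_addresses(sample, days_observed=4))
```

Addresses that only ever receive spam, such as guessed combinations, never reach the threshold and drop out before any per-letter comparison is made.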

      • If they impose the condition that only 'real' addresses are considered, the graph changes to one with a higher percentage spam for A addresses than for Z addresses, as asserted in the summary.

        But probably not significantly higher (the difference being greater than the noise). Again, the data tells us nothing interesting.

    • by eggnet ( 75425 )

      You are right.

      Additionally, everyone knows that spammers don't sort by the entire e-mail address, they group by the target domain name anyway for one of two reasons:

      1. to efficiently send bulk e-mail to one server for a lot of users.

      2. when trying to be stealthy, to spread out the e-mail to one mail server over time.

      Sure, you could sort e-mail addresses to weed out duplicates but ultimately you have to group them by domain name.

    • by mysidia ( 191772 )

      There being less spam naturally implies there is less traffic. Most traffic is spam; so if there were no spam, there would be very little traffic.

      Legitimate messages VS spam messages are completely different and non-spam traffic really has nothing to do with the total spam volume, except as a comparison tool; the traffic numbers that actually have to do with the spam problem are the number of spam messages.

      It's spam volume that really matters. If I get 10 messages, and 9 of them are spam, i'm much hap

      • There being less spam naturally implies there is less traffic.
        Most traffic is spam; so if there were no spam, there would be very little traffic.

        Look at the graph. Spam accounts for around 60% of the traffic for all groups of email addresses, regardless of total number of messages. My point is, it's likely that there are far more addresses starting with 'a' than with 'z', and that's all the data tells us. There is no proof of correlation between starting letter of the address and amount of spam.

      • by repvik ( 96666 )

        Sign up to the LKML, and presto! Most of your mailbox is now non-spam ;)

      • It's spam volume that really matters. If I get 10 messages, and 9 of them are spam, i'm much happier than if I get 1000 messages and 500 of them are spam.

        On the other hand, if I got 1000 messages, I'd be much happier if 999 of them were spam than if only 500 were spam.

        • by mysidia ( 191772 )

          I think that would suck.. I wouldn't be able to tell for sure that only one (unimportant) message was legit ham without examining and discarding each of the 999 spam messages.

    • No, your interpretation of the graph is ridiculous. There is basically no noise at all. The sample size for "z" is about 5 million emails. The results are highly significant!
      • There is basically no noise at all. The sample size for "z" is about 5 million emails.

        So you think that 5 million is, magically, a "safe" sample size. For anything, using any method of measurement with or without deficiencies. Yeah, sure, why not.

        Even if it was, the total volume of spam is not a measure of anything significant. The volume of spam relative to the total volume is, which the line represents. And it shows no decrease as you move to the right, contrary to what they claim.

  • by Teun ( 17872 ) on Sunday August 31, 2008 @03:36PM (#24821953)
    56% deemed spam?

    I thought most in the know see a far higher percentage, my ISP records over 95%:

    Xs4all statistics [xs4all.nl]

    Makes me wonder about the rest.

  • by paulatz ( 744216 ) on Sunday August 31, 2008 @03:54PM (#24822095)

    I know nobody actually bothered to read it, but from the graph it looks like there are many more email addresses starting with an "a" than with a "z". The former get about as much spam as legit emails, while the latter get about 2 or 3 times more spam than legit emails.

    • Re: (Score:1, Informative)

      by Anonymous Coward

      This is correct. And according to the Article, the Wallaby gets the lowest percentage of spam while the Zebra gets about the highest percentage of spam.

  • by flyingfsck ( 986395 ) on Sunday August 31, 2008 @04:07PM (#24822201)
    and yes they get tons of spam, about 99.999% of connection attempts are spam, but a couple of RBLs and Spam Assassin takes care of it. If I turn the protection off, then I get about 10,000 spams per hour, which seems to be a limitation of the server. If the server was faster, then it would probably get more spam. With the filters on, I get about 1 message per hour, which is more acceptable. I don't like the idea of RBLs, but I see no other way to handle the problem - if you are a spammer, then I don't want to talk to you - ever. Stupid idiots. It is also interesting that all brute force attacks that I have observed start at 'a'. So the best passwords will start with 'z'.
  • I get more like 98% spam.

  • email spammers also probably parse out email addresses that don't start with alpha characters...
  • by aembleton ( 324527 ) <aembleton@gmaiRASPl.com minus berry> on Sunday August 31, 2008 @05:10PM (#24822811) Homepage
    From looking at that graph, it would be more interesting to see the signal to noise ratio for each of the letters and numbers. Those names beginning with an 'A' do indeed receive more spam, but also far more non-spam. In fact it looks to be more like 50:51 (non-spam : spam), whereas from first glance those email addresses beginning with a 'P' receive 40:60.
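As a quick check of the arithmetic, the two ratios eyeballed above can be turned into spam shares. The figures are the poster's readings from the graph, not measured values, and the helper name is illustrative.

```python
def spam_share(non_spam: float, spam: float) -> float:
    """Fraction of all messages that are spam, given a non-spam : spam ratio."""
    total = non_spam + spam
    if total <= 0:
        raise ValueError("ratio must describe at least one message")
    return spam / total

if __name__ == "__main__":
    # Ratios as read off the graph in the comment (non-spam : spam).
    print(f"A: {spam_share(50, 51):.1%}")
    print(f"P: {spam_share(40, 60):.1%}")
```

On these readings the 'A' group is about half spam, while the 'P' group is about 60% spam despite a lower total volume.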
  • by Dynamoo ( 527749 ) on Sunday August 31, 2008 @05:22PM (#24822917) Homepage
    Yes, the beginning of the alphabet gets more spam.. and it's really very simple to explain why.

    Spammers work from lists of email addresses, and those email addresses are typically sorted by domain and then alphabetically. So, the receiving domain gets a rush of emails for users with addresses beginning with A, B, C etc. But usually (at some point) many mail systems will detect that there is a spam attack in progress and they will block subsequent messages of the same format or from the attacking IP address (depending on the spam filtering setup in place).

    So, put simply, the people beginning with "A" get nice new spam that the adaptive filters don't detect. By the time it gets to "Z" a good filter will automatically block the attack.

    What's sad is that I watch spam attacks often enough to know this.
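The explanation above can be sketched as a toy simulation: recipients are worked through in sorted order, and everything after the filter reacts is blocked. The fixed detect_after cutoff is a deliberate simplification and an assumption; real adaptive filters are not this tidy.

```python
def simulate_campaign(recipients, detect_after):
    """Split one domain's recipients into (delivered, blocked).

    Assumes the sender works through an alphabetically sorted list and the
    filter starts blocking after detect_after messages (toy model only).
    """
    delivered, blocked = [], []
    for position, recipient in enumerate(sorted(recipients)):
        if position < detect_after:
            delivered.append(recipient)
        else:
            blocked.append(recipient)
    return delivered, blocked

if __name__ == "__main__":
    users = ["zoe@example.com", "adam@example.com",
             "mike@example.com", "bob@example.com"]
    delivered, blocked = simulate_campaign(users, detect_after=2)
    print("delivered:", delivered)
    print("blocked:", blocked)
```

Under this model the addresses near the start of the sort order receive the spam and those near the end do not, which is the effect described for "A" versus "Z".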

  • Am I the only one who actually paid attention to the graph? Yes, sure, the "overall" spam amounts for A is much bigger than for Z, but that is because there are more email addresses that start with the letter A than those starting with the letter Z. If not, check how spam/non-spam percentages all stay within the 50%-75% interval.

    In other words: more email messages = more spam, "OH GOD!!11ONE let's write a paper!"

    Slow news day?

  • I rarely get spammed, even on hotmail

  • My servers get at least 95% identified (and blocked) spam vs. ham across ~200 domains that have different characteristics. And I know that some of them slip through the filters also.

  • Spam? wot that?
    • by zaren ( 204877 )

      My sentiments exactly. I have this username as an account name on another system, and I honestly get about one spam a week to that account. Maybe there's something to this z thing.

  • With two z's.

    "But my name really is ZZwyggle!"

  • Oh really? (Score:3, Interesting)

    by Antony-Kyre ( 807195 ) on Sunday August 31, 2008 @08:46PM (#24824703)

    I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)

    Now, let's see if that holds true.

    • I just created the e-mail address zh80lukgwggok4kko0kcbrhjm@hotmail.com (yes, seriously)

      And posting it on the web is a great idea to avoid spam.

      • My post was supposed to be on the humour side. I am surprised it was moderated as interesting. More importantly, if people had some random characters in their e-mail addresses, there would be a less likely chance of getting hit by those using dictionary-based attacks.

        As of right now, no mail has come in.

        • by pbhj ( 607776 )

          Me, on bad phone line: My email address is z;45-r_0oisdgf*yihh@hotmail.com
          Person: Wtf??? Can I read that back.
          Person: zcolonfortea5 ...
          Me, interrupting: No, hang on it's z ;, as in the symbol, 4 5 - r _ ...
          Person, repeating as we go: z ; 4 5 dash r underscore ....

          Result: I never get ham because no one can ever transcribe my address.

      • That's the whole point: he needs it to be harvested.

  • Zebras already have big penises!

  • No spammer has ever bothered me!
    I worship the letter "A" and sneer at these pantywaist spammers!

    Aardvarks will NEVER give in to threats of spam!
    Nor we aardwolves!
    .
  • The reason zebras get less spam is because the spammers know that they are already hung like a horse!
  • So that explains it!
    - arbitrary aardvark.

    +3 funny, sad.

  • ...who thinks this is nonsense? The spam/ham ratio isn't really conclusive, and that's what matters, right? Thanks, Chaz

