SpamArchive.org Launched

An anonymous reader writes "SpamArchive.org has just been launched. SpamArchive.org is a community resource that provides a database of known spam to be used for testing, developing, and benchmarking anti-spam tools. The goal of this project is to provide a large repository of spam that can be used by researchers and tool developers. In the past, there were a few small personal spam archives that were used. There was no large set of spam that could be used to test new anti-spam algorithms. Thus, developers could not sufficiently test their techniques across a range of messages. Also, the lack of a "standard" sample of spam made it difficult to effectively benchmark anti-spam tools."

  • So... (Score:5, Funny)

    by Markus Landgren ( 50350 ) on Thursday November 21, 2002 @05:02AM (#4721533) Homepage
    Do they have a mailing list I can sign up for if I want to get updated by e-mail?

    • by some guy I know ( 229718 ) on Thursday November 21, 2002 @05:09AM (#4721553) Homepage
      I think that they should send email out to everybody describing this great service!
      • Do we know that this is a good site, or is this a devious mechanism to collect the email addresses of everyone who forwards them spam?
    • Re:So... (Score:5, Insightful)

      by RyoSaeba ( 627522 ) on Thursday November 21, 2002 @05:11AM (#4721561) Journal
      LOL, you want them to forward every new spam they receive?
      Don't you have enough already? ^_^

      Seriously, this sounds like a great idea.

      I can see a few technical problems with cataloguing spam, though.
      The most obvious is that spam is usually personalized: the recipient's mail address (or part of it) often appears in the subject or in the body. So will this archive store every variant of every spam, or just a 'global' model?
      They also need to define how cataloguing tools are supposed to access the archive, i.e. grab it from a URL? An FTP text file?

      And in any case, until spam filters are hooked directly into the SMTP mail server itself, users will still have to take the time to configure their anti-spam tool, run it regularly to clean the mailbox, and so on...

      For instance, Mozilla will incorporate spam filters, but from what I understand you'll still have to download that freaking spam before it gets filtered, which can take some time if the messages are big (like viruses or such).

      OK, it sure beats having legitimate mails removed from the server without our knowledge...

      Just my 2 cents of euro.
      • Most obvious is that usually spam is personalized, that is the recipient's mail address (or part of it) often appears either in the subject or in the body. So will this archive store every variant of every spam, or just a 'global' model?

        I guess it would be easy to implement some "almost identical" recognition filter, but the problem is that somebody forwarding a funny spam to somebody else (hey, haven't you kept your very first "herbal alternative to viagra" spam in order to show it to somebody? ... OK, neither did I.) might be listed as a spammer. So there should be some recurrence filter to ensure that a given "spammer" doesn't send a given spam template more than once to more than one recipient, but here, once again, we may face situations where everybody could be hurt by such restrictions.
        I personally consider the spam problem overhyped, as it doesn't take me more than 15 seconds a day to eliminate unwanted messages.
        I have more of a problem in real life with advertisers who dump their pizza prices in my mailbox, but here in Switzerland, everyone pays for all the garbage they dump.
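A minimal sketch of the "almost identical" recognition idea, assuming the only personalization is the recipient's address and its local part (real spam personalizes in many more ways, and systems such as DCC use fuzzier checksums). `normalize` and `fingerprint` are hypothetical helper names, not anything SpamArchive.org has described:

```python
import hashlib
import re


def normalize(body: str, recipient: str) -> str:
    """Replace the recipient's address, then its local part, with placeholders.

    Assumption: personalization is limited to the address and the local part.
    """
    local = recipient.split("@", 1)[0]
    # Full address first, so the local part inside it is not replaced separately.
    text = re.sub(re.escape(recipient), "<ADDR>", body, flags=re.IGNORECASE)
    text = re.sub(r"\b" + re.escape(local) + r"\b", "<NAME>", text, flags=re.IGNORECASE)
    # Collapse whitespace and case so trivial reflowing does not change the result.
    return " ".join(text.split()).lower()


def fingerprint(body: str, recipient: str) -> str:
    """Stable hash of the normalized body; copies of one spam 'model' share it."""
    return hashlib.sha256(normalize(body, recipient).encode("utf-8")).hexdigest()
```

Two copies of the same spam sent to different people then collapse to one fingerprint, which would also be a natural key for the recurrence counting discussed above.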
      • Not intended purpose (Score:4, Informative)

        by 0x0d0a ( 568518 ) on Thursday November 21, 2002 @05:58AM (#4721682) Journal
        This isn't like Distributed Checksum Clearinghouse or some other spam *solution*. It's intended to test what percentage of messages antispam tools get right, in terms of false positives and false negatives. It's useless (at least directly) to end users.

        So unless your antispam tool breaks on some names in personalized letters, I would think that it's okay.
    • Re:So... (Score:4, Funny)

      by stevenp ( 610846 ) on Thursday November 21, 2002 @05:38AM (#4721630)
      > Do they have a mailing list I can sign up for if I want to get updated by e-mail?

      No, but you can open a Hotmail account and receive a daily dose of UP-TO-DATE spam messages FOR FREE.
    • Re:So... (Score:3, Informative)

      by Arker ( 91948 )

      If you want to get a lot of spam to test your filters with, just check the archives [google.com] of NANAS [admin.net-....sightings] on Usenet. What precisely this new thing does that a spider of that archive couldn't give you I don't know.

  • wow (Score:2, Funny)

    by gomerbud ( 117904 )
    I should just gzip my mbox and send it to them. That'll give them years of research material.
  • by Anonymous Coward on Thursday November 21, 2002 @05:03AM (#4721536)
    There was no large set of spam that could be used to test new anti-spam algorithms


    Whoever wrote this obviously doesn't have a Hotmail account.
  • by RebRachman ( 144344 ) <rebecca.ganglysister@com> on Thursday November 21, 2002 @05:03AM (#4721537) Homepage
    Even I know how to buy a domain name and write a few paragraphs of text on a white background. There is nothing about this archive to hint at its origin or credibility. This is a /. worthy story?
    • by arvindn ( 542080 ) on Thursday November 21, 2002 @05:21AM (#4721588) Homepage Journal

      Even I know how to buy a domain name and write a few paragraphs of text on a white background.
      But you didn't, did you?

      This is a /. worthy story?
      You're missing the point. The story is not on /. because something revolutionary has been done, but because the huge number of /. readers can get together and create a useful database. Obviously it would be no good if no one knew about it. In a sense, the story is worthy because it got on /. :) Kind of a reverse Catch-22, if you like.

      What you can do:
      • Help them implement their automated spam review scripts. As with any project, they need volunteers.
      • Make sure you send them a copy of all the spam you receive. From their page:
        SpamArchive.org's efficiency is proportional to the amount, quality, and variety of spam that is provided. End users can forward known spam to submit@spamarchive.org.
      • by RebRachman ( 144344 ) <rebecca.ganglysister@com> on Thursday November 21, 2002 @06:08AM (#4721706) Homepage
        The point is that if they want to do a spam archive, you would expect them to do some minimal research. This page clearly shows that SpamArchive.org has not done the following basic background work:

        1. Told me who they are so that I might trust them.

        2. Told me anything about their technology/database so that I might know if it is really going to be useful. For all I know they haven't even thought about the collection, storage and retrieval issues behind dealing with this.
        3. Collected and collated the supposedly uncoordinated archives that already exist.
        4. Added even one link to a relevant site. You would assume that to undertake such a project they would at least have visited a few sites before concluding there was nothing out there. Posting a couple of relevant URLs wouldn't be too much work.

        In short, I am not impressed that someone who can do 20 minutes of work is the same someone who can undertake the huge project proposed here. It looks like they think that somehow all they need is for people to send them information by e-mail, and for a few other people to volunteer to do the work. Not a promising start.
      • by Maddog Batty ( 112434 ) on Thursday November 21, 2002 @06:14AM (#4721718) Homepage
        If you were a spammer and wanted to collect a large number of valid email addresses, how about this as an idea...

        1) Produce a website pretending to be antispam.

        2) Ask people to send their spam emails to the site (generally including a valid from address of course)

        3) Publish on slashdot so as to get lots of interest.

        4) ???

        5) Profit!

        (Unfortunately, we all know what stage 4 is for spammers...)
    • Whois.. (Score:5, Informative)

      by Anonymous Coward on Thursday November 21, 2002 @05:26AM (#4721598)
      says:
      Domain Name: SPAMARCHIVE.ORG
      Owner, Administrative Contact, Technical Contact, Billing Contact:
      Guru Rajan (ID00024772)
      11475 Great Oak Way
      Suite 210
      Alpharetta, GA 30022
      us
      Phone: +1.6789699399
      Email: guru.rajan@ciphertrust.com

      http://www.ciphertrust.com introduces itself as:

      Protect Your Email Gateway
      Anti-spam and email security for the enterprise

      CipherTrust has integrated defenses for all email application-level threats into one, comprehensive device. Our IronMail appliance protects enterprise email systems such as Microsoft Exchange, Lotus Notes and Novell GroupWise against viruses, spam, and intruders, and provides message privacy and policy enforcement.
      • Re:Whois.. (Score:4, Insightful)

        by Anonymous Coward on Thursday November 21, 2002 @06:54AM (#4721786)
        So let's get this straight...

        This database is run by a little-known company of mixed reputation [google.com] that sells its own anti-spam tool.

        It doesn't promise any new functionality that news.admin.net-abuse.* doesn't already provide. There's absolutely no reason to believe that the spams collected here will be any 'better' a sample than those collected by opening a random Hotmail account.

        So, what's in it for Ciphertrust? As well as their own library of spam, they'll have a collection of e-mail addresses of people who are interested in fighting spam.

        And what's in it for us? Anyone? Bueller? Anyone?
  • Database? (Score:5, Funny)

    by dat00ket ( 249468 ) on Thursday November 21, 2002 @05:03AM (#4721538) Homepage
    Can't researchers just set up their own Hotmail account?

    Seems cheaper.
    • by JessLeah ( 625838 )
      Ahh, but think of the fees they'd have to pay Microsoft for all that extra storage ;)

      After they carefully posted the new Hotmail address all over the Web, they'd blow their quota in around 12 hours. :)
    • Re:Database? (Score:2, Informative)

      by stevenp ( 610846 )
      Learning mechanisms for detecting spam, like Bayesian classification [paulgraham.com], require a large number of messages to build a good spam detection profile. The average 500-message JunkMail folder is not big enough for the purpose.
      • The average 500 message JunkMail folder is not big enough for the purpose.

        What? If a Bayesian script was having to go through significantly more than that per e-mail to check whether it was spam, you'd be waiting minutes just to get your e-mail classified.
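The two comments above are talking about different phases. In a Bayesian filter the large corpus is consumed once, at training time, to build per-token counts; classifying a new message then only looks up that message's own tokens, so its cost does not grow with the size of the corpus. A minimal multinomial naive Bayes sketch with add-one smoothing, assuming lower-cased whitespace tokens; it is a textbook variant, not necessarily the exact scheme in the paulgraham.com article:

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def tokenize(text: str) -> List[str]:
    # Assumption: lower-cased whitespace tokens are good enough for a sketch.
    return text.lower().split()


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.totals: Dict[str, int] = {"spam": 0, "ham": 0}  # token totals per class
        self.docs: Dict[str, int] = {"spam": 0, "ham": 0}    # message totals per class
        self.vocab: Set[str] = set()

    def train(self, messages: Iterable[str], label: str) -> None:
        """One pass over the corpus: this is the step that needs a big archive."""
        for msg in messages:
            tokens = tokenize(msg)
            self.docs[label] += 1
            self.counts[label].update(tokens)
            self.totals[label] += len(tokens)
            self.vocab.update(tokens)

    def spam_log_odds(self, message: str) -> float:
        """Log-odds of spam vs. non-spam under the naive Bayes assumptions.

        Work is proportional to the number of tokens in `message`, not to the
        size of the training corpus. Assumes both classes have been trained.
        """
        v = len(self.vocab)
        score = math.log(self.docs["spam"] / self.docs["ham"])  # prior odds
        for tok in tokenize(message):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + v)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + v)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, message: str) -> bool:
        return self.spam_log_odds(message) > 0.0
```

Training on a bigger archive only makes the counts more reliable; it does not make `is_spam` slower per message.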
  • by JessLeah ( 625838 ) on Thursday November 21, 2002 @05:04AM (#4721542)
    ...where wizened historians wearing horn-rimmed spectacles will sit, hunched over computers, studying the archives of ancient spam.

    "This one mentions sex... apparently, sex was a preoccupation of the early twenty-first century..."
  • archive overload (Score:2, Interesting)

    by ndevice ( 304743 )
    Asking for a slashdotting is one thing, but asking to be an archive for spam is another.

    I wonder if anyone knows just how much of the stuff is out there, and if it's even possible to store all that. Of course, spam being mostly duplicates and all, maybe they have a chance. But with spammers staying ahead of the game and rotating their text, I wouldn't count on it.

    On the other hand, why not just set up a couple of Hotmail accounts, bait them a bit, and just watch the spam come in? Why even bother asking for it?
  • Trade Spam! (Score:5, Funny)

    by Pathwalker ( 103 ) <hotgrits@yourpants.net> on Thursday November 21, 2002 @05:07AM (#4721548) Homepage Journal
    Now that spam is so collectable, someone should start a service to let people trade it?

    What will someone give me for my rare "Help fund the freedom fighters in Chechnya!" complete with numbered bank accounts to send donations to?
    • by Surak ( 18578 )
      Now that spam is so collectable, someone should start a service to let people trade it?

      Yeah, it's called 'Gnutella'. :-P

  • Who are these guys? (Score:5, Interesting)

    by gomerbud ( 117904 ) on Thursday November 21, 2002 @05:09AM (#4721555) Homepage
    Dude, I could have registered a similar domain and put up a comparable web page within a matter of hours. I hope they really exist.

    Wouldn't it be great if the submit email address was forwarded to someone's ex-girlfriend? That's the ultimate form of revenge...

    1) Register domain name.
    2) Put up web page advertising some kind of anti-spam database.
    3) Forward all email sent to the submit address to someone you don't like.
    4) Get slashdotted.

    The end result is that three million people send 100 spams the first hour to the submit address. Within a short amount of time, your foe has 300 million emails in his/her mailbox. Now that's spam.
    • Much easier:
      • Set up sendmail
      • Make a script that sends mail made from a random collection of SPAM, goatse.cx pictures and viruses. Make sure that the FROM: field is faked
      • For the paranoid: use a free dial-up ISP in order to cover your tracks.
      • Put the script in a cronjob and let it run every minute (or put the script in an infinite loop).

      Your ex is gonna love you for that. Not that *I* ever do such things... Don't be astonished if your car is keyed the next day, by the way.

  • by phunhippy ( 86447 ) <zavoid@[ ]il.com ['gma' in gap]> on Thursday November 21, 2002 @05:11AM (#4721559) Journal
    Damn!
    And there I was thinking they were creating a historical archive of all the funny worthless spam we get in our mailboxes every day...

    See, that could turn spam into a fun thing! Set up a site where spams are ranked by popularity, i.e. by the number of people forwarding in the same spam. I think it would be interesting to see daily/hourly/weekly graphs of the TOP 10 SPAM in the world.

    I would do this myself, except I suck at HTML. Anyone need a VoIP network built? :)
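Ranking spams by how many people forward in the same one mostly comes down to grouping identical submissions and counting them. A minimal sketch, assuming submissions arrive as plain-text bodies and that messages identical after whitespace and case normalization count as "the same spam" (personalized spam would need fuzzier matching); `top_spams` is a hypothetical name, and a daily/hourly/weekly chart would just run it over submissions bucketed by time:

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def _key(body: str) -> str:
    # Assumption: whitespace/case normalization is enough to match copies.
    normalized = " ".join(body.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def top_spams(submissions: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return up to n (example_body, forward_count) pairs, most forwarded first."""
    counts: Counter = Counter()
    example: Dict[str, str] = {}
    for body in submissions:
        k = _key(body)
        counts[k] += 1
        example.setdefault(k, body)  # keep the first copy seen as the display text
    return [(example[k], c) for k, c in counts.most_common(n)]
```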
  • recycled spam (Score:2, Insightful)

    by ndevice ( 304743 )
    With some people already accusing bugtraq of being a repository for exploits that anyone could use for exploit purposes, you'd think that the same could happen to the spam archive.

    Soon we'll see old spam being recycled as the new breed of spam trolls mine the archive for inspiration - and maybe just material reuse.

    Then, of course, it's not like we don't see recycled spam anyway, so maybe this isn't such a bad thing...

    (And if I sound incoherent, it's 2 in the morning. I should be sleeping.)
  • by EmagGeek ( 574360 ) on Thursday November 21, 2002 @05:12AM (#4721565) Journal
    Is this really necessary? I mean, come on, how hard is it to find spam for research? Most people get more spam than their Hotmail inbox can handle just for signing up for the account. All a researcher has to do is start clicking the "Remove Me" link in those emails and he or she will have more spam than he or she knows what to do with!

    Combine that with posting to some anti-spam newsgroups with their real email address, and bingo boingo, all the spam in the world will come right to them.

    This site also creates a problem in that only the spam posted to that site might be used for research. There might be millions of spam emails overlooked because they don't make it onto that site. Think of those poor spammers that won't get filtered :)

    Won't someone please think of the children!?!?

    • Is this really necessary? I mean, come on, how hard is it to find spam for research? Most people get more spam than their Hotmail inbox can handle just for signing up for the account. All a researcher has to do is start clicking the "Remove Me" link in those emails and he or she will have more spam than he or she knows what to do with!

      Wrong. I've been setting up bogus e-mail accounts on a domain created exclusively for spam research/testing. I've gone through at least a dozen "unsubscribe" links and never received a single spam as a result in those test accounts. Perhaps the spammers only highlight records for people who "unsubscribe" when those people were in their database in the first place.

      (The most spam I've received so far in one of these test accounts was from signing up to freefootfetishezine.com.)

      This site also creates a problem in that only the spam posted to that site might be used for research. There might be millions of spam emails overlooked because they don't make it onto that site. Think of those poor spammers that won't get filtered :)

      That doesn't make sense; they might not get a good sample of the spam if they don't solicit samples, just as much as they might not get a good sample if they do. It makes more sense that they would get more spam--and more diverse spam--from soliciting examples. Consider that submitted samples would come from all over the world, from a variety of sources, and in a variety of languages.

  • Well, now that all the possible spam is archived in one place, we can expect spammers to find new methods of spamming that are not in the archive. The people behind this (no names or addresses are mentioned on the site) would do better to archive the latest developments in anti-spam technologies rather than just the spam. Also, IMO, a tool that is tested against such a big archive of general spam will never work for specific anti-spam applications, which is what consumers would prefer.

  • What about NANAS? (Score:5, Informative)

    by tsvk ( 624784 ) on Thursday November 21, 2002 @05:17AM (#4721579)

    NANAS, or the newsgroup news.admin.net-abuse.sightings, does just this. It is a public archive of spam which can be searched, e.g. with Google Groups:

    http://groups.google.com/groups?group=news.admin.net-abuse.sightings [google.com]

    Why reinvent the wheel? Or does this new spam archive have any new functionality to offer?

  • Now Spam Radio [spamradio.com] got an archive to dig out new infomercials from. :)
  • NANAS Google Archive (Score:5, Informative)

    by Ricardo Dias Marques ( 200514 ) on Thursday November 21, 2002 @05:21AM (#4721589)

    Well, there is already a pretty large Email and USENET Spam archive at the NANAS (news.admin.net-abuse.sightings) newsgroup.

    You can check the Google Groups archive [google.com]

    You can read the NANAS charter at http://www.killfile.org/~tskirvin/nana/charter/nanas.html [killfile.org]

  • spamarchive.com (Score:3, Informative)

    by philj ( 13777 ) on Thursday November 21, 2002 @05:26AM (#4721597)
    I've owned spamarchive.com for ages.

    Want it? - I have no use for it.....
  • The opposite (Score:5, Insightful)

    by sholden ( 12227 ) on Thursday November 21, 2002 @05:29AM (#4721604) Homepage
    Exactly the opposite is needed for work on mail filters.

    Spam is really easy to find, everyone knows that: create a Hotmail account, fill out some web forms, post to some newsgroups, put a mailto: on a web page. Wait a little while. Bingo, lots of spam.

    However, non-spam email is harder to find. Using your own produces techniques that work with your particular type of email and not with other people's.

    Non-spam is harder to collect, since email is often private in nature. Removing identifiers from the headers is easy enough, but the body can also contain things like addresses, emails, phone numbers, comparisons of the boss to bacteria, etc.

    A collection of real emails, in which personal information has been replaced with fake data, would be of great use. A few people I know are working on creating such a data set of email. It is aimed at more general email filtering though, not just spam detection, and hence requires categorisation. And it is from academia, and hence will probably lose the race with the heat death of the universe for completion.

    I do note they have a 'non-spam' heading on the very sparse web page which is encouraging.
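Replacing personal information in message bodies with fake data can start with simple pattern substitution. A rough sketch only: the two regexes are illustrative assumptions that catch plain e-mail addresses and North-American-style phone numbers, and they will miss names, street addresses and plenty else, so real anonymization needs much more (and human review):

```python
import re

# Illustrative patterns only; they are assumptions, not a complete PII detector.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")


def scrub(body: str) -> str:
    """Replace e-mail addresses and phone numbers with fixed fake values."""
    body = EMAIL_RE.sub("user@example.com", body)
    body = PHONE_RE.sub("555-0100", body)
    return body
```

Using fixed placeholders (rather than deleting the text) keeps the message structure intact, so a filter under test still sees "an address here" and "a phone number here".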

  • by zedman ( 98578 ) on Thursday November 21, 2002 @05:30AM (#4721607) Homepage
    Would spammers try to "anti-spam" the spam archive by submitting billions of perfectly normal emails?

    Ian
    • And what about the users who were lazy [ezine-tips.com] and, rather than unsubscribing from a mailing list (let's say, eBay), just blocked it as "spam"? This comes back to the question of what exactly spam is.

      -- This posting is in ACCORDANCE with slashdot law 2.8.
    • Spammers are generally just stupid enough to click send. They won't likely find this site, and it's not worth their time to mess it up either.
      • But are normal people smart enough for their own good?
        I'm already contemplating submitting submit@spamarchive.org to "daily-word-of-the-bible" mailing lists.
  • by Anonymous Coward
    This worthy effort needs funding to keep it alive. I have some contacts from Nigeria who may be able to help, I will forward their details.
  • by jjl ( 514061 ) on Thursday November 21, 2002 @05:34AM (#4721617) Homepage
    An archive of non-spam samples should be collected as well, containing real E-mail messages which aren't spam. These should be more or less normal private E-mails that their owners volunteer to make public for testing purposes.
    The purpose of the non-spam samples would be to help test spam filtering algorithms for false hits, just as real spam messages are used to tune the algos for detecting spam.
  • Like SpamHaus [spamhaus.org]? It seems like a similar service, right?!

  • Take the test [hatchoo.com] and find out... ;)
  • What if... (Score:5, Interesting)

    by serlaten ( 619839 ) on Thursday November 21, 2002 @05:41AM (#4721636)

    ...spammers use the anti-spam tools to create spam that doesn't trigger the automatic spam filters.

    1. Write spam mail
    2. Filter through widely used spam filter
    3. If spam is flagged as spam, rewrite; goto 2
    4. Send
    5. Profit
    • Re:What if... (Score:3, Insightful)

      After a certain point though, spammers are pretty much stuck with a few basic "selling points" -- it's hard to sell something if you don't include a product description or URL or address/phone of some sort, and spam filters will evolve to catch those kinds of things unless they're stripped down to their bare bones (as in, just a random bare URL.... hey, wait, that sounds like half the e-mail I send to my friends ;).

      Even then, a hypothetical "widely used" spam filter will probably include a user-specific Bayesian filter, so you can create your own local database of what tends to be spam, and more importantly, what tends not to be spam -- and your own "real mail" keywords will probably be highly specific to your interests/career. So you're basically "evolving" a personal blacklist/whitelist to go along with the global filter.

      But probably the most interesting thing about "spam evolution" is that if spam can get through a spam filter, it's going to be really toned-down and bland. That may not make a difference to you, but it'll drastically lower the spammers' response rates because their ads aren't as flashy. Less profit = fewer spammers. (This last paragraph wasn't "my idea"; I forget where on the web I saw it.)

    • by Pendant ( 155221 ) on Thursday November 21, 2002 @07:09AM (#4721816) Homepage
      In order to counter the rising tide of spam I recently installed a spamblocker, even though I'm wary of such beasts because of the danger of false positives.

      Sure enough, I have received false positives. But only from one source: my filter traps the Network Solutions email asking for confirmation to proceed with the transfer away of a domain to another registrar. Net$ol changed the format of these emails a while back: they now start off by talking about a "special offer" and it's only towards the end that the real purpose of the message is revealed. My suspicious mind wonders whether these emails are intentionally designed to look like spam to reduce the number of successful transfers... sneaky :(
  • by heytal ( 173090 ) <hetal.rachNO@SPAMgmail.com> on Thursday November 21, 2002 @05:41AM (#4721637) Homepage
    The archive could give them a lot of valid email addresses...

    Consider this one: You forward a spam to submit@spamarchive.org. The forwarded mail is now a part of the archive. Spammers snoop the archive for email addresses.
  • I can send them a copy of all the awesome, truly fantastic offers that arrive in my mailbox? =)

    Oh, the joy! 300 copies of "make money fa$t", "enlarge the size of your penis" and "Amazing investment opportunities", delivered lovingly every day to this archive, to be preserved for the good of humanity forever more!

    (Clicking hysterically on the "forward" button...) ;)
  • by minesweeper ( 580162 ) on Thursday November 21, 2002 @05:46AM (#4721646) Homepage
    If you're looking for 5+ years of archived spam and plots of spam volume versus time, check out this guy's site [xtdnet.nl].

    His page of graphs [xtdnet.nl] shows the exponential growth of spam over the past few years.

  • Just think, instead of sending you yet another suggestion to partake of the latest penis enlargement scheme, they could just send you a URL pointing to the appropriate message in the archive. I'm sure many recipients would be a lot happier if they received a URL rather than a 1K message. Microsoft's Outlook would be nice and friendly too and probably display it without prompting.

    Of course, it would make filtering easier too.....

  • Good idea (Score:3, Interesting)

    by arvindn ( 542080 ) on Thursday November 21, 2002 @05:49AM (#4721657) Homepage Journal

    Aside from all the bashing these guys are getting here for not having any working code, this kind of database would actually be quite a good idea.

    One main problem for anti-spam is this: humans are very good at telling spam from legitimate messages. Computers are nowhere close. Why not? Well, humans are simply better at certain types of problems, like pattern recognition, because of millions of years of evolution. But there are ways around this: genetic algorithms and neural nets are two that I can think of. Both of these are "learning" strategies and need large databases to get started. We're talking about billions of messages or more, not the hundreds that you get every day.
    So the kind of database (one for spam, one for non-spam) that these guys are talking about would be an excellent way to develop intelligent spam detectors.

    Sorry if this is an unpopular opinion, but we are against legal and in favor of technological solutions for most of the problems of the internet, aren't we? Then why are we waiting for anti-spam legislation to fall like manna from the sky? The best way to fight spam is using technology. Methinks this is a step in the right direction. So get off your ass and contribute. Forward your spam to them. Think of clever algorithms that can make good use of a large database. And code them. And submit patches. Isn't that what open source is for? Hey, maybe this is going to be a killer app for open source, considering how big a problem spam is going to be in the next few years :)
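As a toy illustration of the "learning" approach, a single-layer perceptron (about the simplest neural net) can be trained on bag-of-words features from labelled spam and non-spam. A sketch under obvious assumptions: whitespace tokens, binary features, a fixed number of epochs, and labels +1/-1; it says nothing about how such a model would behave at the scale of billions of messages:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def features(text: str) -> Set[str]:
    # Binary bag-of-words: which lower-cased tokens occur in the message.
    return set(text.lower().split())


def train_perceptron(
    data: List[Tuple[str, int]], epochs: int = 10
) -> Tuple[Dict[str, float], float]:
    """data holds (message, label) pairs, label +1 for spam and -1 for non-spam."""
    weights: Dict[str, float] = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            feats = features(text)
            activation = bias + sum(weights[f] for f in feats)
            if label * activation <= 0:  # wrong side of (or on) the decision boundary
                for f in feats:
                    weights[f] += label
                bias += label
    return dict(weights), bias


def predict(weights: Dict[str, float], bias: float, text: str) -> int:
    """Return +1 (spam) or -1 (non-spam); unseen tokens contribute nothing."""
    activation = bias + sum(weights.get(f, 0.0) for f in features(text))
    return 1 if activation > 0 else -1
```

The model only updates on mistakes, which is why it needs both spam and non-spam examples: trained on spam alone it would simply learn to call everything spam.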
  • Geekiness (Score:2, Funny)

    by EuroChild ( 523969 )
    "... a few small personal spam archives that were used..."

    Geekiness has reached a new high! Or should that be low...?

  • by gwappo ( 612511 ) on Thursday November 21, 2002 @05:54AM (#4721671)
    It would seem to me that the value of such a repository is limited if all it contains is spam.

    If anyone writes an anti-spam tool, it needs to distinguish between spam and non-spam, making non-spam equally valuable for spam-filter benchmarking.

    Having a log with only spam makes it quite easy to achieve a 100% benchmark (simply reject it all!).

    Couldn't find anything about this on the site, so unless I'm missing something, the value of such a log is limited at best.
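The point about a spam-only log is easy to demonstrate: scored only on spam, a filter that rejects everything looks perfect, and the flaw only shows once legitimate mail is in the test set. A small sketch; the tiny corpora and the `reject_all` function are made up for illustration:

```python
from typing import Callable, List, Tuple

Corpus = List[Tuple[str, bool]]  # (message, actually_spam)


def reject_all(message: str) -> bool:
    """Degenerate 'filter': flags every message as spam."""
    return True


def score(classifier: Callable[[str], bool], corpus: Corpus) -> Tuple[float, float]:
    """Return (share of spam caught, share of legitimate mail wrongly flagged).

    A share is NaN when the corpus has no messages of that kind.
    """
    spam = [m for m, is_spam in corpus if is_spam]
    ham = [m for m, is_spam in corpus if not is_spam]
    caught = sum(classifier(m) for m in spam) / len(spam) if spam else float("nan")
    wrongly_flagged = sum(classifier(m) for m in ham) / len(ham) if ham else float("nan")
    return caught, wrongly_flagged


spam_only: Corpus = [("MAKE MONEY FAST", True), ("cheap pills", True)]
mixed: Corpus = spam_only + [("minutes of Tuesday's meeting", False), ("dinner at 8?", False)]

print(score(reject_all, spam_only))  # (1.0, nan): looks perfect
print(score(reject_all, mixed))      # (1.0, 1.0): every legitimate message is lost
```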

  • by ch-chuck ( 9622 )
    You might as well start up a database to catalogue all the different shapes of sand on the seashore - largely useless exercise in futility.

    What people are starting to do is block EVERYTHING that isn't on a 'whitelist'. That way granny and Junior don't get mail from anyone unless they're pre-approved. If they get mail from J.Random Stranger it's bounced with a request to put a short random token in the subject line. Thanks to marketing a good third of Internet mail traffic is useless crap. Thanks marketers!

    To show just how evil and desperate unemployed, cash-strapped, deep-in-debt spawns of satan those people are - yesterday I got a letter from my mortgage holder, Chase Manhattan bank, marked "IMPORTANT ACCOUNT DOCUMENTS ENCLOSED". It turned out to be yet another credit card pitch. ("You qualify to give us even more money!!") Bastards. It's not my fault the Msft office automation vision they bought into turned out to be way more expensive than the sales flak led them to believe.

    I wish unemployed marketers would turn to prostitution and drugs instead of spam - at least they'd be supplying things people actually WANT.
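The whitelist scheme described in this comment (accept pre-approved senders, bounce strangers with a request to put a short random token in the subject line) might look roughly like this. A sketch under assumptions: an in-memory whitelist, one pending token per sender, and no real SMTP handling; `ChallengeFilter` and its methods are hypothetical:

```python
import secrets
from typing import Dict, Set


class ChallengeFilter:
    def __init__(self, whitelist: Set[str]) -> None:
        self.whitelist = {addr.lower() for addr in whitelist}
        self.pending: Dict[str, str] = {}  # sender -> token we asked them to include

    def handle(self, sender: str, subject: str) -> str:
        """Return 'deliver', or a bounce text asking the sender for the token."""
        sender = sender.lower()
        if sender in self.whitelist:
            return "deliver"
        token = self.pending.get(sender)
        if token is not None and token in subject:
            # The sender read the bounce and answered it: approve them from now on.
            self.whitelist.add(sender)
            del self.pending[sender]
            return "deliver"
        token = token or secrets.token_hex(3)  # short random token, 6 hex characters
        self.pending[sender] = token
        return f"bounce: resend with '{token}' in the subject line"
```

Reusing the pending token for repeat attempts means a stranger who sends twice before answering still gets a single, consistent challenge.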

  • by imag0 ( 605684 ) on Thursday November 21, 2002 @06:08AM (#4721705) Homepage
    It's a trap!!!

    1) Set up story about new site accepting spam to assist in creating better anti-spam tools.
    2) accept all the submissions from the teeming millions(tm) at a popular tech site or two.
    3) cull all the email addresses from those duped to forward spam to you.
    4) sell said email addresses to spammers.
    5) PROFIT!!!!
  • by autopr0n ( 534291 ) on Thursday November 21, 2002 @06:11AM (#4721711) Homepage Journal
    Call me a cynic, but in my estimation, the only thing effective Spam filters based on content are going to do is make Spam more annoying. Why? Because spammers are going to have the same access to filters that regular people do. All they'll need to do is run their Spam through the filters to check and make sure they pass. In other words, if these Spam filters really work well then it won't be possible to determine what is and isn't Spam by a quick glance at the subject line or formatting of the message. Rather than "INCREDIBLE OPPORTUNITY FOR FAST EAZY MONEY$$$$$$$$$5390ANFP9O" and "HOT HORNY SLUTS WANT TO MEAT YOU" we'll get stuff like "Dude, check this out!" with a body like "hey man, long time no see. What have you been up to? I've just been hanging out, not too exciting, although I met this cool chick off the 'net. Hrm, you still looking for a gf? You should check out FriendFinder.com [friendfinder.com] :). Anyway, talk to you later, bro."

    And you'll need to read the whole message before you realize it's Spam.

    You might not like to believe it, but spammers (or at least some spammers) are hackers, in both senses of the word. ESR's supposed "hacker ethics" are as much bullshit as anything else he says.

    The only way these things will work is if the vast majority of people do not use these things. I don't know how likely that will be, with MSN already promoting its 'less Spam' features.

    I think what we need is a fundamental change in the way email is handled. The current system is just way too prone to abuse, and should be replaced entirely. The new standard could use things like digital certificates and other technology to make sure you're talking to an individual (while protecting anonymity in some cases, although the receipt of anon email could be optional, etc, etc)
  • Think about it: while 99.999...n...9% of spam mails are either deleted before they're read or shunted into a "Spam" folder, there will be enough Internet newbies / technology imbeciles / other non-slashdotters ;=) who think that unsolicited emails can be a cure to their debt problems / small penis / whatever.

    So long as enough people are suckered by the adverts, the spammers make enough to pay their bandwidth bills, and they can continue to spam us.

    What's needed is education for the naive: just ignore unsolicited adverts. TOTALLY. I mean, when was the last time you opened a credit card mailshot? Or one of those "Especially for you" things in real life?

    Exactly. Trial and error is not a good way to learn about spam. It should be mandatory for every ISP's sign-up procedure to inform new customers that unsolicited emails can safely be ignored; hopefully that way the spam industry will start to wither and die.

    -Mark
  • Is it me or (Score:2, Insightful)

    by zBoD ( 86938 )
    is it exactly the same thing as www.spamrecycle.com [spamrecycle.com], which has been around for a long time now?

    BoD
  • What's the point? (Score:5, Insightful)

    by brunnock ( 18853 ) on Thursday November 21, 2002 @06:21AM (#4721726) Homepage
    What's the point of testing a filter against a database of known spam if you can't test it against a database of nonspam?

    Anybody can write a filter for bulk mail. How do you differentiate between solicited and unsolicited bulk mail?
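    To make the point concrete: a filter needs two numbers, how much spam it catches and how much legitimate mail it wrongly flags, and a spam-only archive can only give you the first. A minimal sketch, assuming each message is just a string and a filter is any function returning True for spam (the toy filter and sample messages are made up for illustration):

```python
from typing import Callable, Dict, Iterable


def evaluate(is_spam: Callable[[str], bool],
             spam: Iterable[str],
             ham: Iterable[str]) -> Dict[str, float]:
    """Score a filter on labelled spam *and* labelled legitimate mail."""
    spam, ham = list(spam), list(ham)
    caught = sum(1 for msg in spam if is_spam(msg))     # spam correctly flagged
    false_pos = sum(1 for msg in ham if is_spam(msg))   # legitimate mail wrongly flagged
    return {
        "detection_rate": caught / len(spam) if spam else 0.0,
        "false_positive_rate": false_pos / len(ham) if ham else 0.0,
    }


def naive_filter(msg: str) -> bool:
    """Deliberately crude example filter: flag anything mentioning 'money'."""
    return "money" in msg.lower()


spam_corpus = ["Make MONEY fast!!!", "Cheap herbal viagra", "Free money now"]
ham_corpus = ["Lunch tomorrow?", "Budget meeting: where did the money go?"]

print(evaluate(naive_filter, spam_corpus, ham_corpus))
```

    Here the toy filter catches 2 of the 3 spams but also flags 1 of the 2 legitimate messages. Without the ham corpus, that 50% false-positive rate would be invisible.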
  • I discussed this idea yesterday with my manager. I've been looking at Spamhaus [spamhaus.org] over the last couple of days, but they don't take spam reports from end users. So I had the idea of setting up a domain for users to forward spam to. This spam database could then be used to create an RBL of the most active mail relays. I suppose now I can create the RBL without collecting the spam myself. :-)
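    Turning a pile of collected spam into an RBL could look roughly like this: count which relay handed each spam to your server, keep the repeat offenders, and publish them in DNS the way DNS blocklists are conventionally queried (octets reversed, zone appended). A sketch only, assuming the topmost Received header was added by your own MTA and carries the peer IP in square brackets; the zone name `rbl.example.org` and the `min_hits` threshold are placeholders:

```python
import email
import re
from collections import Counter
from typing import Iterable, List, Optional

# IPv4 literal in square brackets, as MTAs commonly write it in the "from"
# clause of a Received header, e.g. "from host ([192.0.2.7])".
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def connecting_relay(raw: str) -> Optional[str]:
    """Return the IP of the host that handed this message to *our* server.

    Assumption: the topmost Received header was added by our own MTA, so it
    is the only one we can trust; headers further down may be forged.
    """
    received = email.message_from_string(raw).get_all("Received") or []
    if not received:
        return None
    match = BRACKETED_IPV4.search(received[0])
    return match.group(1) if match else None


def build_blocklist(raw_messages: Iterable[str], min_hits: int = 2) -> List[str]:
    """Relays that delivered at least `min_hits` of the collected spams."""
    counts = Counter(ip for ip in map(connecting_relay, raw_messages) if ip)
    return sorted(ip for ip, n in counts.items() if n >= min_hits)


def dnsbl_query_name(ip: str, zone: str = "rbl.example.org") -> str:
    """DNS blocklists are looked up by reversing the octets:
    192.0.2.7 -> 7.2.0.192.<zone>. The zone name here is a placeholder."""
    return ".".join(reversed(ip.split("."))) + "." + zone


collected = [
    "Received: from relay ([192.0.2.7]) by mx.example.com\n\nbuy now\n",
    "Received: from relay ([192.0.2.7]) by mx.example.com\n\nbuy now\n",
    "Received: from other ([198.51.100.3]) by mx.example.com\n\nhello\n",
]
for ip in build_blocklist(collected):
    print(dnsbl_query_name(ip))  # one DNS name per listed relay
```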
  • How to end spam (Score:5, Interesting)

    by Permission Denied ( 551645 ) on Thursday November 21, 2002 @06:39AM (#4721765) Journal
    I've had the same email address for five years, and I receive zero spam. None whatsoever. I also advertise the email address widely (web, usenet, mailing lists).

    How does this work, you ask? I create a new email address each time I give out my email address. We have a sendmail setup that allows you to make "username+foo@example.com" go to "username@example.com" where "foo" is any arbitrary string.

    So, amazon.com thinks I'm "username+amazon@example.com", securityfocus thinks I'm "username+bugtraq@example.com" and so on. Once I receive spam on one of the addresses, it's trivial to write a filter that matches with near 100% confidence ("username+bugtraq@example.com" should only receive messages originating from securityfocus, etc.). Most times, if an address receives a spam, I can just procmail all mail to that address to /dev/null (i.e., no complex rules like the bugtraq one). This also lets me track where spammers get their lists.

    We use sendmail. Equivalently, qmail allows "username-foo@example.com" and if you own your own domain, just use "foo@example.com".
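    The scheme above boils down to two steps: split the tag off the local part, then decide per tag. A rough sketch of that logic, assuming the recipient and sender addresses are already extracted; the tag rules (`BURNED`, `EXPECTED_SENDER`) and the tag `oldshop` are hypothetical, and note that a From address can be forged, so the sender check is only a heuristic:

```python
from typing import Optional, Tuple


def split_tag(address: str, separator: str = "+") -> Tuple[str, Optional[str]]:
    """Split 'user+tag@example.com' into ('user@example.com', 'tag').
    Use separator='-' for qmail-style 'user-tag@example.com'."""
    local, _, domain = address.rpartition("@")
    user, sep, tag = local.partition(separator)
    return f"{user}@{domain}", (tag if sep else None)


# Hypothetical per-tag rules, in the spirit of the procmail setup described.
BURNED = {"oldshop"}                                 # leaked tags: drop everything
EXPECTED_SENDER = {"bugtraq": "securityfocus.com"}   # tags that should hear from one domain


def deliver(to_addr: str, from_addr: str) -> bool:
    """True to deliver, False for the /dev/null case."""
    _, tag = split_tag(to_addr)
    if tag is None:
        return True                                  # untagged mail: no rule applies
    if tag in BURNED:
        return False
    expected = EXPECTED_SENDER.get(tag)
    if expected is not None:
        domain = from_addr.rpartition("@")[2].lower()
        return domain == expected or domain.endswith("." + expected)
    return True


print(split_tag("username+amazon@example.com"))
print(deliver("username+bugtraq@example.com", "list@securityfocus.com"))
print(deliver("username+bugtraq@example.com", "winner@spam.example"))
```

    Since the tag is also a record of who you gave the address to, logging the tag on every rejected message is what lets you see where spammers got their lists.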

    I find this advanced filtering stuff fascinating, from a completely academic point of view. I, of course, can't apply any of it since I don't receive any spam, but it's interesting nonetheless. I just read through how the Bayesian filter works. It is very simple: it only filters based on word (token) probabilities. So, it would assign a value to "make," "money" and "fast," but not "make money fast". Seems like you could get much better results if you do something more advanced like Markov chains or a neural net. There's lots of research out there on textual matching, and I'm not sure why people would start out with such a simple algorithm when there may be better things available (where "better" is measured not only by accuracy, but also by training time).
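    The "make money fast" point is easy to see in a toy version. Below is a simplified token-probability filter in the general style described (per-token P(spam), combined naively), not a faithful copy of any particular implementation; the 0.4 default for unseen tokens and the 0.01/0.99 clamps are illustrative choices. Adding adjacent word pairs (bigrams) is the cheapest step toward the richer models mentioned; it is not a Markov chain or a neural net:

```python
import math
import re
from collections import Counter
from typing import List


def features(text: str, bigrams: bool = False) -> List[str]:
    """Lower-cased word tokens, optionally plus adjacent word pairs."""
    words = re.findall(r"[a-z0-9$']+", text.lower())
    feats = list(words)
    if bigrams:
        feats += [f"{a} {b}" for a, b in zip(words, words[1:])]
    return feats


class TokenFilter:
    def __init__(self, bigrams: bool = False):
        self.bigrams = bigrams
        self.spam_counts: Counter = Counter()
        self.ham_counts: Counter = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        feats = set(features(text, self.bigrams))  # count each feature once per message
        if is_spam:
            self.spam_counts.update(feats)
            self.n_spam += 1
        else:
            self.ham_counts.update(feats)
            self.n_ham += 1

    def feature_prob(self, feat: str) -> float:
        """P(spam | feature) from the fraction of spams vs. hams containing it."""
        s = self.spam_counts[feat] / max(self.n_spam, 1)
        h = self.ham_counts[feat] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.4                                   # never seen: lean slightly "legit"
        return min(0.99, max(0.01, s / (s + h)))         # no single feature is decisive

    def score(self, text: str) -> float:
        """prod(p) / (prod(p) + prod(1 - p)), computed in log space."""
        probs = [self.feature_prob(f) for f in set(features(text, self.bigrams))]
        d = sum(math.log(p) - math.log(1 - p) for p in probs)
        # numerically stable logistic function of d
        return 1 / (1 + math.exp(-d)) if d >= 0 else math.exp(d) / (1 + math.exp(d))


uni, bi = TokenFilter(bigrams=False), TokenFilter(bigrams=True)
for f in (uni, bi):
    f.train("make money fast", is_spam=True)
    f.train("make it fast", is_spam=False)
    f.train("save money", is_spam=False)

print(uni.score("make money fast"), bi.score("make money fast"))
```

    With words alone the score is 8/9, because "make", "money" and "fast" each also show up in legitimate mail. With bigrams, "make money" and "money fast" occur only in spam, which pushes the score very close to 1.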

    • Re:How to end spam (Score:3, Insightful)

      by elodan ( 601886 )
      IMO, all the spam filtering technology we're so busy inventing is missing the point to an extent. The problem is not so much finding the spam in your mailbox and having to delete it as the amount of bandwidth that downloading the spam eats up.

      You and I resent the time we spend deleting rude/crude/criminal/porno spam, but at the end of the day, if you've got broadband, you only notice the TIME lost.

      Someone on a cheap Linux handheld in India can't afford the bandwidth to download a hundred graphics-heavy spams a day.

      Bandwidth costs.

      Shouldn't we therefore be looking at ways to stop spam from being sent, or at least limit its propagation by filtering it early in the routing process?
      Unfortunately, I'd guess that messing with other people's email this way would have legal implications, but can we work around it?

    • Re:How to end spam (Score:4, Insightful)

      by CvD ( 94050 ) on Thursday November 21, 2002 @07:32AM (#4721885) Homepage Journal
      It is still too much work for me to have to set up a new email address every time I leave it on a website somewhere.

      With an advanced spam filter, you set it up and forget about it...sometimes checking your spamfolder if there are any false positives.

      How do you create new email addresses? Do you have a CGI script hooked up to your alias file, or something like that, to make new addresses easily? That would be useful.

      For me it's still too much work to set up email addresses that way. And you need to start doing this from the beginning; otherwise some spam will still get sent to your username@example.com address (as is the case with me).

      Cheers,

      Costyn.
    • FALSE STATEMENTS (Score:4, Insightful)

      by mgkimsal2 ( 200677 ) on Thursday November 21, 2002 @08:42AM (#4722143) Homepage
      ... and I receive zero spam

      Once I receive spam on one of the addresses...

      I also advertise the email address widely ...

      So, you receive no spam, but when you do receive spam, you edit procmail. Which is it?

      Also, you widely advertise your email address, but you don't actually use your email address, but made-up aliases. Which is it?

      You're simply masking the problem, and going through a moderate amount of gyrations (which most average Joe 'net users won't or can't go through) to do so.
  • Copyright (Score:2, Insightful)

    by rockdreamer ( 238520 )
    Spam, like all written text, is subject to copyright.

    Couldn't the spammers sue for copyright infringement?
  • Are they legit? (Score:5, Informative)

    by Zocalo ( 252965 ) on Thursday November 21, 2002 @06:41AM (#4721769) Homepage
    Typical of a Slashdot story: lots of people asking questions when they could find out the answer and post it in the same amount of time.

    According to WHOIS, "spamarchive.org" was registered by one Guru Rajan, who has an email address at "ciphertrust.com". Also according to WHOIS, "ciphertrust.com" has the same person as technical contact, and if you check the website, you'll find they are the vendors of "IronMail: The Secure Internet Email Gateway", an established, if not well-known, product.

    In short, yes, it seems legit, and it probably took me less time to find that out than it took the myriad people asking "is it legit?" to post the question. ;)

  • by rakerman ( 409507 ) on Thursday November 21, 2002 @06:53AM (#4721783) Homepage Journal
    They've got gazillions of messages sent to uce@ftc.gov

    Why not just make that available to the public for creating training sets for spam?

    The idea of a central archive is good, but I don't see why there's a need to reinvent a New! Improved! wheel.
  • I don't see how this can work. Sure, hard drives get cheaper all the time, but how can they possibly afford to keep up with a wide open "send us spam" request? They'd need petabytes of storage.
  • by beaviz ( 314065 )
    This reminds me of an idea I've had for some time: spamnewsreportingforthemasses.com, a news site reporting news from spam sources, sort of a satirical view of spam.
    "New indian health care enables you to have more lovers"
    "New solution for your economical problems found"

    - and throw in a hoax section too...
  • For profit? (Score:2, Informative)

    by alech ( 208219 )
    The domain is registered to Guru Rajan of ciphertrust.com. Funnily enough, CipherTrust markets a product called IronMail that does (among other things) spam detection. So who says they will really release the database once they have it, rather than keeping it for their own use?
  • Resistant Strains? (Score:3, Interesting)

    by Queuetue ( 156269 ) <queuetue&gmail,com> on Thursday November 21, 2002 @07:31AM (#4721883) Homepage
    Although spam eradication is a good idea in general, I wonder if bulk training will only result in resistant strains of superspam developing, much like the v-cillin-resistant strains of staph that have been popping up lately.

    If we deal with a little spam by hand today, will that keep us from having to deal with undetectable spam later? I can imagine spam systems that probe you (using actual system probes of you and your contacts, marketing history and social engineering) to target spam that you may actually believe is a recommendation for the Sony(tm) handicam from your Uncle Bowser, or really is your wife asking you to pick up some Clorox(tm) brand bleach and fabric softener on the way home...

    Luckily, neither of them is likely to be sending information about my penis to me at work.

    Much like modding the Xbox (and thus giving MS the practice they need to harden Palladium), giving the hard fight to the spammers might just backfire on us.
  • by semprebon ( 61779 ) on Thursday November 21, 2002 @08:01AM (#4721943) Homepage
    I expect we'll next see Spammers using the DMCA to get their copyrighted SPAM removed from the database...
  • Are they going to offer the content of spamarchive.org under an Open Content license, or is this just another database that will eventually be absorbed and closed to the public by some corporation protecting database copyrights?

    --LP
  • You can find many of them listed in my spam archive [annexia.org] :-)

    Rich.

  • I don't think that a large archive of spam is hard to come by. You don't need to publicly invite submissions either: just acquire a domain and hosting with catch-all e-mail service, set up forwarding to an address for your database, then publish several addresses under that domain where spammers are bound to pick them up (newsgroups, FFA lists) and register them with services that sell their e-mail lists with lots of different demographic information vectors. You'll get as much input as you have a use for.

    For calibrating spam filters you'll probably only want spam from the last few months as spam does evolve - e.g. it's mostly herb*l vi*gra these days.
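    Keeping only recent spam is simple if you trust the Date header. A sketch assuming raw RFC 822 messages as strings and a 90-day window as a stand-in for "the last few months"; spammers often forge Date, so in practice the arrival time recorded in your own Received header is the safer clock:

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Iterable, List, Optional


def recent_only(raw_messages: Iterable[str], days: int = 90,
                now: Optional[datetime] = None) -> List[str]:
    """Keep messages whose Date header falls within the last `days` days.
    `now`, if given, must be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    kept = []
    for raw in raw_messages:
        date_header = message_from_string(raw).get("Date")
        if not date_header:
            continue                                 # undated: skip
        try:
            sent = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):              # unparseable date
            continue
        if sent.tzinfo is None:                      # "-0000" yields a naive datetime
            sent = sent.replace(tzinfo=timezone.utc)
        if sent >= cutoff:
            kept.append(raw)
    return kept


archive = [
    "Date: Thu, 21 Nov 2002 05:02:00 +0000\nSubject: herbal viagra\n\n...\n",
    "Date: Mon, 01 Jan 2001 00:00:00 +0000\nSubject: old scam\n\n...\n",
    "Subject: no date at all\n\n...\n",
]
print(len(recent_only(archive, now=datetime(2002, 11, 22, tzinfo=timezone.utc))))
```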

    What is at least as necessary, but much harder to come by, is a large, representative collection of legitimate e-mail, to test spam filters for false positives. This collection would need to cover diverse languages, cultures and contexts (private, business/x-industry, business/y-industry, system error messages, automatic notification messages, etc.).

    What is hard about this collection of legitimate e-mail is that the privacy of both sender and recipient is affected, and that, if confidential information is masked or deleted, the e-mail isn't the original one and spam filters might evaluate it differently.

    There is one subset of legitimate e-mail available: public archives of mailing lists. But these e-mails don't cover the style of e-mail in other contexts.

  • by gatkinso ( 15975 ) on Thursday November 21, 2002 @09:10AM (#4722298)

    Get your now! You gate to betta rife. Moa pay, wok wess.

    www.dipwomas.tw
  • A "standard" archive of spam might work great for benchmarking rule-based filters against each other, but adaptive filters, like the popular Bayesian kind, work best when they learn from your own emails and spams. There's also no point in testing an adaptive filter when you can't also feed it non-spam emails.
  • OK... for people who still use Outlook, this exact service is provided by a company called Cloudmark; the address is Spamnet.com. I've been using it for some time and it seems pretty robust. Basically, a community earmarks spam messages, and based on the votes, a piece of spam gets moved to a spam folder on retrieval. Nothing is ever deleted.
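    How SpamNet actually fingerprints messages isn't described here, so the following is only a hypothetical illustration of vote-based flagging: normalise the body so personalised copies of the same spam match, hash it, and count distinct reporters. The normalisation rules and the threshold of 3 are assumptions for the sketch:

```python
import hashlib
import re
from collections import defaultdict
from typing import DefaultDict, Set


def fingerprint(body: str) -> str:
    """Hypothetical normalisation: lower-case, strip anything that looks like
    an email address (spam is often personalised), collapse whitespace, hash."""
    text = body.lower()
    text = re.sub(r"\S+@\S+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class VoteBoard:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.reporters: DefaultDict[str, Set[str]] = defaultdict(set)  # fingerprint -> users

    def report(self, user: str, body: str) -> None:
        self.reporters[fingerprint(body)].add(user)  # a set, so repeat votes don't count

    def folder_for(self, body: str) -> str:
        """Nothing is deleted: enough independent reports just reroute the message."""
        votes = len(self.reporters.get(fingerprint(body), ()))
        return "Spam" if votes >= self.threshold else "Inbox"


board = VoteBoard(threshold=3)
board.report("u1", "Hi alice@example.com, cheap pills!")
board.report("u2", "Hi bob@example.org, cheap pills!")
board.report("u2", "Hi bob@example.org, cheap pills!")        # duplicate vote ignored
print(board.folder_for("Hi carol@example.net, cheap pills!"))  # Inbox (2 votes)
board.report("u3", "HI   dave@example.com,  cheap pills!")
print(board.folder_for("Hi carol@example.net, cheap pills!"))  # Spam (3 votes)
```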
  • I've moderated a Usenet newsgroup that does this kind of stuff for the last six years now (since Nov 1996). (Yes, I know others have stated some of this stuff, but it's worth mentioning it again.)
  • Can the spam writers claim copyright infringement?
