TarProxy Creates Tar Pit... For Spammers

agravaine writes "I ran across TarProxy, which, IMHO, is one of the cleverest spammer-handling ideas I've seen yet. The gist: Early detection of incoming spam [using the statistical techniques pioneered on the client side] could be used to create an artificial scarcity of bandwidth experienced only by spammers." This project hasn't gone very far yet, but essentially it slows SMTP requests to suspected spammers. If this really works, and is installed on enough of the net, it could work. 144 spam so far today. Anything would be an improvement. CT Yup, it's a dupe. There wasn't anything better to post at 9am on a sunday, so you can just bitch about me instead ;)

  • Deja Vu (Score:4, Informative)

    by BorgDrone ( 64343 ) on Sunday March 02, 2003 @10:17AM (#5418282) Homepage
    Anyone else get a bit of a deja vu feeling reading this newspost ?
    • Re:Deja Vu (Score:5, Funny)

      by baptiste ( 256004 ) <mike@baptis[ ]us ['te.' in gap]> on Sunday March 02, 2003 @10:26AM (#5418318) Homepage Journal
      Maybe we can use statistical techniques on the client side to help the editors avoid duplicate stories.
      • I know you're joking, but you're right!

        I propose a large makeover to the Articles.pl script. Firstly, each article will have a dropdown menu that every member can use to vote 1-10 on each story. The top 30% of all stories will appear on the front page.

        Each article will also have an option at the very bottom called 'Duplicate' that, if checked off enough, will totally _delete_ the article from the database. We don't need to be having the same discussions to the same story 2, 3, and in some cases even 4 times. It's really getting ridiculous.
    • Obviously TarProxy is slowing down Slashdot as part of its collateral damage.
    • by Dark Lord Seth ( 584963 ) on Sunday March 02, 2003 @11:38AM (#5418545) Journal

      TarEditor!

      A revolutionary new system designed to keep your editors from posting duplicate stories! This new system works by a punishment system, ranging from mild electrostatic shocks to gushes of boiling sulphuric acid! When properly applied, this piece of software will introduce a new era in the world of online journalism, having a significant effect on most long-term strategies and pleasing the stock market at the same time! This is all you're going to need in our post-modern capitalist system for guaranteed success!

      Features include:

      • Compatible with all editors!
      • Name based modifier system! Want Taco to get kicked in the nuts for the slightest mistake? NO PROBLEM!
      • HTTP proxy supporting HTML 4.0, 4.1 (all forms) XML, XHTML, DHTML AND MORE!
      • Punishments ranging from a painful poke in the side to obliterating one's kneecap with a sledgehammer!
      • Up to date XML support!
      • 3 year on site support!

      TarEditor is copyright (C) SethSoft, 1998-2003. All rights reserved. This work is protected under Dutch copyright laws and the DMCA so I can sue some random people into debt from time to time.

      • On the topic of Slashdot duplication, I recommend a wiki-like [c2.com] solution. Just think of writing articles as programming: when you refactor your code, you're forced to edit it manually (there is no way around this). Sometimes you're even forced to change the structure of the program you're writing so it doesn't become a kludge full of duplicates. Right? In a way, Slashdot doesn't allow you to change its structure (not too much anyway). Personally, I would like to see Slashdot experiment with a small (space-limited) wiki section near the top of every news story. Then readers and editors alike could simply fold back and summarize the knowledge gained in the hundreds of comments below. Then, should this experiment prove successful, perhaps we could even consider using a similar kind of solution for tackling the bigger problem of duplicate _news_headlines_.

        On the topic of email spam, please take a look at my signature. It's a good practical elegant partial solution to the problem. And it's free (it's actually non-profit volunteer effort).

    • There is a new project called TarTroller. It doesn't stop trolling but it slows posts from suspected Trollers and gathers them into a known trolling profile. Basically, a loop is created by posting the same story often on Slashdot which disorients the trolls and causes them to focus on the duplicate "mistake" instead of attacking other aspects of Slashdot. As a benefit, those trollers who take the bait enter into lengthy threads with other trollers on different ways of poking fun at the /. editors. Finally, troll posts become very easy to spot and habitual trollers easy to identify and ignore. If this really works, and is carried out enough on Slashdott, it could work. 144,000 trolls so far today. Anything would be an improvement.
    • Maybe we can use statistical techniques on the client side to help the editors avoid duplicate stories. TWB Yup, it's a dupe. There wasn't anything better to post at 2pm on a sunday, so you can just bitch about me instead ;)
    • CT Yup, it's a dupe. There wasn't anything better to post at 9am on a sunday, so you can just bitch about me instead ;)

      It's necessary for CT to post a story out here to get people to bitch about him?

  • Dup (Score:1, Redundant)

    by baptiste ( 256004 )
    Read about it a couple days ago!
  • Duplicate Slashdot Story. [slashdot.org]

    How 'bout some changes to Slashcode so that story submissions containing URLs from past stories are flagged as "HEY, PROLLY A DUPE!" to the editors...
    • How 'bout some changes to Slashcode so that story submissions containing URLs from past stories are flagged as "HEY, PROLLY A DUPE!" to the editors...


      Because then Taco would never get anything posted.
    • How 'bout some changes to Slashcode so that story submissions containing URLs from past stories are flagged as "HEY, PROLLY A DUPE!" to the editors...

      So submit a patch already. Great chance to show off your coding skills.

    • A dupe is fine as long as it does not happen too often too soon. We need a perl script to scan for dupes, these lazy editors hardly have time to read.
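The URL check suggested in this thread could be sketched roughly as follows. This is only an illustration in Python, not anything taken from Slashcode: the function names, the normalization rules, and the example.org URLs are all assumptions made for the example.

```python
import re

# Hypothetical helper names; nothing here comes from Slashcode itself.
URL_RE = re.compile(r"https?://[^\s\"'<>\]]+")


def normalize(url: str) -> str:
    """Strip trailing punctuation and a trailing slash so near-identical URLs match."""
    return url.rstrip(".,;)").rstrip("/")


def extract_urls(text: str) -> set[str]:
    """Return the set of normalized URLs found in a block of text."""
    return {normalize(u) for u in URL_RE.findall(text)}


def find_dupe_urls(submission: str, seen_urls: set[str]) -> set[str]:
    """Return URLs in the submission that already appeared in a past story."""
    return extract_urls(submission) & seen_urls


# URLs from previously posted stories (made-up example data).
seen = extract_urls("Using statistics against spam: http://example.org/tarproxy/")
hits = find_dupe_urls("I ran across http://example.org/tarproxy, which is clever", seen)
if hits:
    print("HEY, PROLLY A DUPE!", sorted(hits))
```

A match only flags the submission for an editor to look at; a follow-up story can legitimately link to the same URL.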
  • I came across another similar project on Friday. I think I have a link to it somewhere...

    Ah, here it is. Using Statistics to Cause Spammers Pain [slashdot.org]. Posted on some website named slashdot.org. It also called itself TarProxy and .... oh ... uh .... never mind.

    • You know that site with the story?
      Is their pig logo made from the contents of a tin of spam?
      I never noticed that
    • I mean, yeah, it's amusing to catch the editors making a mistake, but by now I find the people snidely pointing this out to be only about 100x as annoying as the occasional duplication. In fact, I think the editors shouldn't change a thing about their process - occasional duplication can even be good... as for instance in this case, where I apparently missed the story the first time, and am glad I got a 2nd chance to learn about this.

      This is big news, after all - I'm not totally informed about the field, but to me this sounds like the first anti-spam measure which has the potential to be very effective without any "false positive" loss. The more people who see this story, the more people that will (like me) suddenly decide it is a great idea to go check this project out for installation on all their mailservers.
  • I don't see how effective this could be. How long before spammers get smart and set their SMTP program to give up after X seconds?

    Telemarketers killed the Telezapper; they would do the same here. It's just a junk-busting arms race.
    • If the proxy were linked into the TCP/IP protocol stack, it could not just slow down the connection but NAK, say, 90% of the packets from the spammer or open relay.
    • No problem. It means X seconds in which they do not send another message, and no message sent through that SMTP gateway. With enough mailservers doing this, it will severely limit the number of messages they can send in a given time.
      • So explain to me why slowing down your SMTP connection won't also slow down YOUR mail, especially with single access clients like OE.
      • Additionally, almost all spam goes through an open relay -- the spammers almost never talk directly to the final mailserver. So TarProxy isn't hurting the spammers so much as the open relay sysadmins. The open relay sysadmins, seeing their mail servers slow down and run out of disk space, will either take the time to figure out what's going on (and hopefully solve the problem by securing their server), or do nothing and have their server hammered to the point where it can barely spam anymore.
    • I don't see how effective this could be. How long before spammers get smart and set their SMTP program to give up after X seconds?

      Exactly! Sendmail, for example, allows a configuration to get around this altogether. If the connection takes too long, say 5 seconds or so, then drop back and punt to your fallback server to deliver at a more leisurely pace. Meanwhile deliver another thousand messages to all of the other sites that don't use this tech.

      Don't forget that even without optimization any mailer is capable of handling hundreds of simultaneous connections. All this does is tie up resources on your own machine while the spammer delivers to someone else. You will eventually get a delivery if you accept the connection. If you get a lot of spam, all you do is DoS yourself while you NAK all of those packets.

      As for tying up spammer resources, how many ISPs are planning to use this? Without ISP buy-in, spammers will continue to vomit out garbage toward the low-hanging fruit.

      I think that a better solution would be to RST the connection as a result of the stats and blacklist the IP for a few hours. When they retry they get a 550 or something akin, and you don't get DoSed. That is, if you're going down that road at all.
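The "blacklist the IP for a few hours" idea above could look something like this minimal Python sketch. The class name, the three-hour default, and the injectable clock are assumptions for illustration; a real MTA would hook this into its connection handling and send the 550-style rejection itself.

```python
import time


class TempBlacklist:
    """Remember offending IPs for a limited time, then forget them (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 3 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._expires: dict[str, float] = {}

    def block(self, ip: str) -> None:
        """Start (or restart) the blacklist period for this IP."""
        self._expires[ip] = self.clock() + self.ttl

    def is_blocked(self, ip: str) -> bool:
        """True while the IP is inside its blacklist period; the caller would answer with a 550."""
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[ip]  # entry has expired, forget it
            return False
        return True
```

Usage would be along the lines of calling `block(ip)` when the classifier trips, and checking `is_blocked(ip)` on each new connection before accepting mail.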
      • I think I see things a bit differently. In my case, incoming mail is already classified. It's tagged with an X-Spam-Status/X-Bogosity header, and let through. My users can set up filters in their clients or Sieve rules in the server using that mark (not all of them do, just the ones that care about this stuff --but that's about 50% of them).

        Now that's about as far as I can go, wrt SPAM filtering. I can't block SPAM, because no filter is perfect. I know I'll get a couple of false positives every month, and I just can't afford to block anybody if I think there's the slightest possibility that the filter could drop a message that one of my users really needed to see. They can trust the filter and set up a delete rule (as a matter of fact, a couple of them do precisely that, despite my warnings), but that's their choice, so it's ok. I won't block anyone's messages, and I won't risk losing a single message that's not for my own inbox.

        Anyway, this thing allows me to make a slight modification in my setup so that all that filtering work also hits the spammers. Just a little. That's not going to make any practical difference to me or my users, not immediately at least. But I believe it will be good for the Internet, in the long run, so I think I'm going to implement it (just not with a filter in Java!).

        Exactly! Sendmail, for example, allows a configuration to get around this altogether. If the connection takes too long, say 5 seconds or so, then drop back and punt to your fallback server to deliver at a more leisurely pace. Meanwhile deliver another thousand messages to all of the other sites that don't use this tech.

        Fine. None of those sites is mine. And if you have that thousand extra addresses in your spamlist, you were going to deliver that thousand messages anyway. It's not like what I do is making you hit the net any harder. And actually, if you do set up that drop back rule, then I won't get any more mail from you. Great, please go ahead!

        Don't forget that even without optimization any mailer is capable of handling hundreds of simultaneous connections. All this does is tie up resources on your own machine while the spammer delivers to someone else. You will eventually get a delivery if you accept the connection. If you get a lot of spam, all you do is DoS yourself while you NAK all of those packets.

        Not exactly. It does tie up resources in my server, and also in the spammer's machine. Both he and myself have plenty of resources for this --maybe he has more than me. But I'm just dealing with him. He's dealing with my server, and a thousand more. If there's just ten more servers in that thousand playing him the same trick, then I have to take one tenth of the overhead that he'll have to take. I can do that. In fact, I want to do that.

        Also, don't forget that sending NAKs (ICMP chokes, in this case) is very cheap. I don't think I could ever hit my bandwidth noticeably by sending them. As for RSTs (in case he wants to open 8192 connections to my box, only to find out that I only allow four open connections per client)... Well, that scenario looks like a DoS attack, which is clearly illegal (whereas spamming is not), so they are risking themselves way more. And it is already very hard to take a host down with half-open SYN -- with plain SYN, I'd say it's impossible unless you have more bandwidth than me, and every other admin using this tactic, put together.

        As for tying up spammer resources, how many ISPs are planning to use this? Without ISP buy-in, spammers will continue to vomit out garbage toward the low-hanging fruit.

        Very, very few, initially. ISPs, particularly the bigger ones, are mostly reactionary. They won't lift a finger until some important customer (or enough regular ones) comes complaining that she is receiving ten times more SPAM than her friend, who uses a competing ISP. Because that's what is going to happen if tarpits become popular: spammers will implement very short timeouts in their bulk mailers, so that tarpits don't waste their precious time -- leaving the tarpitted sites blissfully alone. Then maybe some ISPs will get off their asses and find out what their competitors are doing that they are not. And just maybe, they'll fix it.

        I think that a better solution would be to RST the connection as a result of the stats and blacklist the IP for a few hours. When they retry they get a 550 or something akin, and you don't get DoSed. That is, if you're going down that road at all.

        I don't think so. That breaks rule #1: you can't lose a single message. You can't block SPAM. But you can make it just a bit more expensive to send; and if enough people do the same, then bulk mailing can become a very expensive proposition.

    • Plus, if they give up and drop the connection to your SMTP server, you don't get their Spam.

      The only downside I foresee could be trying to send email from work after your company was blacklisted for some pseudo-spam mailing they send out regularly to registered users.

      This type of blacklisting could hurt the same people existing blacklists do, but in a far more damaging way. And I don't think our methods for determining who gets blacklisted are based on some judicious process. This would give Spammers a means to fight back against this otherwise very cool system.
      • This type of blacklisting could hurt the same people existing blacklists do, but in a far more damaging way. And I don't think our methods for determining who gets blacklisted are based on some judicious process. This would give Spammers a means to fight back against this otherwise very cool system.

        But this is not blacklisting. This is dynamic teergrubing. If your MTA sends mine a couple of messages, one ham and one spam, the first will get through in half a second, whereas the other will take minutes (maybe hours, depending on its running bogosity). And I'm not going to remember your IP. If tomorrow you send me only ham, the service will be as fast as it can be. If you send me only spam, then you're in for a nice queue on your disks.

        And nevertheless, it really doesn't hurt as much as being RBL'ed. Your mail does come through in the end. If you're in the RBL, you're, well, blackholed.
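To make the "ham in half a second, spam in minutes or hours" behaviour concrete, here is a minimal Python sketch of a delay that grows with the classifier's score. The function name, the exponential curve, and the 0.5 s / 1 hour endpoints are assumptions chosen to match the figures in the comment above; this is not TarProxy's actual formula.

```python
def tarpit_delay(bogosity: float, min_delay: float = 0.5, max_delay: float = 3600.0) -> float:
    """Map a spam probability in [0, 1] to a delay in seconds (hypothetical mapping).

    Exponential interpolation: clear ham gets min_delay, clear spam gets
    max_delay, and uncertain messages land in between, closer to the ham end.
    """
    if not 0.0 <= bogosity <= 1.0:
        raise ValueError("bogosity must be between 0 and 1")
    return min_delay * (max_delay / min_delay) ** bogosity


for score in (0.0, 0.5, 0.9, 1.0):
    print(f"bogosity {score:.1f}: hold the sender for {tarpit_delay(score):.1f} s")
```

Because the delay is computed per message from its own score and no IP is remembered, a sender who switches to ham tomorrow is served at full speed again, which is the difference from blacklisting described above.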

    • Well, if they "get smart and give up after X seconds", then they won't be able to deliver mail to the tarpitted host, right? That's all we want, I suppose...
    • I don't see how effective this could be. How long before spammers get smart and set their SMTP program to give up after X seconds?


      This doesn't matter since most spammers use open relays to send their junk. They generally don't have control over the timers for the relays they are using. The relay will be slowed down to a crawl making it less useful for them. Of course the spammer can get around this by running his own mailserver but this means he needs to invest a lot more money in bandwidth/hardware/upkeep etc. and he will make himself much more visible to the net.
      • He is the only one who understands the fact that tarpit attacks spam at any point in its propagation. At some point the spammer has to let a relay do its thing, and if the relay doesn't have the tarpit, then it's quite possible that somewhere down the chain another relay does. See? See how it works?

        It's viral. Once a mailserver notices that a lot of its mail isn't going through, they may take notice, and then once they find out why their mail isn't going through, they'll of course want to install the tarpit themselves to ensure that their mail goes through. The tarpit continues to propagate upstream to the spammer until he can barely send spam. It's brilliant.

    • How long before spammers get smart and set their SMTP program to give up after X seconds

      I hope it's soon, as then I can just make potential spammers wait X+1 seconds, slowing the 'dumb' ones down, but completely defeating the 'smart' ones. (Which makes you wonder who's smarter...)
  • Brilliant (Score:5, Insightful)

    by FunWithHeadlines ( 644929 ) on Sunday March 02, 2003 @10:23AM (#5418308) Homepage
    What's inefficient about current spam solutions?:

    "These classifiers already come in many forms. There are POP3 proxies, IMAP proxies, mail file processors, and even classifiers built directly into mail clients. I use PopFile (a naïve Bayesian classifier in a POP3 proxy) at home with great success. Some work better than others, but with a little training, they all seem to work pretty well. Unfortunately, they have a common shortcoming: They don't cause the spammers any pain."

    What is the goal?:

    "And we all want to cause spammers pain."

    How do they want to accomplish this pain?:

    "None of these classifiers are capable of causing the spammers any pain because the spammer is long gone by the time the classifier has the opportunity to process the message. What we need is a way to use the classifier against the spammer while the spammer is still connected."

    This is brilliant. If all you do is clean up after the spammers when they are long gone, there is little motivation for them to stop. So what if they've dumped a bunch of garbage in your in-box? They don't stick around to see you clean up. But this idea hits them while they are in the process of spamming you.

    That's the key: Make it harder/more unpleasant/less cost-effective for the spammers and you discourage them from spamming. Hit the source, not the results.
    ------

  • by TheAB ( 38019 ) on Sunday March 02, 2003 @10:27AM (#5418324)
    This is an interesting pro-active approach, but isn't most mail sent through open proxies, which have absent sysadmins? If we can't get them to lock down their mail servers, how can we get them to install this?
    • If we can't get them to lock down their mail servers, how can we get them to install this?

      We don't need them to install this. They are its intended victims. If you have an open relay, the spam comes from you. TarProxy doesn't cut off or blacklist, it just makes it expensive to send spam. These "absent sysadmins" are going to have to show up at work and lock themselves down. Then they will magically stop being penalised. I don't give a damn about the spam they receive, just that which they forward.

  • What we need is a way to use the classifier against the spammer while the spammer is still connected.

    But the spammer him/her/itself isn't connected to your machine unless you run an open relay. And you don't run an open relay, do you? Do you?

    </HUMOR>

    (Don't bother with a snide explanation about how SMTP works or telling me to read the article. I did that. It's a joke.)

    • And you don't run an open relay, do you? Do you?

      Of course I don't. But don't tell that to the spammers, because I try to make them believe I do have an open relay. Then they can send their junk to my blackhole where it does no harm.
      • Does this actually work? It seems to me that, were I to take up spamming, the first thing I would do on encountering a seemingly open relay would be to try to use it to send a mail to myself... if I never received said message, I'd give up and go on to the next one? Then again, I suppose there could be some pretty stupid spammers out there... so, does it work?
    • Why would you designate someone as "Foe" just because you don't agree with one thing they said

      I marked the last foe because he said that anyone who doesn't modify config files for gnome and the window manager environments is just a drooling idiot anyway and doesn't deserve to use Linux.

      We don't need that kind of crap thinking in our community, and now he's been permanuked with a -6, never to appear on /. again AFAIC.
      • The last person who marked me as a foe did so because I wrote a journal entry suggesting that everything he wrote be moderated down, irrespective of content. No idea why the other one did, but since he posts stories about molesting small children, I'm glad he didn't mark me as a friend...
    • Why would you designate someone as "Foe" just because you don't agree with one thing they said...

      My foe list is full of trolls & folks that like to complain about duplicate stories. You're in it for this one [slashdot.org].
    • "And you don't run an open relay, do you? Do you?"

      I did, and it hurt spammers. I don't recommend it (because there's easier/better things to do) but you can run an open relay that's secure. That means it will accept relay email from anybody, including spammers, but only delivers the non-spam.

      But forget that. If you want to play that kind of game (I rather hope you do) run a system that never receives any valid email. The only email you'd ever want that system to deliver would be the messages the spammer sends to see if it is an open relay. Delivering the test message makes him conclude it is an open relay, but he concludes wrong.

      You can have great fun whacking the spammer, based on what you learn from the spam he sends and your logs, but the real goal is to have so many such systems that the spammer is in despair: he can't tell real open relays from fakes. Then what does he do?

      Of course exactly the same idea works for open proxies - run a fake open proxy, fool a spammer.

      If you run windows or a JVM under some other environment try out the Jackpot Mailswerver: http://jackpot.uk.net/ This enables you to deceive the spammer into thinking your Windows system is a mail relay.

      Heck, I trapped a relay test message just 8 minutes ago from axis.software.powerinternetcr.com [216.25.173.245]. If I had relay enabled in my Jackpot I'd probably see spam very soon.

      Interesting Spamhaus record: http://spamhaus.org/SBL/sbl.lasso?query=SBL5858
  • This, while a good idea, is not the ultimate solution. The only way we are really going to eliminate spam is to break into every box the spammers use and delete NOSMOKE.EXE. Only then, when their boxen go up in flames, will we be free of spam.
  • every isp (or a lot of them) should install this to get this working.
    as spammer i have a lot of time. let the computer do the work. if it takes a little longer to spam, who cares.

    and false positives...

    but the initiative is good.

    personally i'd like to call a spammer (if possible) to explain i don't give a sheit about his offerings.
    many times it takes a while to get through. apparently i'm not the only one calling ;-)
  • looks a lot like this [sourceforge.net]
  • by sanermind ( 512885 ) on Sunday March 02, 2003 @10:33AM (#5418344)
    In the spirit of repetition...

    Easy to defeat: just use spamming software that dynamically increases its connection pool whenever it encounters a 'slow' SMTP recipient. Even if a large part of the net population were running this, the spammer could just spawn thousands of simultaneous (slowed down, yes) connections, and still maximize his bandwidth utilization. If it takes 2 minutes to send each message, it doesn't matter if he's sending 5000 messages at once!

    I believe Linux, for example, allows up to 8192 open sockets, and I think this can be changed with a sysctl command, and most definitely could be with a few changes to kernel headers.

    Sure, it would take a machine with decent memory, but that's not too hard to find.
    • Nope. The point is not just to slow it down to a crawl, but to never actually send that message at all (or at least append a tag to it identifying it as highly probable spam).
    • Keep in mind that spammers are typically bottlenecked through the open relays that they exploit, and they don't have much granularity of control on those, since they're only connected through SMTP.

      So their only options for opening up many connections are to find other open relays, but that's a bit more difficult a prospect, I'd imagine.
    • by Featureless ( 599963 ) on Sunday March 02, 2003 @11:35AM (#5418534) Journal
      Hmm... This calls for some TCP geekhood and some strong math. I am way too hung over for math. Let's just talk about it in broad strokes first.

      If I'm tuning this package, I can make these delays REAL big. I mean, email is one of those systems where a false positive resulting in even a... let's say an **8 hour** delay to a legitimate message would still be considered perfectly fine for most purposes. There's fuzzy logic in play here; I'm thinking not all delays will be equal. But what if you were just really harsh on suspected spam? Not such a loss IMO. Of course... I haven't considered that you will have increased reliability problems trying to hold a stream open for 8 hours, but remember, a legitimate mailserver will keep resending, and as we go bayesian on servers, perhaps we will learn to resend for a little longer as well? Or perhaps there is another protocol solution (i.e. letting the sender know they're being delayed for spam... so perhaps giving them the option to reformulate their message and resend?) Let's just press on. The precise amount of the delay may not necessarily be important.

      If I'm sending 50 million messages (a modest spammer's run, if I'm well enough informed) and each one holds me on the line for 8 hours that means 400 million hours if run serially. At the 8192 concurrent thread barrier that's still almost 50,000 hours (~5 years)... with mathematical convenience, to do this entire run in 8 hours you will require **50 million** concurrent threads? Or should I have just stayed in bed longer?

      Now it's looking like the exact length of the delays, and the exact number of concurrent threads is not actually something worth too much niggling debate. We just have to get familiar with the orders of magnitude we're dealing with.
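The back-of-the-envelope figures above can be checked with a few lines of Python. The numbers (50 million messages, an 8 hour delay, 8192 concurrent connections) are the ones from this comment; the function name is just a label for this sketch, and it assumes every message is delayed the full 8 hours.

```python
MESSAGES = 50_000_000   # size of the spam run, from the comment above
DELAY_HOURS = 8         # tarpit delay per message, from the comment above
CONCURRENT = 8192       # simultaneous connections the spammer can hold open


def run_hours(messages: int, delay_hours: float, concurrent: int) -> float:
    """Total wall-clock hours if every message is held for delay_hours."""
    return messages * delay_hours / concurrent


serial = run_hours(MESSAGES, DELAY_HOURS, 1)
pooled = run_hours(MESSAGES, DELAY_HOURS, CONCURRENT)

print(f"one at a time:       {serial:,.0f} hours")
print(f"8192 at a time:      {pooled:,.0f} hours (~{pooled / (24 * 365):.1f} years)")
print(f"to finish in 8 hours: {MESSAGES:,} simultaneous connections")
```

So the run takes 400 million hours serially and about 48,800 hours (roughly five and a half years) with 8192 connections, which agrees with the orders of magnitude in the comment.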

      Consider the protocol-to-data ratio of an SMTP transaction over TCP alone. How much is data and how much is just protocol overhead in a given mail transaction? We can figure this out down to the last bit, but I'm going to just throw out the hypothetical notion that when you have to initiate a new SMTP transaction for every message you send, the bandwidth overhead for doing this millions of times is not inconsiderable.

      And we have to think of the other end. Spammers may write themselves custom TCP/IP stacks, but receivers certainly will not. Consider AOL. AOL encompasses some significant percentage of your list of victims. What is AOL going to do with anywhere _near_ that many simultaneous connections... ***from just one spammer?*** Why, call the FBI, of course! It's a DOS attack!

      I'll stop now. I wouldn't be surprised if there were other angles on this I haven't considered. But at first blush it doesn't seem nearly as easy to beat as you suggest.

      Perversely I think the biggest danger in this technique is that it may become widespread and then force spammers to really confront Bayesian filtering head-on. Of course, just thinking aloud (and this probably is undoable for privacy reasons, but just to open a line of speculation) you can do some interesting things with these kinds of filters... retain lists of email addresses that you've received mail from (and/or replied to) more than once... they get a lower score (and a lower delay) than first-time senders... etc. etc. So it's not clear even with very well-designed spam (another cost increase for spammers!) that you could win against the filter.
    • Each open connection takes memory. A little effort can run their system so heavily into swap and get the load average high enough to throttle their mailserver.
  • OpenBSD's PF? (Score:3, Insightful)

    by grub ( 11606 ) <slashdot@grub.net> on Sunday March 02, 2003 @10:37AM (#5418349) Homepage Journal

    OpenBSD has a (alpha? beta? alpha hydroxy? I dunno) anti-relay addition to the PF firewall. Theo first mentioned it here [deadly.org] and Slashdot carried the story here. [slashdot.org] It sounds similar in that it puts the onus of time and bandwidth waste back on the spammers.
  • I actually tried one anti-spam tool, and it worked very well (it was on my Windows box at work), but I couldn't help but worry that maybe I would miss an email that was marked spam but really wasn't.

    And, I work really hard at my job because the economy's so bad. God forbid that spam software would delete an email from my boss just because he used a certain popular spam token in the subject header.
    • Something like that happened to me. I was at a customer site in a room with no phone (and no cell phone reception). My laptop was hooked up to the network so I could send/receive email. My boss sent me a message with the subject Call me ASAP!! (insert many, many exclamation marks here). The spam filter dutifully moved the message to my UCE folder.

      One of the advantages of the tarproxy technique is that it doesn't completely block the suspected spam. The message will eventually get delivered, at which point your mail client can filter it if you like. Since my boss' email came from inside my organization I think you could configure tarproxy to skip processing such messages.

      As for my boss, I added his address to my known-senders list and told him to lighten up on the punctuation.

  • ...for "duplicates"? Seriously, how hard would it be to make a script to check the URLs, to see if they've been posted before, and then automatically compare and/or dump them (maybe a few could be useful for "revisited" stories, but 99% not). [Flame to make it happen] Or is scripting that too difficult in Linux [/Flame]

    Kjella
    • by Anonymous Coward
      You yourself are 'duping', really... take note:

      Deja Vu (Score:3, Informative)
      by BorgDrone (64343) on Sunday March 02, @09:17AM (#5418282)
      (http://slashdot.org/) Anyone else get a bit of a deja vu feeling reading this newspost ? [ Reply to This ]

      Re:Deja Vu (Score:5, Funny)
      by baptiste (256004) on Sunday March 02, @09:26AM (#5418318)
      (http://baptiste.us/ | Last Journal: Monday April 01, @11:27AM)
      Maybe we can use statistical techniques on the client side to help the editors avoid duplicate stories.
      [ Reply to This | Parent ]
      o Re:Deja Vu by $$$exy Gwen Stefani (Score:1) Sunday March 02, @09:44AM
      * Re:Deja Vu by AndroidCat (Score:1) Sunday March 02, @09:55AM
      * 3 replies beneath your current threshold.

      Dup (Score:1, Redundant)
      by baptiste (256004) on Sunday March 02, @09:17AM (#5418284)
      (http://baptiste.us/ | Last Journal: Monday April 01, @11:27AM) Read about it a couple days ago! [ Reply to This ]

      Yet Another... (Score:2)
      by Motherfucking Shit (636021) on Sunday March 02, @09:19AM (#5418289) Duplicate Slashdot Story. [slashdot.org]

      How 'bout some changes to Slashcode so that story submissions containing URLs from past stories are flagged as "HEY, PROLLY A DUPE!" to the editors... [ Reply to This ]

      * Re:Yet Another... by bwalling (Score:1) Sunday March 02, @09:36AM
      * Re:Yet Another... by crawling_chaos (Score:2) Sunday March 02, @09:53AM

      Another similar project... (Score:5, Funny)
      by jdreed1024 (443938) on Sunday March 02, @09:21AM (#5418299) I came across another similar project on Friday. I think I have a link to it somewhere...

      Ah, here it is. Using Statistics to Cause Spammers Pain [slashdot.org]. Posted on some website named slashdot.org. It also called itself TarProxy and .... oh ... uh .... never mind.

      [ Reply to This ]

      * Re:Another similar project... by brejc8 (Score:2) Sunday March 02, @09:47AM
      * God, why do people care about duplicate stories? by Featureless (Score:2) Sunday March 02, @09:57AM
      o Re:God, why do people care about duplicate stories by cowmix (Score:2) Sunday March 02, @10:27AM
      * 1 reply beneath your current threshold.

      How hard would it be to read through the first four or five comments to realize that your .02 isn't worth posting...
  • TarSlash (Score:1, Redundant)

    by gmuslera ( 3436 )

    This new program makes a breakthrough in artificial intelligence technology.

    Given the actual threat of "Slashdotting" (TM), the high rate of internet use, and the geometric increase in Slashdot hard disk usage, this kind of advance is a God-given gift.

    Essentially, it slows down news sites posting duplicate articles, slowing down the comment rate, and the people who follow links, or who would otherwise make a big dent in remote site performance, now have little performance impact (the ones who still follow the link already have most of the remote site's content in their cache).

    "It's wonderful," a webmaster says... "I wish more high-traffic news sites used that technology. The risk that a site with a cool technology will stop answering requests has clearly decreased since the introduction of TarSlash."

    In related news, Slashdot administrators have found that news posting has increased the compression ratio in the archives. There are several theories that could explain this: maybe English is evolving and fewer different words are being used lately, increasing the compression ratio, or article language is being normalized, so postings are slowly becoming more similar each time.

  • ...and is installed on enough of the net...

    therein lies the problem. pretty much all our problems could be solved if a majority of ppl _______ (weren't careless/used _____ software/didn't click on emails titled "I Love You"/etc.). but that's the problem.

    getting a majority of anyone to do anything is a major accomplishment in and of itself. wishful thinking.

  • I've been using CloudMark [cloudmark.com] for Outlook. I actually read about it in a /. post. Windows users, check it out. It's free.
  • by veeoh ( 444683 )
    Mdaemon (www.altn.com) has been doing this for ages....
  • Previously, on Slashdot...

    "Tarpits for Microsoft Worms" [slashdot.org]

  • by httptech ( 5553 ) on Sunday March 02, 2003 @10:46AM (#5418383) Homepage
    Most spammers don't send hundreds of thousands of emails from their own connection. Generally they use open relays to propagate a few messages each with a huge RCPT list. So, tarpitting does nothing to the spammer directly. However, tarpitting the open relay does accomplish something - it could cause a huge backlog of mail, eventually letting the relay choke off its own resources as spammers kept trying to dump messages on it. This would cause some indirect pain to the spammers, as finding open relays that could deliver mail quickly would be difficult. It might also alert the mail administrator to the problem, thus encouraging them to close their server to relaying.

    And you would not need to roll this out on most of the net. If the large ISPs and webmail providers started doing this it would have a significant impact. Much of the spammer's distribution list consists of a few domains: yahoo, hotmail, aol, etc. If the large providers implemented tarpits it could quickly damage the ready supply of open relays for spammers.
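
    To put a rough number on the backlog idea, here is a back-of-the-envelope Python sketch. Everything in it is assumed for illustration: the function name, the idea that the relay delivers in fixed batches of parallel connections, and the figures in the note below. Real MTAs schedule their queues in more complicated ways.

```python
import math


def backlog_drain_seconds(queued_messages: int,
                          tarpit_delay_s: float,
                          concurrent_connections: int) -> float:
    """Estimate how long an open relay needs to push its queue through a tarpit.

    Simplified model: the relay opens `concurrent_connections` deliveries at
    a time, and every delivery is held for `tarpit_delay_s` seconds.
    """
    if concurrent_connections < 1:
        raise ValueError("need at least one connection")
    batches = math.ceil(queued_messages / concurrent_connections)
    return batches * tarpit_delay_s
```

    With made-up numbers, backlog_drain_seconds(10000, 60, 20) comes to 30,000 seconds, a bit over eight hours of queue for a single spam run, which is the kind of delay that would make that relay much less useful to a spammer.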

  • There was some work done on the news server side to do the same thing based on a MD5 hash of the article ID. It's an interesting idea that is in the class of a firewall - it's not going to stop the idiots, but it's going to slow them down enough that you've got a chance to block them via other means. The really neat part would be to write the log file analyzer that can show the backed-off connections (and rate at which they've slowed) in a useful format.
  • by rnt ( 31403 ) on Sunday March 02, 2003 @10:50AM (#5418392)
    Not exactly the same thing as the article is about, but still related: My mailserver is properly secured and refuses to relay anything except legitimate mail (i.e. it will accept incoming mail for users on the domains it serves and it only relays mail to the outside world when it's from a predefined set of internal machines). There are plenty of spammers trying to convince my mailserver to send their spam to other people, but all get a nice "relaying denied" message and a couple of lines in my maillog.

    I think it's a safe bet all relaying attempts originating from the outside of my network are spammers. The information in the maillog about denied relaying attempts should give an accurate list of IP-numbers used by spammers.

    Doesn't this give some interesting opportunities?
    You could create spamtrap daemons that listen on servers that aren't mailservers (so the fact that they behave similarly to a real mailserver and listen on the same TCP port is just a coincidence). Those servers should be unlisted, not have any DNS records pointing at them as MX for any domains, etc.
    The only way to find them should be by randomly scanning an IP range.
    In that case the only people using them would be spammers trying to abuse random mailservers and it would be pretty safe to have the fake mailserver pretend to accept the mail, wait a while, try to gobble up some resources of the spammer, and finally dump the spam-attempt to /dev/null and tell the spammer what he or she wants to hear: I have delivered your junk. The logs would prove useful, the spam is prevented. Happy happy, joy joy.

    The biggest disadvantage would be that such a fake relaying server would probably trigger some of the open-relay scanners (although the clueful scanners would wait until a message is actually received). Hmmm, spammers could do the same, really probing a mailrelay before trying to use it...

    Anyway, it would cost spammers more and more effort and probably annoy the hell out of them, which is a Good Thing.
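
    Pulling the spammer IPs out of the maillog is the easy part. A rough Python sketch follows; the log lines in SAMPLE_LOG are invented (real sendmail/postfix/exim formats differ), and the only thing assumed about a real log is that denied attempts contain the words "relaying denied" and an IPv4 address.

```python
import re
from collections import Counter

# Loose IPv4 pattern; good enough for log scraping, not for validation.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def denied_relay_sources(log_lines):
    """Count denied relay attempts per source IP address."""
    counts = Counter()
    for line in log_lines:
        if "relaying denied" in line.lower():
            match = IPV4.search(line)
            if match:
                counts[match.group(0)] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "Mar  2 10:50:01 mail smtpd: reject from [192.0.2.10]: Relaying denied",
    "Mar  2 10:50:07 mail smtpd: reject from [192.0.2.10]: Relaying denied",
    "Mar  2 10:51:12 mail smtpd: accepted from [198.51.100.4]",
    "Mar  2 10:52:30 mail smtpd: reject from [203.0.113.99]: Relaying denied",
]

for ip, hits in denied_relay_sources(SAMPLE_LOG).most_common():
    print(ip, hits)
```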
  • now that they have been /.'ed -twice- ;-)
  • by wackybrit ( 321117 ) on Sunday March 02, 2003 @11:13AM (#5418473) Homepage Journal
    Every time they post a dupe their bandwidth to Slashdot gets cut by half.

    After a few dupes they end up with 5 minutes between server requests, which gives them ample time to check whether it was actually a dupe or not.

    Et voila, a Slashdot that only gets 2 posts a day, but at least they're not dupes.
  • Okay, I think you've got what to do down - this is a great idea. The problem is, when to use it?

    Here's what I propose: set up a large number of bogus email accounts. Broadcast them everywhere, and let them be honey-pots for spam. The point is, since you NEVER use this account for anything but dropping in spammable places, anything you receive on it *must* be spam. As soon as you get a connection from a mail server to one of these addresses, you *know* it's an open relay, and you put it in your database -- automatically, with no interaction required.

    Step 2: You also do a "fingerprint" on the spam you get in your honeypot (you know the routine - what's the length, average use of the word "dildo", etc) so that you can identify this particular spam "copy" by the message -- NOT the header. This allows you to automatically filter out spam messages. If the spammers want to adapt, they have to rewrite their copy. As long as your signature algorithm is fairly loose -- that is, not a true hash algorithm -- they should have to do a total rewrite if they don't want to be detected. You can then filter these at the relays. Thus, once again, you raise the cost for them to do their spam. Since you are filtering by actual known-spam content -- that is, you're doing this like they do virus signatures -- you should get virtually no false positives.

    And, anybody whose friends are emailing them about penis enlargement doesn't really deserve email anyway.

    Anyway, there's step 1 and 2. To summarize:

    1. Lag spammers.
    2. Filter spammers.
    3. ????
    4. Profit - and make sure to send me some.
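
    One way a "loose" fingerprint for step 2 might look, sketched in Python. Note this swaps in word shingles compared with Jaccard similarity rather than the length/word-frequency stats mentioned above; the function names and the 0.5 threshold are assumptions picked for the example, not a tested spam filter.

```python
import re


def shingles(text, size=3):
    """Set of overlapping word triples, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b):
    """Jaccard similarity of the two messages' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def looks_like_known_spam(message, known_spam, threshold=0.5):
    """True if the body is close to any message caught by the honeypot."""
    return any(similarity(message, spam) >= threshold for spam in known_spam)
```

    Because the comparison works on overlapping word runs instead of an exact hash, changing the punctuation or tacking on a word doesn't change the verdict; the spammer has to rewrite most of the copy.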
  • Can't handle POP (Score:3, Interesting)

    by DuckWing ( 19575 ) on Sunday March 02, 2003 @11:23AM (#5418499)
    While this is a very cool idea and works if you run your *own* mail server, it doesn't do any good for those of us that grab our mail from ISPs and use POP (or some other protocol). It means we have to convince our ISP to use this product/concept, which in my case (cable company) is impossible since they are a bunch of twits anyway.


  • If this really works, and is installed on enough of the net, it could work. 144 spam so far today. Anything would be an improvement.

    sounds like wishful thinking. I get nearly 300 messages a day to my Hotmail and other free accounts. You can only use them for a few months before they fill with spam every day.

  • Ok, why do we need 300 people to reply with a one line message saying 'dupe'.
  • Taco - why don't you read your own site any more? Do you surf somewhere else these days? Let us in on the secret and maybe we could come and join you.
  • Someone correct me if I'm wrong here (my math is not spectacular) but as I understand Bayesian techniques, a whole message is tokenized and then the fifteen or so "most interesting" tokens -- as compared to the spam corpus -- are analyzed to come up with a probability the given message is or isn't spam. If TarProxy is creating tokens and then analyzing them as they arrive, how much message has to be received before a spam identification is made? Will it work poorly with a small corpus? The first thing to arrive are the headers which are tokenized along with the message body. Lots of spam comes from Yahoo, and so does traffic from friends -- wouldn't classification based purely on header tokens tend to call all Yahoo mail spam under this scheme? I hope someone with a background in the math this is based on can make a comment.
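
    For reference, the combining step as described in Paul Graham's "A Plan for Spam" (the usual source of the "fifteen most interesting tokens" idea) can be sketched in Python as below. Whether TarProxy does exactly this is an assumption; the function name and the example probabilities are made up.

```python
def combined_spam_probability(token_probs, keep=15):
    """Combine per-token spam probabilities into one score.

    "Most interesting" means farthest from the neutral value 0.5.
    With no tokens at all the result is the neutral 0.5.
    """
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    spam = 1.0
    ham = 1.0
    for p in interesting:
        spam *= p
        ham *= 1.0 - p
    if spam + ham == 0.0:
        return 0.5
    return spam / (spam + ham)
```

    This makes the worry concrete: the score is defined for any number of tokens, so with only a header or two received, a single token scored 0.9 already yields about 0.9, while one spammy and one hammy token (0.9 and 0.1) cancel out to 0.5. How early a proxy should trust that number is exactly the open question.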
  • Important Stuff:
    Read other people's messages before posting your own to avoid simply duplicating what has already been said [slashdot.org].


    I noticed some of the moderation done within this thread. What a waste. And how ridiculous. When an original story post is a dupe how can any reply to it ever be considered off-topic or redundant?? Think about it. Any moderation within this thread is a waste of mod points.

    There's nothing to see here, move along, it's just another typical CmdrTaco post [slashdot.org].

    Man, the dude is embarrassing. He obviously never reads /.
  • geeze doesn't this guy read his own website
  • The original post I believe had a talk about POPFile as well, which is a client proxy that you run your email through. I am pretty sure it is the same guy that made both. It filters out all the crap and "learns" as it goes. Since I installed it I have managed to take my daily 100 spam emails down to about 2 or 3 tops and I am still training it.

    Download it here [sourceforge.net].

  • "I don't see how effective this could be. How long before spammers get smart and set their SMTP program to give up after X seconds?"

    Who cares?!? Let them give up after x seconds. I want them to give up and stop sending my server spam. That's just fine by me. If they give up I don't get any spam. How much more perfect can you get?

  • by Multics ( 45254 ) on Sunday March 02, 2003 @12:53PM (#5418850) Journal
    Given:

    1) the /. creators are by and large the /. people that control the posting of stories

    2) Most stories contain at least one URL

    3) URLs, by and large, are unique

    Then;

    Would it be so hard to modify the actual posting code to check that the URL hadn't already been part of a story header within say the last 60 days?

    Such a check would help both /. and all others that run / code.

    Just a thought!

    -- Multics
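
    A minimal Python sketch of that check, just to show how little code it needs. This is not Slash code (Slash is Perl); the function name, the idea of a dictionary mapping each URL to the date it was last posted, and the example.org URLs are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def find_duplicate_urls(submission_urls, posted, now, window_days=60):
    """Return the submitted URLs that already ran in a story recently.

    `posted` maps a URL to the datetime of the last story that used it.
    """
    cutoff = now - timedelta(days=window_days)
    return [url for url in submission_urls
            if url in posted and posted[url] >= cutoff]


# Hypothetical data: one URL ran two days ago, one ran last year.
posted = {
    "http://example.org/tarproxy": datetime(2003, 2, 28),
    "http://example.org/old-story": datetime(2002, 6, 1),
}
dupes = find_duplicate_urls(
    ["http://example.org/tarproxy", "http://example.org/old-story"],
    posted,
    now=datetime(2003, 3, 2),
)
print(dupes)
```

    A non-empty result would only flag the submission for the editor, not reject it, so deliberate follow-up stories could still go through.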

    • Finally someone who isn't just whining but is actually suggesting a solution!
      I say we MOD YOU UP!
      I don't have any moderator points right now, but if I did, I would mod you up.
      Respectfully.
  • Is it the case that people deliberately submit duplications to see if it'll actually be posted?
  • I don't mean to start a petty 'War of the Languages,' but I pretty much stopped reading this article when I read that this program was written in Java. Our mail servers are slammed enough as it is; just trying to route incoming mail and using SpamAssassin is pushing six machines close to the breaking point. Why on earth would I want to even think about throwing a JVM into the mix? An interesting idea, but I think I'll wait until we see something in C....
  • This is crazy, how many times do I have to read the same article on Slashdot. I might actually have to stop reading /.

    But then my life would be so empty. NOOOO!!! My preciousssss! I will continue.

    Suhit
  • Check out:

    Exim SpamAssassin at SMTP time
    http://marc.merlins.org/linux/exim/sa.html
  • by karl.auerbach ( 157250 ) on Sunday March 02, 2003 @01:33PM (#5419033) Homepage
    I suggested a similar mechanism to constipate TCP connections on the IETF e-mail list last summer. The basic idea is to add some new calls to the TCP API so that an application could peek at the incoming traffic without it being acknowledged at the TCP level. If the incoming stream were something bad, then the application could tell the TCP stack to go into a slow acknowledgement mode, thus capturing the spammer in slow-mode transfer.

    For more, see http://www1.ietf.org/mail-archive/ietf/Current/msg17009.html [ietf.org]

    The difficulty is getting enough of these deployed so that spammers, and open relays, have a good chance of getting stuck.
    • Good idea! (Assuming that the security implications are considered carefully.)

      As for getting it deployed, perhaps the tarpitting could just be made a part of the major unix SMTP code and distributed with it turned on by default. Similarly add and distribute kernels with the added tcp/ip api calls. With comments to that effect somewhere and flags to turn it all off.

      Then, anyone who is actually maintaining an open relay using unix will eventually get the "enhanced" version.

      It would also seem reasonable to encourage people to run open relays with this in place where they had been previously not open - it would increase the number of hosts for the spammers to use, granted, but would end up slowing down the spammers. If the code also had ways to track the originating hosts these could be shared and used to augment host checking in the tarpit.

      If at the same time some crypto-based signing mechanism were added to the SMTP code, this could be used as a "pass through normally" flag. Non signed messages could be slowed down on receipt.

      And, of course the "yellow light" is a reference to Reverend Jim taking his taxi driver license exam in "Taxi".

  • "If this really works... it could work."

    You don't say. I would think that if it really worked it wouldn't work!

  • I don't get any spam, either on my work account or my att account, so what's all the fuss about? Don't give out your email to people or sites that you don't trust. I have a yahoo account that I use for websites etc., that is simply there to collect spam. Yahoo even courteously deletes the spam for me after 7 days!
  • This isn't new - someone else came up with the same idea a couple of years ago. We've got our outside systems configured to allow connections to port 25, then hold the connections open for 30 seconds to a minute per response ... then drop the email on the floor.
  • It does nothing for people who still have to pick up their mail via POP3. By the time their computer picks up the mail the connection is long-gone.
  • This is a step in the right direction, but I don't quite agree with it entirely. You've identified it as most likely being spam, but are still sending it, just slowly. What I'd like to see (and what I think other programs do; I believe they're called "teergrubbers" or something to that effect) is something that incorporates the same "make the spammers wait" concept, but doesn't actually relay the mail. They try to send 10,000 messages through your server, one at a time, waiting several minutes between each one, only to not get any sent at all. This is a really bizarre viewpoint, and I'm sure it wouldn't work, but in a sense, if you run this, you're an 'accomplice' to the spamming: You ID it as spam, but still pass it through. Why not go all out and not even relay it?
  • CT Yup, it's a dupe. There wasn't anything better to post at 9am on a sunday, so you can just bitch about me instead ;)

    Finally, we're all on-topic! Of course, you all know about the dupe posts, so there's no use in it for me.
