How to Get Rid of Referrer Spam?
wikinerd asks: "I have recently opened my own community website. Everything was fine until spammers found it, which happened quite quickly. As usual they filled up my mailboxes, but SpamAssassin can take care of that when it is needed. Then, they discovered my blog and my wikis and employed their bots to fill them up with spam comments. I solved this problem by moderating all comments. Now, however, they have employed another evil trick: Referrer spam. They caused my webserver statistics to grow by orders of magnitude by making their stupid websites show up on my referrer lists. Unfortunately now my webserver usage statistics are full of viagra, poker, casino, porn, spyware, and pharmacy sites. I am afraid that this is a problem I cannot solve with the knowledge and the tools I have at the moment. So, I came here to ask Slashdot readers: How can I fight referrer spam and what tools are available in a GNU/Linux environment to ensure clean and spam-free usage statistics?"
Here's how for Apache (Score:5, Informative)
First, you need to set up the log you'll use for statistics to exclude requests marked with a "badreferer" environment variable.
CustomLog logs/access_log-www.example.com combined env=!badreferer
The following requires Apache's SetEnvIf module. You can put these lines in
#Blacklist (adjust as you need)
SetEnvIfNoCase Referer ".*(credit|hold-em|holdem|mortgage|money|cash|gb.
#Whitelist (optional)
SetEnvIfNoCase Referer ".*(google|yahoo|alltheweb|search|excite|aol.com|
Additionally, you can use the same blocks to deny them access to your site:
<Limit GET HEAD POST>
Order Allow,Deny
Allow from All
Deny from env=badreferer
</Limit>
<LimitExcept GET HEAD POST>
Order Deny,Allow
Deny from All
</LimitExcept>
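If you can't touch the Apache config (shared hosting, say), the same blacklist-then-whitelist idea can be applied to the log file before it reaches your stats program. This is only a rough Python sketch, not the poster's setup: the pattern lists contain just the terms visible in the truncated lines above, the function names are made up, and it assumes the whitelist is meant to override the blacklist and that the log is in the "combined" format.

```python
import re

# Hypothetical pattern lists: only the terms visible in the truncated
# SetEnvIfNoCase lines above. Extend them for your own site.
BLACKLIST = re.compile(r"credit|hold-em|holdem|mortgage|money|cash", re.IGNORECASE)
WHITELIST = re.compile(r"google|yahoo|alltheweb|search|excite|aol\.com", re.IGNORECASE)

# In the "combined" log format the last two quoted fields are the
# referer and the user agent.
COMBINED_TAIL = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')


def is_bad_referer(referer):
    """Return True if the referer is blacklisted and not whitelisted.

    Assumption: a whitelist match overrides a blacklist match.
    """
    if WHITELIST.search(referer):
        return False
    return bool(BLACKLIST.search(referer))


def clean_log(lines):
    """Yield only the log lines whose referer is not flagged as spam."""
    for line in lines:
        match = COMBINED_TAIL.search(line)
        if match and is_bad_referer(match.group("referer")):
            continue
        yield line
```

Feed the output of clean_log() to your stats tool instead of the raw log. Like the Apache version, the word lists will produce false positives, so check what gets dropped now and then.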
Re:Here's how for Apache (Score:5, Informative)
You could even try configuring your software to use such blacklists to deny trojaned machines access completely.
Additionally, if you wanted, you could then add those IP addresses to your firewall rules to drop the requests at the firewall.
Lastly, you could teergrube them - set things up to...
Respond...
Very...
Slowly...
To...
Their...
Request...
Re:Here's how for Apache (Score:2)
Which is why you contact the ISP of the originating connection - to get them to clean up their act.
And if they are unwilling or unable to do so - are you really losing (for the
And you can also look for the proxied-for headers, and use them to further refine your lists.
Sample, please? (Score:2)
Re:Sample, please? (Score:2)
Here's a series of posts [littlegreenfootballs.com] dealing with the issue on LGF. (Note: I'm posting this link in the context of referrer spamming -- no political statement is intended, and no political arguing over it is desired.)
Did you google before posting this? (Score:3, Informative)
I hope I'm not being too rude, but seriously, I googled for referrer spam [google.com] and bam...first result had some decent advice [spywareinfo.com]. This was just the first thing that came up. Add the word "apache" to your query and you will get some very helpful results [google.com]. Besides, this is Slashdot...not a trove of reliable information/advice. Just start using Apache to start blocking the Mallorys. Also, if you're still posting any kind of statistics or referrers publicly, stop. Spammers wouldn't do this if Bloggers didn't publish that kind of abusable data.
Re:Did you google before posting this? (Score:5, Informative)
Re:Did you google before posting this? (Score:5, Insightful)
Googling for info isn't always best; people frequently contribute things to this blog that you cannot duplicate with a simple query on Google.
And last but not least you can always turn Ask Slashdot off in your preferences....
So for the last fucking time: YES HE CAN GOOGLE IT BUT HE DECIDED TO ASK SLASHDOT INSTEAD. Move on.
Re:Did you google before posting this? (Score:2)
Hey, be nice. Was I really impolite (kinda like you're being right now)? Did I, or did I not provide helpful information to the poster?
Lighten up, Francis.
Re:Did you google before posting this? (Score:4, Insightful)
I see Ask Slashdot not as a substitute for a simple keyword search but rather a supplemental verification process. I have found that keyword searches don't necessarily reveal best practices; you get unedited, unrefuted claims that you have to sift through. In a reasonably informed techie discussion forum like Slashdot (sometimes), you can get some interesting debate and comparisons on various approaches and methodologies.
And, as you noted, it's a way to be exposed to problems which I don't currently have but might someday. Then when I encounter the problem, I hope a little fragment of memory in my aging brain will bubble to the surface to remind me that it's been discussed on Slashdot.
For researching technical problems, the best thing is to combine Google, Slashdot, Usenet newsgroups, and specialty forums such as (in the O.P.'s case) webhostingtalk.com, spend a little time in each place and take notes. From amongst voluminous chaff generally there's a bit of wheat to be harvested.
At the risk of belaboring the obvious, it should also be noted that the way to put useful information out there in the first place so that googlers can find it is precisely this sort of forum. Google is only your friend if there's something out there worth searching for.
Re:Did you google before posting this? (Score:2)
Re:Did you google before posting this? (Score:2)
What I have seen here is a better compilation of information than I have seen yet. So I thank the person for asking.
Re:Did you google before posting this? (Score:4, Informative)
They don't bother checking to see if your site publishes their referrers publicly. I don't and I have it anyhow, of course. Also note my site uses a fairly obscure weblogging platform (PyDS), and that I've also customized the templates until there's no recognizable signature of any platform on my site, and I was still getting hammered.
I've gone with an
Don't forget to update the first RewriteCond line to match your server name.
Unfortunately, this has known false positives [jerf.org], but nothing too bad for me yet. But this approach won't scale; we'll either need something more sophisticated, or to make it less useful for referrer spammers until they stop doing it. (The recent "nofollow" tag is a good start, since it's Yet Another way to try to steal Google Juice.)
Re:Did you google before posting this? (Score:2)
Thanks. I figured there was something easy to do but I didn't care to dig too deeply
PHP bayesian filter. (Score:3, Informative)
The best way to check if it's spam would be with a bayesian filter [phpgeek.com].
Sure, it will take some coding and training of the filter, but this seems to me like the best option.
Re:PHP bayesian filter. (Score:2)
Bayesian filters are cool and all, but they aren't magic. If you don't understand them, then when you're wondering "why hasn't somebody tried using a Bayesian filter for this problem?", the answer is probably "because it isn't an appropriate solution". After they got popular for spam, there was a mini-ren
Re:PHP bayesian filter. (Score:1)
Personally I think it's as good as any solution because it will be smarter and more adaptive than most word-filter ideas mentioned in this thread.
Furthermore, you raised a valid point. A url is quite limited to filter. But maybe the script could get the referring page
Re:PHP bayesian filter. (Score:2)
Remember, in general, against an intelligent human attacker, only intelligent human vigilance can win. You can "what if, what i
You should.... (Score:3, Funny)
Take off and nuke 'em from orbit.
Just to be sure.
Re:You should.... (Score:2)
I have the web analysis program (Analog) generate a private report that includes the referrers, but anything I put out does NOT show that. For those interested, I have a page about referrer log spamming. [komar.org]
Re:You should.... (Score:2)
Perhaps we should launch all these questionable people into orbit and crash them into the nearest star?
Deny them access in the first place (Score:5, Informative)
Re:Deny them access in the first place (Score:2)
Re:Deny them access in the first place (Score:2)
Solution for blog spam (Score:2)
If Spam Karma finds questionable words in comments -- it's configurable, and it comes with a good default list -- it sends users to a captcha. If they fail at the captcha -- and they're not on a strongbad keyword list like "viagra" and "vegas poker" -- the comments are sent for moderation.
Works great for me. Nope, the URL in my profile is not my blog anymore, it's on my own server, it's in Portuguese and I ain't gonna expose my serve
Password (Score:2, Interesting)
Re:Password (Score:1)
Re:Password (Score:2)
1. Set up your robots.txt to disallow a random directory.
2. Put an index file in there that will add any ip that visits the page to your firewall blocklist.
What happens is that ALL good spiders obey the robots.txt and the bad ones u
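The two steps above can be sketched roughly like this in Python. Everything named here is hypothetical (the trap directory, the Blocklist class, the way you'd hook it into your server); it only shows the logic of "anything that requests the disallowed directory goes on the blocklist", and it prints firewall commands for you to review rather than running them.

```python
import ipaddress

# Hypothetical random directory name; pick your own and link to it nowhere.
TRAP_PATH = "/trap-3f9a/"


def robots_txt():
    """Step 1: tell well-behaved spiders to stay out of the trap directory."""
    return "User-agent: *\nDisallow: {}\n".format(TRAP_PATH)


class Blocklist:
    """Step 2: remember every address that requests the trap directory."""

    def __init__(self):
        self.blocked = set()

    def record_hit(self, path, remote_addr):
        """Add remote_addr to the blocklist if path is inside the trap."""
        if not path.startswith(TRAP_PATH):
            return False
        # ip_address() raises ValueError on malformed input, so garbage
        # never ends up in a firewall command.
        address = ipaddress.ip_address(remote_addr)
        self.blocked.add(str(address))
        return True

    def firewall_rules(self):
        """Return one DROP rule per blocked address, for manual review."""
        rules = []
        for text in sorted(self.blocked):
            tool = "iptables" if ipaddress.ip_address(text).version == 4 else "ip6tables"
            rules.append("{} -A INPUT -s {} -j DROP".format(tool, text))
        return rules
```

You would call record_hit() from whatever serves the index file in the trap directory. Reviewing the rules before applying them is a deliberate choice here, since an automatic block can also lock out a curious human who typed the URL by hand.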
Re:Password (Score:1)
If you use AWSTATS (Score:2, Informative)
make it not worth their time (Score:2, Informative)
per googleblog:
Q: How does a link change?
A: Any link that a user can create on your site automatically gets a new "nofollow" attribute. So if a blog spammer previously added a comment like
Visit my <a href="http://www.example.com/">discount pharmaceuticals</a> site.
That comment would be transformed to
Visit my <a href="http://www.example.com/" rel="nofollow">discount pharmaceuticals</a> site.
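If your blog or stats software doesn't do this for you yet, the transformation itself is small. Here's a minimal Python sketch, assuming the user-submitted HTML is simple enough for a regex (no ">" inside attribute values); the function name is made up, and links that already carry a rel attribute are left alone rather than merged.

```python
import re

# Matches an opening <a ...> tag and captures its attributes.
# The \b keeps tags such as <abbr> from matching.
ANCHOR_OPEN = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(html):
    """Add rel="nofollow" to every <a> tag that has no rel attribute."""

    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # leave existing rel values untouched
        return '<a{} rel="nofollow">'.format(attrs)

    return ANCHOR_OPEN.sub(fix, html)
```

Run it on comments (or on referrer lists) just before they are written into a page. For anything beyond simple markup a real HTML parser would be the safer route.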
--
just add this f
It's nice to be little (Score:2)
A) The code for it is homemade, would be a pain in the butt to re-tool a bot for little old me vs. all the livejournal, blogger, etc. sites out there...
B) I'm so insignificant out there with such low traffic the spammers probably wouldn't care anyways
C) If the spammers do start caring, I can code my blog around them to defeat them. So far it hasn't created a problem, but the stronger the problem the stronger my response will be...
don't let google see your referrer pages? (Score:3, Informative)
Whois? (Score:1)
1> The headers their clients send are different than those of ordinary clients.
2> That the properties revealed by whois are different for refer spammer clients than for ordinary clients.
3> That the whois properties for the spam refer sites are different than those of legitimate sites.
I'll bet that ignoring input f
Protect your stats (Score:1, Informative)
$0.02,
_Michael.
Re:Protect your stats (Score:2)
It's the comment/trackback spam that bugs me, and like another poster said, Spam Karma (on Wordpress, anyway) seems to be working wonders. (This is after trying built-in moderation, three strikes, stopgap, and several other methods)
Captcha, captcha, captcha. (Score:2)
Captcha access to the referral log.
Nobody's said this yet?? (Score:1)
The stats pages no longer show up on any search engines, so a) The spammers get no 'pagerank' from those links (which is what they do it for) and b) they can't find the stats pages.
I was getting shitloads of referer spam; within a week (as soon as google updated) it dropped to nothing. I've had no referer spam AT ALL since then.
Perhaps they'll start just crawling the entire web, but it appears that at the moment they do a google search to find pages that post their refere
webalizer referrer work-a-round patch (Score:4, Informative)
Our initial attempt to solve this was to complain to the ISP of the referrer spammers. That did no good. The ISP was willing to listen, but not to act.
We did manage to actually track down the jerks who were doing the referrer spam. They told us that they were attempting to create links back to their sites for better search engine placement.
Our work-a-round was twofold. For various reasons we wanted to keep our webalizer [mrunix.net] stats externally accessible. So we requested bots (the ones that follow the rules at least) to not index our external stats and we modified webalizer to not form links back to the referrers.
We edited our robots.txt file to exclude legit bots from our stats:
We also patched webalizer v2.01-10 [isthe.com] to no longer form URLs to referrers. Now only a plain text line without the leading http:// shows up in the table. The original referrer spammers gave up when they lost all of the links back to their sites.
The bottom of the 0.basic.patch prevents webalizer [mrunix.net] from forming links back to referrers. See README-FIRST [isthe.com] for details on this patch set.
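For anyone who wants to check that a robots.txt rule really keeps rule-following bots away from the stats pages, Python's standard urllib.robotparser can do it. This is just a sketch: the /stats/ path, the example.com URLs and the "ExampleBot" agent are placeholders, not the poster's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; replace /stats/ with wherever your stats live.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def is_crawlable(robots_txt, url, agent="ExampleBot"):
    """Return True if a rule-following bot named agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With the rules above, is_crawlable(ROBOTS_TXT, "http://www.example.com/stats/usage.html") comes back False while the front page stays crawlable. This only tells you what polite bots will do; it does nothing against bots that ignore robots.txt.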
Put "rel=nofollow" in the referrer links (Score:3, Informative)
But if you have to, then put "rel=nofollow" in the link itself. This makes Google (and other search engines) discard the link when calculating search rankings.
Go here [google.com] for more info.
Use Google's own trick (Score:1)
mod_security (Score:3, Informative)
Now when the referer spammer hits my site, they get denied and get nothing back. Bandwidth wasted serving up pages to referer spammers is cut to virtually nil. The spammers are still there banging away and a few still get by though. The list of referrers needs to be monitored so that new mod_security rules can be added as required. That's no different than using mod_rewrite to deny the referrer spammers though.
another solution (Score:1)
Porn: anybody that wants good porn knows to look at p2p solutions (just look in the right spots, it's all there for free)
viagra, etc: if you don't know that it doesn't work, you're an idiot
free stuff: nothing in life is free
special service: there are always strings attached
correct your account information: if you get your identity "stolen"