Slashdot
Google Cans Comment Spam
Posted by
timothy
on Wed Jan 19, 2005 07:49 AM
from the whomp-whomp-whomp dept.
fthiess writes "Comment spam is in many ways even more annoying than regular email spam, since you generally have to do more than just hit the delete button to get rid of it. Its defining characteristic is that spammers abuse websites where the public can add content (blogs, wikis, forums, and even top referrer lists) to increase their own ranking in search engines. It seems, however, that the days of content spam are numbered: today Google announced that, in partnership with MSN Search and Yahoo!, they have implemented a way to block content spam." (More below.)
"Briefly, you just change your blogging/wiki/forum/etc. software so that any hyperlinks in publicly-contributed text have a new rel=nofollow attribute added to any anchor tags. Google, MSN, and Yahoo! will now no longer index any such links, so the motive for content spamming disappears. Especially hopeful is the fact that a slew of makers of blogging software, including Six Apart, have announced they are supporting the new attribute."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Cooperation is a good thing (Score:5, Insightful)
and of course... (Score:5, Insightful)
Simon.
It's one way... (Score:5, Interesting)
Re:It's one way... (Score:5, Informative)
It is not a solution meant to change the content on a website (that would be tantamount to censorship). It only changes how the search engines handle the links (note: the supporters/developers of such a standard are search engine companies).
The best question raised in this post [slashdot.org] is whether such a tag is standards-compliant.
Re:It's one way... (Score:5, Funny)
How's that old saying go? Your right to free speech ends at my rights to kick you in the nuts when you spam me.
Something like that, anyway...
Re:It's one way... (Score:5, Insightful)
The comment spam is mostly used to get a better search engine ranking. A blog which uses this attribute on link tags is far less interesting to comment spammers, so chances are the moderators have to delete less spam.
Re:It's one way... (Score:5, Insightful)
But somehow I don't think spammers really care if a blog uses this system or not. It's probably easier to just spam all blogs than to figure out which are useless. Just like email spammers don't care much if an address is valid or not.
Some people think that adding spam filters to an email account reduces the spam sent, while it only reduces the amount of spam received. This solution does neither.
However, all efforts to fight spam should be welcomed and supported. Despite my pessimism, it will be interesting to see how it turns out.
Re:No, no, they should BLOCK all BLOGS by default. (Score:3, Informative)
Re:It's one way... (Score:3, Insightful)
True, it's a long term solution which is not gonna do any good in the short run. The short term solution is to make it impossible for the spammers to attack your blog in the first place. Change the names of the files that handle comment posting etc... (and of course change the code that points to such pages) and most automated spam bots are lost. If you really want to be secure, implement an intermediate page where it asks explicit permission before posting (tick a checkbox and click "yes, submit") and you
Use CSS (Score:4, Informative)
It is pretty easy to make rel="nofollow" visible to normal users too in modern web browsers using CSS. You could use something like this:
That will display the given image before any links marked as nofollow.
Re:Use CSS (Score:3)
Re:It's one way... (Score:3, Informative)
They will if they put it in their client-side stylesheet, or the blog owner puts it in the site's stylesheet. I do similar things to put "[PDF]" after PDF links and "[reg]" after nytimes.com links.
Re:It's one way... (Score:3, Informative)
if you're generally curious for what/if people use the rel attribute on anchor tags.
There's lots of power you can exert over the appearance of web pages through your client-side stylesheet.
If only there were a way to restrict a set of rules to particular sites, or if you could trust sites to put ID attributes on their BODY tags to uniquely identify their pages to the world, even just the domain name (substituting some other character for the dots).
Is the result valid HTML/XHTML? (Score:5, Interesting)
Re:Is the result valid HTML/XHTML? (Score:3, Funny)
IE have proposed introducing similar measures to IE6 using ActiveX and DHTML.
Re:Is the result valid HTML/XHTML? (Score:5, Informative)
Yes and yes.
From the W3C: Links in HTML documents - The A element [w3.org]: Basic HTML data types - Link types [w3.org]:
Re:Is the result valid HTML/XHTML? (Score:5, Insightful)
I think this last paragraph is important. "nofollow" is not on the official list of link-types [w3.org]. If blog authors wish to use this attribute in anchor elements, they need to define it properly (or at least properly reference a definition).
Remember back in the 90's when Netscape and MS were breaking standards right and left so that their browsers would have an edge on the competition? That was the wrong way to do it, and it created the mess we're in now with sloppy HTML spewed all over the web and designers unable to use compliant designs because the most popular browser doesn't even try to support standards (an example here [meyerweb.com]). Google is doing this the right way. They went back and read the HTML specification to see if it was already capable of doing what they needed. It does? Great! Let's utilize the standard!
Granted, HTML these days has a much better design than it did in the pre-4.0 specifications. Back when Netscape and MS were at each other's throats the document format was actually incapable of doing a lot of things that designers wanted to do on the web. But HTML is a very mature format these days.
Re:Is the result valid HTML/XHTML? (Score:3, Informative)
And the answer, of course, is yes. "rel" attribute [w3.org], valid for "a" and "link" element types. Take a look at the source of any Wordpress weblog and you'll see it being used for many things already.
The caveat is that you should define a profile about the valid keywords you'll be using in "rel"; I don't know if Google is using a profile, but it's not mandatory.
Re:Is the result valid HTML/XHTML? (Score:3, Interesting)
"rel" is short for "relationship" - it can contain values like "previous", "next", "contents", "index", etc.
Now if only... (Score:4, Interesting)
Re:Now if only... (Score:5, Interesting)
They could even selectively add or omit it based on the comment's moderation. Include the nofollow tag by default, but if a comment with a link in it is moderated highly, remove the tag so search engines can use it. Sounds like the best of both worlds..
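As a rough illustration of the moderation idea above, here is a minimal Python sketch. The function name `render_comment_link` and the `trusted_score` cutoff of 4 are made up for this example; nothing here reflects how Slashcode actually renders comments.

```python
from html import escape


def render_comment_link(href: str, text: str, score: int, trusted_score: int = 4) -> str:
    """Render a link from a user comment as an anchor tag.

    Hypothetical policy: links carry rel="nofollow" by default and lose it
    only when the comment's moderation score reaches trusted_score.
    """
    rel = "" if score >= trusted_score else ' rel="nofollow"'
    # Escape both the URL and the link text so the comment cannot inject markup.
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(render_comment_link("http://example.com/", "a site", score=1))
    print(render_comment_link("http://example.com/", "a site", score=5))
```

The cutoff is the whole design decision here: set it too low and spammers only need a little moderation luck, set it too high and almost no comment link ever counts.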
Could work the other way too ;) (Score:2)
Soon you'll see that only links with the tag rel="nofollow" will count as genuine links because the spammers do NOT use that as much as the regular users..
Just like spammers were the first people to implement DomainKeys
Re:Could work the other way too ;) (Score:5, Insightful)
RTFA. Slashdot could modify slashcode to automatically add the attribute to all links posted in comments. Comment spammers can't do anything about it, so they'll move away to other sites.
No normal links (i.e. not in visitor contributed content) should have the attribute. So slashdot will still be full of normal links; only the links in the comments will have the attribute.
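For what "automatically add the attribute" could look like, here is a small Python sketch. It is regex-based and only meant for simple, well-formed anchor tags in visitor-contributed text; the function name `add_nofollow` is an assumption for this example, and a real blog or forum engine would more likely do this inside its own HTML sanitizer.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
# Assumption: attribute values do not themselves contain ">".
_ANCHOR_OPEN = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches a quoted rel="..." attribute, but not e.g. data-rel="...".
_REL_ATTR = re.compile(r"""(?<![\w-])rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def add_nofollow(html: str) -> str:
    """Return html with rel="nofollow" present on every opening anchor tag."""

    def fix_tag(match: re.Match) -> str:
        attrs = match.group(1)
        rel = _REL_ATTR.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return f'<a{attrs} rel="nofollow">'
        # rel already present: keep its values and add nofollow if missing.
        values = rel.group(2).split()
        if "nofollow" not in (v.lower() for v in values):
            values.append("nofollow")
        new_rel = 'rel="' + " ".join(values) + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return _ANCHOR_OPEN.sub(fix_tag, html)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com/">my site</a>'))
```

Since the site applies this to comment text on the server side, a spammer cannot opt out of it, which is the point made above.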
Miserable Failure? (Score:3, Funny)
Band aid (Score:4, Interesting)
I'm not really into blogging so I don't know how big of a problem this is. I get some spam in my guestbook, which I promptly remove. The spam itself is what's really irritating, not the potential "elevating" of the spamvertised site in search engines, where I've never personally run across one that I can remember.
Am I correct in assuming that these sites pop up and down relatively often? Maybe it'd be possible to use a temporal component in the rating. Say if the link points to a site which was just registered two days ago, it's given a very very low weight, and then you ramp up as time goes by. As spam gets deleted from blogs and guestbooks, time would work against these spammers. Or? I dunno.
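The temporal idea above could be sketched like this in Python. The linear ramp and the 180-day `ramp_days` figure are pure assumptions for illustration; no search engine is known to weight links this way.

```python
from datetime import date


def link_weight(registered: date, today: date, ramp_days: int = 180) -> float:
    """Weight a link by the age of the site it points to.

    Hypothetical scheme: a freshly registered site gets a weight near 0,
    rising linearly to 1.0 once the site is ramp_days old.
    """
    age_days = (today - registered).days
    if age_days <= 0:
        return 0.0
    return min(1.0, age_days / ramp_days)


if __name__ == "__main__":
    today = date(2005, 1, 19)
    print(link_weight(date(2005, 1, 17), today))  # two-day-old site, very low weight
    print(link_weight(date(2004, 1, 1), today))   # older than the ramp, full weight
```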
New denial of service attack... (Score:3, Insightful)
Re:New denial of service attack... (Score:3, Insightful)
So again I ask, how?
It's a pagerank question, not indexing (Score:3, Interesting)
Not quite. What happens is that the link won't add anything to the site in question. As you probably all know, most search engines rank pages by incoming links - it's not just Google. By adding this tag, the incoming link won't count.
I think this is a great idea. It will probably break the W3C compliance, but hey - anything to piss off a spammer.
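To make the "incoming link won't count" point concrete, here is a toy Python sketch that tallies inbound links and skips any marked nofollow. Real ranking algorithms are far more involved than a raw count; the tuple layout and the name `count_inbound_links` are assumptions for this example only.

```python
from collections import Counter
from typing import Iterable, Tuple

# One link: (source page, target page, whether the anchor carries rel="nofollow").
Link = Tuple[str, str, bool]


def count_inbound_links(links: Iterable[Link]) -> Counter:
    """Count inbound links per target, ignoring links marked nofollow."""
    counts: Counter = Counter()
    for _source, target, nofollow in links:
        if not nofollow:
            counts[target] += 1
    return counts


if __name__ == "__main__":
    links = [
        ("blog.example/post#comment-1", "spam.example", True),
        ("news.example/story", "good.example", False),
        ("blog.example/post", "good.example", False),
    ]
    print(count_inbound_links(links))
```

The spam link is still on the page and can still be crawled or clicked; it simply contributes nothing to the target's tally.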
Useful links (Score:5, Insightful)
Re:Useful links (Score:5, Informative)
As for useful links in comments; if they're really good sites, people are bound to blog about them more generally. And my poor blog gets few enough hits that it will be no problem for me to manually edit genuine comments to remove nofollow tags.
Opportunity for Firefox (plugin) (Score:4, Interesting)
Actually, are there any plugins already in existence that modify the appearance of a link based on a regexp match?
Re:Opportunity for Firefox (plugin) (Score:5, Informative)
Let me introduce you to the wonderful world of userContent.css [mozilla.org].
Something like this should work:
Re:Opportunity for Firefox (plugin) (Score:4, Interesting)
But what if I need to search (Score:5, Funny)
Wikipedia (Score:5, Informative)
Just Remove The Sites (Score:5, Insightful)
Why stop the indexing of relative links from blogs to make google's life easier?
99% of the links posted in comments are relevant and would be beneficial to index. Why stop this for the 1% of jackasses out there?
The domains contained in the links from blogspam are well known, and there are plenty of blacklists out there. Why doesn't googleyahoomsn just remove these sites from its database? It's such an easy solution. I believe they already do this in some circumstances for link trading systems whose only goal is to get higher pagerank.
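A blacklist check of the kind suggested above might look like this Python sketch. The function name `is_blacklisted` and the example domains are placeholders, and the sketch says nothing about how any search engine actually maintains or applies such lists.

```python
from typing import Iterable
from urllib.parse import urlparse


def is_blacklisted(url: str, blacklist: Iterable[str]) -> bool:
    """Return True if url's host is a blacklisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match the domain itself or any subdomain, but not lookalikes
    # such as notspam.example when spam.example is listed.
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)


if __name__ == "__main__":
    blacklist = {"spam.example"}
    print(is_blacklisted("http://www.spam.example/pills", blacklist))
    print(is_blacklisted("http://notspam.example/", blacklist))
```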
This solves only 1/2 of the problem (Score:4, Informative)
While this will prevent spammers from bumping up their sites' Page Rank (probably their primary motivation for comment spam anyway), it doesn't prevent their bots from spamming targeted blogs etc. in the first place. That is still best handled by the blog software providers.
For example, WordPress has a variety of different plugins [wordpress.org] for handling comment spam. The best one I've seen renders a series of characters graphically (a la TicketBastard) which the user (a human, of course) has to type into a text field on the comment form before their comment is accepted. Blogs implementing this type of mechanism typically have spam coming from bots drop down to zero.
Re:This solves only 1/2 of the problem (Score:4, Insightful)
Sure, that's great for humans using a graphical browser, with images turned on, and 20/20 vision. But that doesn't cover all internet users. What about text browsers? What about screen readers?
This is the age of internet accessibility folks, and it's exactly why I refuse to use Captcha [captcha.net] tests on my own blog [h4xx0r.co.uk] - instead, I currently filter all comments and trackbacks through wp-spamassassin [ioerror.us]. Haven't had a single problem yet, although it's early days.
The rel="nofollow" trick sounds promising for killing off the PageRank cheats, but it won't stop humans clicking the links...
This is open to severe abuse (Score:5, Insightful)
Well, luckily Google has now released a way for people to link to each other without leaking PageRank. Yes, the nofollow relation. So, now everyone can link to each other, and no-one gets any benefit out of it whatsoever.
This tag is not a bad idea, but I think the good things it could stamp out weren't considered anywhere near as much as the few bad things it can stamp out..
Much better solution - spamassassin (Score:3, Interesting)
This can be done whether it is linked in a blog or not, and will improve the overall quality of the search database.
- Erwin
!!! MAKE $$$ FA$T !!!! (Score:3, Funny)
Hey! This is the first time that I can comment spam and have it not modded off topic!
That Won't Help for the Biggest Thorns in My Side (Score:4, Insightful)
Re:A gift to Microsoft (Score:4, Interesting)
Re:A gift to Microsoft (Score:5, Funny)
Re:A gift to Microsoft (Score:4, Insightful)
Re:A gift to Microsoft (Score:4, Funny)
Re:A gift to Microsoft (Score:5, Insightful)
Sysadmin Geeks who have to clean up the messes left by shoddy Microsoft products, day after day, hate their products because they make extra work for us. We hate Outlook, IE, and IIS because of their penchant for spreading worms and viruses. We hate service packs which break more than they fix. We hate Frontpage because of the non-standard, blecherous, broken HTML it spews forth. We hate the general lackadaisical attitude Microsoft has about security and quality in general.
Libertarian-minded geeks hate Microsoft for their flagrant disregard for the law and the courts. We hate them for the way they blatantly infringe on other companies' patents and lawyer their way out of it. We hate the way they bankrupt or buy out anyone making a product which actually competes with them. We hate the way they use puppet companies (SCO, BSA) as hired thugs to bully other companies on their behalf.
Anti-corporate geeks hate Microsoft because it's a prime example of corporate greed run amok and of the dangers of unfettered capitalism.
Re:But this doesn't actually can spam (Score:3, Insightful)
people spam comment boards on sites with high pageranks.
Google's logic here is: If a high-ranked site links to site X, X's ranking also gets higher. If your site is spam/ad-ridden, this is step 3. Profit!
With rel=nofollow in place, this tactic no longer works.
No Revenues -> No reason to spam
QED
Re:Only one problem. (Score:3, Interesting)
So if the big blogs use the attribute, then spammers will go after the slow-to-upgrade folks; in self-defense, most of them will upgrade eventually.
Really even for a custom designed visitor book or blo
Re:This could be abused (Score:3, Insightful)
Your abuse scheme seems a bit convoluted to me, or am I missing something?