Spam Businesses Google The Internet

Google Cans Comment Spam 434

fthiess writes "Comment spam is in many ways even more annoying than regular email spam, since you generally have to do more than just hit the delete button to get rid of it. Its defining characteristic is that spammers abuse websites where the public can add content (blogs, wikis, forums, and even top referrer lists) to increase their own ranking in search engines. It seems, however, that the days of content spam are numbered: today Google announced that, in partnership with MSN Search and Yahoo!, it has implemented a way to block content spam." (More below.)

"Briefly, you just change your blogging/wiki/forum/etc. software so that any hyperlinks in publicly-contributed text have a new rel=nofollow attribute added to any anchor tags. Google, MSN, and Yahoo! will now no longer index any such links, so the motive for content spamming disappears. Especially hopeful is the fact that a slew of makers of blogging software, including Six Apart, have announced they are supporting the new attribute."

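As a rough illustration of the change the summary describes, here is a minimal sketch in Python that forces rel="nofollow" onto every anchor tag in a piece of visitor-submitted HTML. It uses only the standard library's html.parser; the names NofollowRewriter and add_nofollow are made up for this example, and real blogging software would probably do this inside its own sanitizer instead.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit HTML, forcing rel="nofollow" onto every <a> tag.

    Hypothetical helper for illustration; comments and doctypes in the
    input are silently dropped because they are not re-emitted here.
    """

    def __init__(self):
        # Keep entities such as &amp; as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, closing=">"):
        if tag == "a":
            # Discard any rel value the visitor supplied, then add ours.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def add_nofollow(user_html):
    """Return user_html with rel="nofollow" on all anchor tags."""
    rewriter = NofollowRewriter()
    rewriter.feed(user_html)
    rewriter.close()
    return "".join(rewriter.out)
```

Only visitor-contributed text would be passed through add_nofollow; links written by the site's own author would be left alone.
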
This discussion has been archived. No new comments can be posted.

  • by It doesn't come easy ( 695416 ) * on Wednesday January 19, 2005 @07:51AM (#11406842) Journal
    It's nice to see Google, MSN, and Yahoo cooperating on this effort.
  • and of course... (Score:5, Insightful)

    by Ckwop ( 707653 ) * on Wednesday January 19, 2005 @07:51AM (#11406846) Homepage
    Don't forget to put that attribute in your track-back links either :)

    Simon.
  • It's one way... (Score:5, Interesting)

    by freitasm ( 444970 ) on Wednesday January 19, 2005 @07:53AM (#11406851) Homepage
    It certainly will help filtering some of the spam sites out of Google rank and so on, but the links will still be there in blog comments, bulletin boards, etc. The Googlebot will not follow the links, but human readers won't see the NOFOLLOW tag - and they'll click. It means that moderators still have manual work to do.
    • Re:It's one way... (Score:5, Informative)

      by the_pooh_experience ( 596177 ) on Wednesday January 19, 2005 @08:01AM (#11406892)

      It is not a solution meant to change the content on a website (that would be tantamount to censorship). It only changes how the search engines handle the links (note: the supporters/developers of such a standard are search engine companies).

      The best question raised in this post [slashdot.org] is whether such a tag is acceptable under the standards.

    • Re:It's one way... (Score:5, Insightful)

      by Erik Hensema ( 12898 ) on Wednesday January 19, 2005 @08:02AM (#11406905) Homepage

      Comment spam is mostly used to get a better search engine ranking. A blog which uses this attribute on link tags is far less interesting to comment spammers, so chances are the moderators have to delete less spam.

      • by croddy ( 659025 )
        A blog which uses this attribute on link tags is far less interesting to

        blog? less interesting? how could a blog be any less interesting?

      • Re:It's one way... (Score:5, Insightful)

        by CortoMaltese ( 828267 ) on Wednesday January 19, 2005 @08:49AM (#11407244)
        Eventually, this might reduce the amount of comment spam.

        But somehow I don't think spammers really care if a blog uses this system or not. It's probably easier to just spam all blogs than to figure out which are useless. Just like email spammers don't care much if an address is valid or not.

        Some people think that adding spam filters to an email account reduces the spam sent, while it only reduces the amount of spam received. This solution does neither.

        However, all efforts to fight spam should be welcomed and supported. Despite my pessimism, it will be interesting to see how it turns out.

    • Sounds to me like an ideal candidate for a Firefox extension (if it doesn't exist already; couldn't be bothered to look) that indicates links with a NOFOLLOW tag differently (e.g., different colour, different cursor).
      • Interesting idea, but I doubt that such an extension would be useful. The nofollow tag will be added automatically by the blog system to ALL links submitted by visitors. It can't help to distinguish spam messages. Nofollow doesn't mean "this is spam" but "this is a user-added link, so it might be spam". It is useless information for humans who can judge the link by its context.

        Of course, one could try creating such an extension to see if it works. That's the power of open source!
    • True, it's a long term solution which is not gonna do any good in the short run. The short term solution is to make it impossible for the spammers to attack your blog in the first place. Change the names of the files that handle comment posting etc... (and of course change the code that points to such pages) and most automated spam bots are lost. If you really want to be secure, implement an intermediate page where it asks explicit permission before posting (tick a checkbox and click "yes, submit") and you

    • Use CSS (Score:4, Informative)

      by JamesHenstridge ( 14875 ) <james@jamesh.id.3.14159au minus pi> on Wednesday January 19, 2005 @09:55AM (#11407867) Homepage

      It is pretty easy to make rel="nofollow" visible to normal users too in modern web browsers using CSS. You could use something like this:

      a[rel="nofollow"]:before {
        content: url(an-image-representing-nofollow-links.png);
      }

      That will display the given image before any links marked as nofollow.

    • Re:It's one way... (Score:3, Informative)

      by HTH NE1 ( 675604 )
      but human readers won't see the NOFOLLOW tag - and they'll click.

      They will if they put
      a[rel=nofollow]:after { content: " [NOFOLLOW]"; }
      in their client-side stylesheet, or the blog owner puts it in the site's stylesheet. I do similar things to put "[PDF]" after PDF links and "[reg]" after nytimes.com links.
      • Re:It's one way... (Score:3, Informative)

        by HTH NE1 ( 675604 )
        Or even

        a[rel]:after { content:" [rel=" attr(rel) "]" }

        if you're generally curious about whether, and for what, people use the rel attribute on anchor tags.

        There's lots of power you can exert over the appearance of web pages through your client-side stylesheet.

        If only there were a way to restrict a set of rules to particular sites, or that you could trust sites to put ID attributes on their BODY tags to uniquely identify their pages to the world, even just the domain name (substituting some other character for the dots)

  • by maxwell demon ( 590494 ) on Wednesday January 19, 2005 @07:54AM (#11406859) Journal
    Does HTML/XHTML allow "rel" attributes on links? And if so, is "nofollow" an allowed value for that tag?
    • In related News:
      IE have proposed introducing similar measures to IE6 using ActiveX and DHTML.

    • by Anonymous Coward
      Yes, the attribute rel [w3.org] is allowed in anchor elements in HTML. The value "nofollow" is not on the list of recognized types [w3.org], but that's not so important since the value of rel can be anything.

      It's an interesting idea, but it's probably a matter of short order before MS starts to use this to cut out non-MS sites.

    • by darkpurpleblob ( 180550 ) on Wednesday January 19, 2005 @08:16AM (#11406987)

      Yes and yes.

      From the W3C:

      Links in HTML documents - The A element [w3.org]:
      rel = link-types [CI]
      This attribute describes the relationship from the current document to the anchor specified by the href attribute. The value of this attribute is a space-separated list of link types.
      Basic HTML data types - Link types [w3.org]:
      Authors may use the following recognized link types, listed here with their conventional interpretations. In the DTD, %LinkTypes refers to a space-separated list of link types. White space characters are not permitted within link types.


      These link types are case-insensitive, i.e., "Alternate" has the same meaning as "alternate".

      User agents, search engines, etc. may interpret these link types in a variety of ways. For example, user agents may provide access to linked documents through a navigation bar.

      ...

      Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types. Please see the profile attribute of the HEAD element for more details.
      • by Milo Fungus ( 232863 ) on Wednesday January 19, 2005 @09:00AM (#11407321)

        Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types. Please see the profile [w3.org] attribute of the HEAD element for more details.

        I think this last paragraph is important. "nofollow" is not on the official list of link-types [w3.org]. If blog authors wish to use this attribute in anchor elements, they need to define it properly (or at least properly reference a definition).

        Remember back in the 90's when Netscape and MS were breaking standards right and left so that their browsers would have an edge on the competition? That was the wrong way to do it, and it created the mess we're in now with sloppy HTML spewed all over the web and designers unable to use compliant designs because the most popular browser doesn't even try to support standards (an example here [meyerweb.com]). Google is doing this the right way. They went back and read the HTML specification to see if it was already capable of doing what they needed. It does? Great! Let's utilize the standard!

        Granted, HTML these days has a much better design than it did in the pre-4.0 specifications. Back when Netscape and MS were at each other's throats the document format was actually incapable of doing a lot of things that designers wanted to do on the web. But HTML is a very mature format these days.

    • The list of valid link types [w3.org] does not include "Nofollow", but "Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile [w3.org] to cite the conventions used to define the link types."
    • Too lazy to search, huh? Ok, I'll give up moderation and search for you :)

      And the answer, of course, is yes. "rel" attribute [w3.org], valid for "a" and "link" element types. Take a look at the source of any Wordpress weblog and you'll see it being used for many things already.

      The caveat is that you should define a profile about the valid keywords you'll be using in "rel"; I don't know if Google is using a profile, but it's not mandatory.
    • Yes, this appears to be valid. I can't find the part of the actual spec, but w3schools' XHTML reference [w3schools.com] lists it as an acceptable attribute to <a>.

      "rel" is short for "relationship" - it can contain values like "previous", "next", "contents", "index", etc.

  • Now if only... (Score:4, Interesting)

    by deltwalrus ( 234362 ) on Wednesday January 19, 2005 @07:54AM (#11406863) Homepage
    Slashdot could implement something like this, it would make article comments meaningful again.
    • Re:Now if only... (Score:5, Interesting)

      by PetiePooo ( 606423 ) on Wednesday January 19, 2005 @08:01AM (#11406896)
      Slashdot could implement something like this, it would make article comments meaningful again.

      They could even selectively add or omit it based on the comment's moderation. Include the nofollow tag by default, but if a comment with a link in it is moderated highly, remove the tag so search engines can use it. Sounds like the best of both worlds..
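A minimal sketch of the moderation-based idea above, in Python. The function name render_comment_link and the threshold of 4 are assumptions made up for illustration; nothing here reflects how Slashcode actually renders links.

```python
from html import escape


def render_comment_link(href, text, score, threshold=4):
    """Render a link from a comment, adding rel="nofollow" unless the
    comment's moderation score has reached the (hypothetical) threshold."""
    rel = "" if score >= threshold else ' rel="nofollow"'
    return '<a href="%s"%s>%s</a>' % (escape(href, quote=True), rel, escape(text))
```

For example, render_comment_link("http://example.com/", "example", 1) keeps the attribute, while the same call with a score of 5 emits a plain link that search engines may count.
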
    • People don't spam Slashdot to get pagerank though. Since there are so many links on the pages already, any possible pagerank gained would be minute.
  • When Guns are outlawed, only outlaws will have guns ...

    Soon you'll see that only links with the rel="nofollow" attribute will count as genuine links, because the spammers do NOT use it as much as the regular users..

    Just like spammers were the first people to implement DomainKeys :)
    • by Erik Hensema ( 12898 ) on Wednesday January 19, 2005 @08:05AM (#11406926) Homepage

      RTFA. Slashdot could modify slashcode to automatically add the attribute to all links posted in comments. Comment spammers can't do anything about it, so they'll move away to other sites.

      No normal links (i.e. not in visitor contributed content) should have the attribute. So slashdot will still be full of normal links; only the links in the comments will have the attribute.

  • by epsalon ( 518482 ) * <slash@alon.wox.org> on Wednesday January 19, 2005 @07:58AM (#11406880) Homepage Journal
    But what will happen then with the miserable failure [google.com] and weapons of mass destruction [blueyonder.co.uk]? Can't anyone efficiently bomb google anymore?
  • Band aid (Score:4, Interesting)

    by eddy ( 18759 ) on Wednesday January 19, 2005 @07:58AM (#11406882) Homepage Journal

    I'm not really into blogging so I don't know how big of a problem this is. I get some spam in my guestbook, which I promptly remove. The spam itself is what's really irritating, not the potential "elevating" of the spamvertised site in search engines, where I've never personally run across one that I can remember.

    Am I correct in assuming that these sites pop up and disappear relatively often? Maybe it'd be possible to add a temporal component to the rating. Say, if the link points to a site which was registered just two days ago, it's given a very, very low weight, and then you ramp up as time goes by. As spam gets deleted from blogs and guestbooks, time would work against these spammers. Or? I dunno.
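
The temporal weighting floated above could look something like this in Python. It is only a sketch: the linear ramp, the 180-day ramp length and the 0.01 floor are all made-up parameters, and no search engine is known to use this formula.

```python
from datetime import date


def link_weight(registered, today, ramp_days=180, floor=0.01):
    """Weight a link by the age of the site it points to.

    A freshly registered site gets roughly `floor`; the weight then
    rises linearly and reaches 1.0 once the site is `ramp_days` old.
    All parameter values are hypothetical.
    """
    age_days = max((today - registered).days, 0)
    ramp = min(age_days / ramp_days, 1.0)
    return floor + (1.0 - floor) * ramp
```

With these assumed defaults, a site registered two days before the link is evaluated gets a weight of about 0.02, while a site older than six months gets the full weight.
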

    • Re:Band aid (Score:2, Informative)

      The point is that the motivation to spam blogs rests on the assumption that posting links to one's site on blogs elevates the Google (Yahoo, MSN Search) rank for the sites linked to. Once that assumption is invalidated, the incentive to spam goes away. It should actually help quite a bit.
      • Re:Band aid (Score:2, Interesting)

        by eddy ( 18759 )

        the incentive to spam goes away.

        For a sane rational person maybe, but that's not how spammers work.

        A quick example: For the last four years I've been getting spam to the account "stef" at my domain.

        There is no account "stef" on my domain, and there hasn't been one for at least four years (previous owner, I guess). So mail is rejected at RCPT TO.

        ... still they keep coming, year after year after year. Tell me that's rational.

        Not every opportunity will disappear with this new link-attribute, and so the

        • Re:Band aid (Score:2, Informative)

          by Threni ( 635302 )
          > still they keep coming, year after year after year. Tell me that's rational.

          It's rational because removing the account from their spam lists won't make them any less profitable, but removing even 1 live address might, so all things being equal (no limit on the number of addresses being spammed, not much cost difference in a list of email addresses which is this rather than that big) you might as well leave them in.
  • by It doesn't come easy ( 695416 ) * on Wednesday January 19, 2005 @08:00AM (#11406887) Journal
    Hmmm...if a malicious program adds the tag to links served by a compromised html server, you could have an interesting and different sort of denial of service attack, although it would be slow to take effect.
    • How? Google simply ignores PageRank for the nofollow rel attribute. It doesn't vote against the linked site or remove it from the index. Hell, Google still follows the link, it just doesn't count the linking site when calculating PageRank.

      So again I ask, how?
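
To make the point above concrete, here is a toy PageRank-style iteration in Python in which nofollow links simply cast no vote. It is a textbook-style sketch under stated assumptions (damping of 0.85, dangling pages spread their rank evenly, a made-up three-page graph), not a description of how Google actually computes PageRank.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` is a list of (source, target, nofollow)
    triples; links flagged nofollow are ignored when passing rank."""
    pages = set()
    outgoing = {}
    for source, target, nofollow in links:
        pages.update((source, target))
        if not nofollow:
            outgoing.setdefault(source, []).append(target)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = outgoing.get(page)
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no counted outgoing links: spread its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: a blog and a news site link to each other,
# and a comment on the blog links to a spam site.
with_nofollow = pagerank([("blog", "news", False),
                          ("news", "blog", False),
                          ("blog", "spam", True)])
without_nofollow = pagerank([("blog", "news", False),
                             ("news", "blog", False),
                             ("blog", "spam", False)])
```

In this toy model the spam page ends up with a lower score when the comment link is marked nofollow, but it is never pushed below what a page with no incoming links would get, which matches the "it doesn't vote against the linked site" argument.
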
  • by Underholdning ( 758194 ) on Wednesday January 19, 2005 @08:00AM (#11406889) Homepage Journal
    Google, MSN, and Yahoo! will now no longer index any such links
    Not quite. What happens is that the link won't add anything to the ranking of the site in question. As you probably all know, most search engines rank pages by incoming links; it's not just Google. With this attribute added, the incoming link won't count.
    I think this is a great idea. It will probably break the w3c compliance, but hey - anything to piss off a spammer.
  • Only one problem. (Score:2, Insightful)

    by jellomizer ( 103300 ) *
    There is too much custom blog software out there, and many of these programmers don't read Slashdot (or may not read it today), or check with Google, Yahoo, or MSN, or care whether their blogging software is messing with page ranks. There are also way too many people who will not upgrade their blog software because it is not worth the hassle. There are still people who run Windows 3.1 or Apple ][ or Commodore 64; expecting people to upgrade their software is not going to happen any time soon. Maybe most will be
    • Re:Only one problem. (Score:3, Interesting)

      by miu ( 626917 )
      There is something of an incentive here though, spammers seeking to elevate their pagerank will focus on sites that give them the best return. When large ISPs shut down open relays the spammers went after small ISPs, because of the increased attention from spammers the small ISPs shut down their open relays.

      So if the big blogs use the attribute then spammers will go after the slow to upgrade folks, in self defense most of them will upgrade eventually.

      Really even for a custom designed visitor book or blo

  • Useful links (Score:5, Insightful)

    by Simon (S2) ( 600188 ) on Wednesday January 19, 2005 @08:00AM (#11406891) Homepage
    Forums and Blogs often contain very useful links. What about them? What about all those sites that are *only* linked to from blogs and forums, and that actually are great and useful sites?
    • Good point. Same goes for wikis too, of course (unless there's a way to just block the page histories-- and no, I haven't RTFA yet to see if there is one).
    • Re:Useful links (Score:5, Informative)

      by BohemianCoast ( 632363 ) on Wednesday January 19, 2005 @08:24AM (#11407057) Homepage
      Links in the main body of the blog post will be fine. Blogs of course, have high page rank because bloggers comment on each other's blogs. This tag may have a side effect of generally reducing the page rank of blogs.

      As for useful links in comments; if they're really good sites, people are bound to blog about them more generally. And my poor blog gets few enough hits that it will be no problem for me to manually edit genuine comments to remove nofollow tags.
      • As for useful links in comments; if they're really good sites, people are bound to blog about them more generally.

        Yes, that's fine, but if I search something I do not go through a bunch of blogs, I use google, and that will not work any more.

        And my poor blog gets few enough hits that it will be no problem for me to manually edit genuine comments to remove nofollow tags.

        See, you still have to edit posts manually. Isn't it better then if you remove the spam manually?
    • I haven't RTFA, but I would imagine this is added to areas where anybody can add a comment. This is not the same as all blog content, so a link that a site author specifies would not have this attribute.
      • I haven't RTFA, but I would imagine this is added to areas where anybody can add a comment. This is not the same as all blog content, so a link that a site author specifies would not have this attribute.

        I have RTFA, and it's exactly how you describe it. But what I am talking about are the useful links in the comments. There are different types of blogs and forums: those that are abused often, and those that have a lot of useful links and information.
        With the tag applied to all links in the pos

        • With the tag applied to all links in the posts, useful sites will not get a good search ranking even if they may deserve it.

          You can get around that [incutio.com] by removing the rel attribute for people who authenticate against their provided email address.

          PS: It's not a tag, it's an attribute value.

          • In the slashdot/code case, you'd remove it for people with good karma. Spammers would quickly be denied the ability to influence rankings.
  • by Schweg ( 730121 ) on Wednesday January 19, 2005 @08:01AM (#11406897)
    Why not modify Firefox (or provide a plugin) that allows such links to be grayed out or otherwise marked specially?

    Actually, are there any plugins already in existence that modify the appearance of a link based on a regexp match?

  • Will this be implemented on Slashdot as well? Perhaps those with karma lower than neutral would get a rel="nofollow" tag added to the URLs they post?

  • by antifoidulus ( 807088 ) on Wednesday January 19, 2005 @08:03AM (#11406910) Homepage Journal
    for ch3aP Can.adi n v31g.r a?
  • So, one of the things that Google really has going for it is the fact that they assign "value" of a link based on how it is referenced. If we mute the voice of the average blogger in that calculation, don't we lose quite a bit? Granted, the cost is having the first few links owned by content spammers, but that seems like a small price to pay, and there should be other, less absolute ways of dealing with it....
    • As others have said, the blog itself should be fine since its only the comments that should have the rel=nofollow on them. I also imagine that this is only really needed for *unmoderated* comment, where moderated content is not likely to contain spam. Can anyone confirm if Six Apart provides for this distinction?
    • Blogs have been badly skewing search engines for a while now, so it is just bringing back some balance. The blog will still count, but not the comments.

  • "rel" has a defined meaning, stating the nature of the relationship of that link.

    For example, <A HREF=... REL=next>
    Here, the linked to document is defined as being "next" in relation to this document.

    • True.

      In fact, I think "class" would be the correct attribute. It defines a type for an element, which can be used by CSS, but is not necessarily limited to that. Elements can also have more than one class value (e.g. class='nofollow external_link' etc)

      The fact that you don't necessarily use the nofollow class in your CSS is not a problem (and you can always style nofollow links differently should you wish...
  • Nice feature. I like it. The link won't add pagerank to the linkee, but will it also not drain the linker?
  • Wikipedia (Score:5, Informative)

    by wikinerd ( 809585 ) on Wednesday January 19, 2005 @08:13AM (#11406968) Journal
    Wikipedia already implemented this feature. See here [wikipedia.org].
  • by theNote ( 319197 ) on Wednesday January 19, 2005 @08:14AM (#11406976)
    This is not a solution as far as I'm concerned.

    Why stop the indexing of relevant links from blogs to make Google's life easier?
    99% of the links posted in comments are relevant and would be beneficial to index. Why stop this for the 1% of jackasses out there?

    The domains contained in the links from blogspam are well known, and there are plenty of blacklists out there. Why doesn't googleyahoomsn just remove these sites from its database? It's such an easy solution. I believe they already do this in some circumstances for link trading systems whose only goal is to get higher pagerank.

  • I am afraid that many people could use the nofollow tag in a commercial way. I have outlined my thoughts in my blog [wikinerds.org]. What do you think about this possibility?
  • by PeeAitchPee ( 712652 ) on Wednesday January 19, 2005 @08:20AM (#11407026)

    While this will prevent spammers from bumping up their sites' Page Rank (probably their primary motivation for comment spam anyway), it doesn't prevent their bots from spamming targeted blogs etc. in the first place. That is still best handled by the blog software providers.

    For example, WordPress has a variety of different plugins [wordpress.org] for handling comment spam. The best one I've seen renders a series of characters graphically (a la TicketBastard) which the user (a human, of course) has to type into a text field on the comment form before their comment is accepted. Blogs implementing this type of mechanism typically have spam coming from bots drop down to zero.

    • by CaptainBaz ( 621098 ) on Wednesday January 19, 2005 @08:37AM (#11407150) Homepage Journal
      The best one I've seen renders a series of characters graphically (a la TicketBastard) which the user (a human, of course) has to type into a text field on the comment form before their comment is accepted.

      Sure, that's great for humans using a graphical browser, with images turned on, and 20/20 vision. But that doesn't cover all internet users. What about text browsers? What about screen readers?

      This is the age of internet accessibility folks, and it's exactly why I refuse to use Captcha [captcha.net] tests on my own blog [h4xx0r.co.uk] - instead, I currently filter all comments and trackbacks through wp-spamassassin [ioerror.us]. Haven't had a single problem yet, although it's early days.

      The rel="nofollow" trick sounds promising for killing off the PageRank cheats, but it won't stop humans clicking the links...

  • by Peter Cooper ( 660482 ) on Wednesday January 19, 2005 @08:25AM (#11407063) Homepage Journal
    There are a lot of people out there who understand the PageRank system, and complain that if they add outgoing links on their site then their previous PageRank will be "leaked" to other sites, rather than their own internal pages.

    Well, luckily Google has now released a way for people to link to each other without leaking PageRank. Yes, the nofollow relation. So, now everyone can link to each other, and no-one gets any benefit out of it whatsoever.

    This tag is not a bad idea, but I think the good things it could stamp out weren't considered anywhere near as much as the few bad things it can stamp out..
  • XFN [gmpg.org] already uses the rel attribute to establish relationships to people you are linking to.
  • A more effective solution would be to remove the comment system from your blog. You won't get spam. You won't get argued with or corrected. You won't endure lesser intellects (cough) posting their inarticulate garbage (cough, cough) on your site. And you won't have those embarrassing "Comments [0]" links all over your home page anymore. Sorted.

    If anyone still wants to take issue with your sterling advice, let 'em put it in an email, where it can be deleted more easily.

    Ade_
    /
  • by EJB ( 9167 ) on Wednesday January 19, 2005 @08:52AM (#11407268) Homepage
    Why doesn't Google use spamassassin or another spam filtering tool? One based on Bayesian analysis - to determine if a page is just spam, and give it a lower score.
    This can be done whether it is linked in a blog or not, and will improve the overall quality of the search database.

    - Erwin

  • by iammrjvo ( 597745 ) on Wednesday January 19, 2005 @09:09AM (#11407402) Homepage Journal

    Hey! This is the first time that I can comment spam and have it not modded off topic!
  • by Greyfox ( 87712 ) on Wednesday January 19, 2005 @03:35PM (#11412157) Homepage Journal
    Whenever I'm searching for technical information, a couple of sites always come up that are useless to me. They have a question/answer format, questions are left in the clear for search engines, while the answers require registration. What I need is a way to filter those sites out from my searches, so that they simply don't show up in any result set. Hmm might be a good excuse to play with writing Firefox plug-ins... :-)
