The Internet Spam

Webmasters Pounce On Wiki Sandboxes

Posted by simoniker
from the fold-spindle-mutilate dept.
Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."
  • Why just wikis? (Score:5, Insightful)

    by GillBates0 (664202) on Monday June 07, 2004 @12:56PM (#9357476) Homepage Journal
    Why not normal discussion boards and blogs? We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO into first place without a problem.
    • by caino59 (313096) <jcaino&obscure[nospam]reality,net> on Monday June 07, 2004 @01:10PM (#9357622) Homepage
      We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.

      You forgot the link: Litigious Bastards [sco.com]

      • by clarkcox3 (194009) <slashdot@clarkcox.com> on Monday June 07, 2004 @01:45PM (#9357948) Homepage

        That's just irresponsible. By putting that link there (the one that says Litigious Bastards [sco.com]), you're contributing to the problem.

        Again, responsible people do not put "Litigious Bastards [sco.com]" links in their slashdot posts.

        Think about it. How would you like a google search for Litigious Bastards [sco.com] to point to your company, leading everyone to think that you and your co-workers are nothing but a bunch of Litigious Bastards [sco.com]?

        • Grow up (Score:5, Funny)

          by scrytch (9198) <chuck@myrealbox.com> on Monday June 07, 2004 @02:25PM (#9358342)
          You know, googlebombing might have a better effect if you did it in reverse, e.g. SCO [litigousbastards.com]. Right now the second link for "litigious bastards" after sco.com is ... a page urging people to googlebomb. Gee, how subversive, no one will figure out how that worked... Hell, every time you mention SCO [pigfuckers.com], come up with a different link for SCO [daryls-wif...harges.com] so their Google results will be peppered with such commentary after... People search for "SCO", not "litigious bastards".

          "Dumb fucker", "miserable failure", etc ... that was funny. Once. Get over it and take some real action against these, uh, litigious bastards, or at least improve the trick a little.
        • Re:Why just wikis? (Score:5, Informative)

          by Eivind (15695) <eivindorama@gmail.com> on Monday June 07, 2004 @02:33PM (#9358405) Homepage
          It's working almost *too* well. Not only are SCO the number one hit for "litigious bastards", but they're also the number one hit for "litigious" or "bastards" alone.

          Then again maybe that mostly says something about their popularity.

      • Top 5 reasons that unix > linux, according to SCO

        SCO UNIX® is a Proven, Stable and Reliable Platform
        SCO UNIX® is backed by a single, experienced vendor
        SCO UNIX® has a Committed, Well-Defined Roadmap
        SCO UNIX® is Secure
        SCO UNIX® is Legally Unencumbered

        HAHAHAHAHAAHHAHAHAHAHAHAHA

        That should be a top 10 list, and on Letterman's show.
    • Re:Why just wikis? (Score:3, Interesting)

      by abscondment (672321)

      Posting on wikis doesn't screw up your own blog.

      Posts on message boards will be deleted quickly, unless the board is expressly Google bombing (as in the current Nigritude Ultramarine 1st placer [google.com]) or people are stupid.

      I think the idea is that wikis make it easier in general for your post to stay up and not affect your blog.

      • Re:Why just wikis? (Score:5, Informative)

        by ichimunki (194887) on Monday June 07, 2004 @01:22PM (#9357735)
        The real problem with Wikis is that the link will remain there, even after it has been removed from the current page, because most Wikis have a revision history feature. So what's needed is careful setup of the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).

        My wiki got hit by this stupid link, but not in the sandbox. Of course, recovering the previous version of the page is easy... it's wiping out any trace of the lameness that gets trickier. I suppose the easiest way to defeat this would be to require simple registration in order to edit Wiki pages.

        What else can we do? Alter the names of the submit buttons and some of the other key strings involved in Editing?
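A minimal sketch of the robots.txt approach described above, checked with Python's standard `urllib.robotparser`. The URL layout (`/wiki/SandBox` for the sandbox, `/history/` for old revisions, `/edit/` for edit forms) is a hypothetical assumption; real wiki engines have their own URL schemes, often with query strings such as `?action=edit`, so the Disallow lines would need adapting.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URL layout: current pages under /wiki/<Name>,
# old revisions under /history/, edit forms under /edit/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/SandBox
Disallow: /history/
Disallow: /edit/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in (
        "http://wiki.example.org/wiki/FrontPage",
        "http://wiki.example.org/wiki/SandBox",
        "http://wiki.example.org/history/FrontPage?rev=3",
        "http://wiki.example.org/edit/FrontPage",
    ):
        print(url, "->", "crawl" if rp.can_fetch("Googlebot", url) else "skip")
```

Disallow rules are prefix matches, so `/wiki/SandBox` would also cover a page named `/wiki/SandBoxTwo`. This only helps against crawlers that honour robots.txt; it does nothing to stop the spam bots themselves.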
        • Re:Why just wikis? (Score:4, Informative)

          by boa13 (548222) on Monday June 07, 2004 @02:10PM (#9358194) Homepage Journal
          So what's needed is careful set up in the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).

          It has probably already been done in any wiki software worth its salt. Here's what MoinMoin [wikiwikiweb.de] does for example:

          * It has a regexp of HTTP_USER_AGENTS which should receive a FORBIDDEN for anything except viewing a page. The default setting includes many known bots (including Google) and utilities such as wget.
          * Most pages contain the appropriate robots meta tag, with the relevant noindex and/or nofollow settings.

          In addition to that, the webmaster can of course set up a robots.txt file, and actually should do so, because there are tools out there which don't understand the robot meta tags (or don't want to take the performance hit) and whose user agent can easily be changed by the user... wget comes to mind.

          Of course, it shouldn't be too hard to add regexps to prevent certain links from being done, or certain hostnames or IPs from altering the site (editing pages, reverting them, deleting them).
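A rough sketch of the user-agent check described in the first bullet: known bots may view pages, but get a Forbidden answer for every other action. The regex and the action names (`show`, `edit`, `diff`) are hypothetical; MoinMoin's actual default list is longer and is not reproduced here. As noted above, anyone who changes their user agent walks straight past it.

```python
import re

# Hypothetical bot pattern for illustration only (not MoinMoin's real list).
BOT_USER_AGENTS = re.compile(r"googlebot|slurp|msnbot|wget|crawler|spider", re.IGNORECASE)


def is_allowed(user_agent: str, action: str) -> bool:
    """Return False when a known bot requests anything other than a plain page view.

    A False result would be answered with HTTP 403 Forbidden.
    """
    if action == "show":
        return True  # everyone, bots included, may read the current page
    return BOT_USER_AGENTS.search(user_agent or "") is None


if __name__ == "__main__":
    print(is_allowed("Googlebot/2.1 (+http://www.google.com/bot.html)", "show"))
    print(is_allowed("Googlebot/2.1 (+http://www.google.com/bot.html)", "diff"))
    print(is_allowed("Wget/1.9.1", "edit"))
    print(is_allowed("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", "edit"))
```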
    • Re:Why just wikis? (Score:5, Interesting)

      by nautical9 (469723) on Monday June 07, 2004 @01:12PM (#9357641) Homepage
      I host my own little phpBB boards for friends and family, but it is open to the world. Recently I've noticed spammers registering users for the sole purpose of being included in the "member list", with a corresponding link back to whatever site they wish to promote. They'll never actually post anything, but they've obviously automated the sign-up procedure as I get a new member every day or so, and google will eventually find the member list link.

      And of course there are still sites that list EVERY referer in their logs somewhere on their site, so spammers have been adding their site URLs to their bot's user agent string. It's amazing the lengths these people will go to spam google.

      Sure hope they can find a nice, elegant solution to this.

      • I'm not sure this will make you feel better, but this strategy has a limited lifetime.

        The contribution of your page to another page's PageRank depends on two factors: first, the PageRank of your page, and second, the number of links coming from your page.

        As more people take up this tactic, the return everyone gets from it gets smaller. E.g. when there are hundreds of links on that page, they cease to have any real value. Eventually people should give up on this one.
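The dilution argument follows from the simplified PageRank formula in Brin and Page's original paper, PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where C(T) is the number of links going out of page T and d is a damping factor they usually set to 0.85. The sketch below computes the share a single link receives; the rank value is made up, and Google's production ranking is certainly more involved than this formula.

```python
def link_share(source_rank: float, outgoing_links: int, damping: float = 0.85) -> float:
    """Rank passed through ONE outgoing link of a page: d * PR(T) / C(T)."""
    if outgoing_links < 1:
        raise ValueError("a page that passes rank needs at least one outgoing link")
    return damping * source_rank / outgoing_links


if __name__ == "__main__":
    sandbox_rank = 4.0  # hypothetical rank of a popular wiki sandbox page
    for links in (1, 10, 100, 1000):
        print(f"{links:>5} links -> {link_share(sandbox_rank, links):.4f} per link")
```

Every extra spam link on the same sandbox divides the page's outgoing rank further, which is exactly why the tactic stops paying once everyone piles on.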

      • by Saeed al-Sahaf (665390) on Monday June 07, 2004 @01:26PM (#9357765) Homepage
        Most BB boards (including phpBB, upgrade!) and blogs (including Slashdot) now feature the visual security code for sign-up. But, of course, this does not prevent hand entry of spam...
        • by stevey (64018) on Monday June 07, 2004 @01:33PM (#9357821) Homepage

          There was a story about defeating this system on /. a while back.

          Rather than using OCR or anything, people would merely harvest a load of images from a signup site - possible when there is only a finite number of images, or when there is a consistent naming policy.

          Then, once the images were collected, they would merely set up an online porn site, asking people to join for free by proving they were human: decoding the very images they had downloaded.

          Human lust for porn meant that they could decode a large number of these images in a very short space of time, then return and mount a dictionary attack...

          Quite clever really, sidestepping all the tricky obfuscation/OCR problems by tricking humans into doing their work for them ..

      • Re:Why just wikis? (Score:2, Informative)

        by Anonymous Coward
        Just set your robots.txt to exclude the user list. Or if you don't have many friends and family, send yourself an 'approve member' email. Then start training your spam filter on fake accounts.
    • by Anonymous Coward on Monday June 07, 2004 @01:16PM (#9357682)

      Why not normal discussion boards and blogs?

      As an employee of JBOSS [jboss.org], I'm shocked and appalled at your suggestion. Fortunately, JBOSS [jboss.org] is working on a new JBOSS [jboss.org] solution to overcome this problem using JBOSS [jboss.org]. We at JBOSS [jboss.org] are passionate that our JBOSS [jboss.org] technology will prevent even non- JBOSS [jboss.org] users from taking advantage of boards this way.

      Frank Lee Awnist
      JBOSS [jboss.org] Employee
      JBOSS [jboss.org] Inc.

      JBOSS [jboss.org] JBOSS [jboss.org] JBOSS [jboss.org]

    • Most wiki sandboxes will let you modify them without any sort of registration at all, so it's much more time effective than signing up for a bunch of discussion boards, waiting for the validation emails, etc. They also probably have a higher average page rank than most discussion boards and blogs would, so a little goes a long way.

  • by raehl (609729) * <raehl311&yahoo,com> on Monday June 07, 2004 @12:56PM (#9357478) Homepage
    In the real world, there are neighborhood watch signs to "deter" criminals.

    Perhaps there could be a command in the robots.txt file which says "Browse my site, but don't count any links here for page ranking"? That would make your site less of a target for spammers, but not prevent you from being ranked at all.
    • Why not put the sandbox in its own folder and add an entry to robots.txt telling crawlers not to browse that folder?
    • by Random Web Developer (776291) on Monday June 07, 2004 @01:10PM (#9357623) Homepage
      There is a robots meta tag for this that you can put in your headers for a single page (robots.txt needs subdirs) but unfortunately most webmasters are too ignorant to realize the power of these:

      http://www.robotstxt.org/wc/meta-user.html
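A small sketch of choosing that per-page meta tag in a wiki's page template. The page name `SandBox` and the old-revision flag are hypothetical; the `index`/`noindex`/`follow`/`nofollow` values are the ones described on the robotstxt.org page linked above.

```python
def robots_meta(page_name: str, is_old_revision: bool = False) -> str:
    """Return the robots <meta> tag to put in a wiki page's <head>.

    The sandbox and old revisions get noindex,nofollow so crawlers neither
    list them nor follow (and credit) the links on them; current pages stay
    indexable. 'SandBox' is a hypothetical page name.
    """
    if page_name == "SandBox" or is_old_revision:
        content = "noindex,nofollow"
    else:
        content = "index,follow"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    print(robots_meta("SandBox"))
    print(robots_meta("FrontPage"))
    print(robots_meta("FrontPage", is_old_revision=True))
```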

    • This fails to address the real issue.

      That is, even if you make your links useless (easy with a nofollow meta tag), it won't help: the majority of this spam is AUTOMATED, and will spam your wiki/blog/guestbook based on simple page cues.

      Your best personal defense is to manually remove any page or HTML cues that a spammer would pick up on as being common to a certain type of postable web page or element.

      Bloggers have been creating blacklists (banning both poster ips and destination urls) with some degree
    • I think one quick, easy fix is to disallow hyperlinks in the comments / guest book. If it isn't an "a href" then Google's spider won't take it.
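One way to disallow hyperlinks, sketched under the assumption that comments are stored as raw text: escape all markup when rendering, so an `<a href>` shows up as inert text and there is no anchor for a spider to follow. The function name is hypothetical. If the blog software also auto-links bare URLs, that feature would have to be turned off as well.

```python
import html


def render_comment(raw: str) -> str:
    """Escape every tag in a visitor comment so no <a href> reaches the page."""
    return html.escape(raw, quote=True)


if __name__ == "__main__":
    print(render_comment('Great post! <a href="http://spam.example/">cheap pills</a>'))
```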
    • just like spam (Score:3, Insightful)

      by SethJohnson (112166)


      Your suggestion is well-thought-out, but is plagued by two problems.


      1. The bombing bots won't give a rat's ass if you add this to robots.txt. Just like spammers, there's no cost for them to hit your site anyway, even if Google is instructed to ignore the links.

      2. Your site's google ranking is affected by the quality of the links you feature pointing at other sites. Your solution unbalances this whole matrix.
  • Oh well (Score:5, Informative)

    by SpaceCadetTrav (641261) on Monday June 07, 2004 @12:57PM (#9357485) Homepage
    Google and others will just lower/diminish the value of links from Wiki pages, just like they did to those open "Guest Book" pages on personal sites.
  • Yes... PLEASE... (Score:5, Insightful)

    by Paulrothrock (685079) on Monday June 07, 2004 @12:57PM (#9357491) Homepage Journal
    Google needs to do something about this. I had to turn off comments on my blog because all I was getting was spam. Two or three a day that I had to go in and delete. I have to now find a system that will keep the bots out.

    What happened to the nice internet we had in 1996?

    • I still haven't really seen a problem with this on my blog. I've had comments enabled for the past two years and have maybe gotten 3 or 4 total spam comments in that time (one today actually).

      Mine has always been set to not allow anon comments, but I know most people have that set as well.

      I have been using MovableType and just haven't really had any problems. Been lucky I guess.
      • I'm using Wordpress, and before that b2. It's only started in the past month, too.

        Unfortunately, my spam comments fill in the email fields, so I can't turn off anonymous comments. Is there any way for me to get the IP addresses of spam comments and forward them to the authorities?

    • As my site grows, I'm thinking about adding a mechanism to address those issues: when the user requests a page for the first time, he'll get a session value that says he's a valid visitor to the site. When he submits a comment, he has to have that value, or comments aren't allowed. I don't know how you'd write a script to circumvent that. (If someone can tell me, I'd love to know so I can try to prevent it!)
      • Well if you're setting a "session value", you're either using cookies or rewriting the links. So all that the script has to do is handle cookies properly or follow your "post a comment" links, neither of which is very hard.
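A rough sketch of the session-value idea, written as a signed timestamp that the comment form must send back. The token format, the `SECRET` handling and the one-hour `max_age` are hypothetical choices. As the reply says, a script that loads the page first and then posts will pass this check; it only stops bots that post to the form URL blind.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical per-site secret; a real site would store it rather than regenerate it per process.
SECRET = secrets.token_bytes(32)


def _sign(timestamp: str) -> str:
    return hmac.new(SECRET, timestamp.encode(), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """Token embedded in the comment form when the page is served."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{_sign(ts)}"


def check_token(token: str, max_age: int = 3600, now: Optional[float] = None) -> bool:
    """Accept a comment only if its token was signed by us and is at most max_age seconds old."""
    try:
        ts, sig = token.split(":", 1)
        issued = int(ts)
    except ValueError:
        return False  # malformed token
    if not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return False  # forged or tampered token
    current = time.time() if now is None else now
    return 0 <= current - issued <= max_age
```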
      • Re:Yes... PLEASE... (Score:3, Interesting)

        by joggle (594025)
        Why not generate an image containing modified text like yahoo and others? Using a little PHP magic, it shouldn't be too hard (see here [resourceindex.com] to get a start).
    • Re:Yes... PLEASE... (Score:5, Interesting)

      by n-baxley (103975) <nate@@@baxleys...org> on Monday June 07, 2004 @01:12PM (#9357638) Homepage Journal
      The system was even easier to rig back then. Back in 96ish, I created a web page with the title "Not Sexy Naked Women". Then repeated that phrase several times and then gave a message telling people to click the link below for more Hot Sexy Naked Women which took them to a page that admonished them for looking for such trash. I added a banner ad to the top of both of these pages, submitted them to a search engine and made $500 in a month! Things are better today, but they're still not perfect.
    • What happened to the nice internet we had in 1996?

      I blame blogs.
      • No, I blame opportunistic bastards who can't see that it's okay to not profit from something. *Thinks about his sledding hill that was destroyed by an upscale minimall.*
    • I had to turn off comments on my blog because all I was getting was spam.
      The simple solution [godaddy.com] is to require the poster to read a distorted graphic of a random numeric value and enter the value into a field in order to submit his message.
  • like porn (Score:5, Interesting)

    by millahtime (710421) on Monday June 07, 2004 @12:58PM (#9357498) Homepage Journal
    This seems similar to the system all those porn sites used to get such a high rank in Google.

    Kind of playing the system, with the content not being quite as desirable.
  • You know... (Score:3, Insightful)

    by fizban (58094) <fizban@umich.edu> on Monday June 07, 2004 @12:58PM (#9357505) Homepage
    ...what Google needs? A "Was this result helpful in your search?" button for each link returned, so that the search itself also influences page ranks. Maybe that will help get rid of this Google bombing mess.
    • Re:You know... (Score:4, Insightful)

      by Anonymous Coward on Monday June 07, 2004 @01:04PM (#9357565)
      that button will also get spammed, as bots will click 'yes' for their sites and 'no' for the competitors' sites
    • Re:You know... (Score:4, Insightful)

      by goon america (536413) on Monday June 07, 2004 @01:09PM (#9357612) Homepage Journal
      Wouldn't that be equally abused?
      • I'm guessing that you are asking:

        "What's to keep Google-bombers from marking down the significance of real links in order to increase the rank of their links?"

        One way to mitigate it is simply to let a given IP address mark a link as good or bad only once. The bomber would have to use a multitude of IP addresses in order to make any significant counter to the huge number of legitimate users that would be marking them down. It would be too labor intensive and therefore cost prohibitive.
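The one-vote-per-IP rule can be sketched as a tally keyed by result URL. The class is hypothetical; Google offers no such voting feature. As the next reply points out, a worm-driven botnet with thousands of IP addresses would get around it.

```python
from collections import defaultdict


class HelpfulVotes:
    """Hypothetical 'Was this result helpful?' tally allowing one vote per IP per result."""

    def __init__(self) -> None:
        self._voters = defaultdict(set)  # result URL -> IPs that already voted on it
        self.score = defaultdict(int)    # result URL -> (+1 per yes) + (-1 per no)

    def vote(self, ip: str, url: str, helpful: bool) -> bool:
        """Record a vote; return False if this IP already voted on this URL."""
        if ip in self._voters[url]:
            return False
        self._voters[url].add(ip)
        self.score[url] += 1 if helpful else -1
        return True
```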
        • Re:You know... (Score:3, Insightful)

          by Nasarius (593729)
          Ah, but how long will it take for someone to write a worm with a Google-abusing payload? We've already got spammers using hacked PCs to send mail.
    • by mcmonkey (96054) on Monday June 07, 2004 @01:24PM (#9357749) Homepage
      'You know what Google needs? A "Was this result helpful in your search?" button for each link returned'

      Yes! Genius! That's it! Google needs some kind of system of rating results to modify future results returned--a system of 'mods' if you will.

      Of course some people will 'mod' stuff down just because they don't like the viewpoint expressed, or they're in a perennial bad mood because their favorite operating system is dead, so we'll need to have a system of allowing people to rate the moderations--'meta-mod' if I may be so bold.

      It sounds crazy, I know, but I think we could do this.
  • < jab jab > (Score:2, Interesting)

    by jx100 (453615)
    Well, couldn't have been that successful, for he didn't win [searchguild.com].
  • Some people ... (Score:2, Insightful)

    by TheGavster (774657)
    It still gets me how the people who are participating in the nigritude ultramarine thing don't see anything wrong with what they're doing. This line particularly got me:
    "Without, as opposed to guestbook spamming, being evil it's a sandbox after all."

    Yes, it's a sandbox; no, it's not your personal playground.
  • google works (Score:4, Informative)

    by mwheeler01 (625017) <{moc.liamg} {ta} {releehw.l.wehttam}> on Monday June 07, 2004 @01:00PM (#9357518)
    Google does tweak their ranking system on a regular basis. When the problem becomes evident (and it looks like it just has), they do something about it...that's why they're Google.
  • by lukewarmfusion (726141) on Monday June 07, 2004 @01:00PM (#9357523) Homepage Journal
    Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."

    Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two: by using outside references (counting how often the site is linked), it assumes that a site is more relevant because other people have put links to it on their sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.

    Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.
    • Yes it is. When, with fewer than a million links, Google searches for "miserable failure" lead to President Bush's biography on the White House web site, that's a problem (leave your political views out of this). The same goes for "weapons of mass destruction" and other Google bombs. Google....fix it now before it gets to be a real problem.
      • Two issues here:

        1. The problem still exists on the side of the provider with the links. Who coordinated these million links that resulted in the "Google bomb?" Why not complain to them?

        2. Is it really a problem? Google has no public responsibility to report rankings according to the demand of anyone; if they wish to block Linux altogether and replace Linus/OSS searches with Microsoft-sponsored results, they can do so. But it would hurt their business and credibility. I'm confused as to why people think th
        • No wait....the Google algorithm has a hole. Does the president's biography have the words "miserable failure" in it? Why is the linked text taken as a description of what is on the site? Those webmasters who put the link all over their sites are only taking advantage of a hole in the Google algorithm. Google should simply do a text search and make sure that "miserable failure" is actually ON the web page that the text links to. Then Google bombs would have no effect.
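The proposed check can be sketched as "only credit a link's text if every word of it appears on the target page". This is the commenter's proposal, not how Google actually weighs anchor text, and the function names are hypothetical. One obvious drawback: honest links whose text is not repeated on the target (say, "click here") would lose their credit too.

```python
import re


def words(text: str) -> set:
    """Lower-cased word set of a piece of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def anchor_supported(anchor_text: str, target_page_text: str) -> bool:
    """True if every word of the link text also occurs on the target page."""
    return words(anchor_text) <= words(target_page_text)


if __name__ == "__main__":
    print(anchor_supported("miserable failure", "Official biography page of the president"))
    print(anchor_supported("OpenGL programming", "A tutorial on OpenGL programming for beginners"))
```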
    • by bcrowell (177657)
      Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."
      I'm not even convinced Google's algorithm has a problem. One thing a lot of people don't realize about the page rank algorithm is that your page rank goes down if you have lots of outgoing links that aren't reciprocated with links coming back from the site you linked to. It may be that this technique simply leads to a reduction in the page rank of the sandbox, which, after all, is approp
  • ROBOTS.TXT (Score:5, Insightful)

    by gtrubetskoy (734033) on Monday June 07, 2004 @01:00PM (#9357524)
    The burden is not on Google, but on Wiki sandbox admins, who should provide proper ROBOTS.TXT files to inform Google that this content should not be indexed.

    As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.

    • wtf. That's not insightful.

      First of all, while my wiki is mostly personal junk, there's no reason it shouldn't be indexed. And many open source projects use Wikis as a primary source of documentation.

      Secondly, the cat is out of the bag; I doubt these spammers are checking whether the sandboxes are indexed by Google.

      I'm mostly pissed off that the edits to my sandbox have been only from nigritude ultramarine [slashdot.org] people. Frankly, I think google should stomp on that contest by not allowing the words to be sea
  • Ok, but the same webmaster says [outer-court.com]:

    I decided to stop posting backlinks in Wiki sandboxes, the SEO strategy previously explained. [...] In the meantime I'm asking developers and those hosting Wikis of their own to please exclude sandboxes from search engine results (via the robots.txt file). Doing so would shield the sandbox from backlink-postings, and there is no need for it to turn up in search results in the first place.

    This sure makes sense, and who knows, maybe future wiki distributions do it by defau

  • Complacency (Score:5, Interesting)

    by faust2097 (137829) on Monday June 07, 2004 @01:01PM (#9357534)
    Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

    It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.

    But they're too busy making an email application with 9 frames and 200k of Javascript to pay attention to the reason people use them in the first place. It's a little disappointing: I'm an AltaVista alumnus, and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.

    • But they're too busy making an email application with 9 frames and 200k of Javascript

      Because, of course, if they weren't doing that, every last one of the engineers on that project would be tinkering with the search engine instead. It's not like they have separate engineering teams or people with different areas of expertise there or anything.

    • It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.

      Please, if you're going to complain, give a concrete example of the search terms you're using, and what results you're expecting. I haven't had any trouble finding what I want on Google in the years I've been using it.
  • by digitalgimpus (468277) on Monday June 07, 2004 @01:01PM (#9357536) Homepage
    I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....

    From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your google rank in 30 days" are using blog spamming, and perhaps wiki spamming, as a way to increase their clients' ratings.

    It's not about meta tags, or submitting anymore... it's spamming.

    Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?

    IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?

    If they are going to use my blog... why can't I get a cut in that business?
    • IMHO those services are pure evil.
      No, 9/11 was pure evil, some unwanted comments on a blog is an annoyance. If you have a website that allows anyone to post comments, you will get some you don't like. That's life.
      • I beg to differ with you on the matter of it being only "an annoyance." I've had to delete comments on my own weblog that (supposedly) link to underage pornography sites. I'm not a lawyer, but I'm fairly certain that it is illegal to link to child pornography. Assuming that this is true, those SEOs are actually causing you, the innocent weblog/wiki owner, to unwillingly and unwittingly commit a criminal act.

        Is it still just "annoying?"
    • I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....
      I had a spate of comment spamming too, about a month ago. In fact, that was what inspired me to move from blogware (WordPress) to a full-up CMS (PostNuke). The comment spammers' scripts don't seem to have found PostNuke yet. By the time they do, I'll have anti-bot measures in place (if I haven't simply closed comments to unregistered users).
  • This happened to me (Score:5, Interesting)

    by JohnGrahamCumming (684871) * <slashdot@NosPAm.jgc.org> on Monday June 07, 2004 @01:02PM (#9357539) Homepage Journal
    This happened on the POPFile Wiki [sourceforge.net]. Eventually I solved it by changing the code of the Wiki itself to have an allowed list of URLs (actually a set of regexps). If someone adds a page which uses a new URL that isn't covered, it won't show up when the page is displayed, and the user has to email me to get that specific URL added.

    It's a bit of an administrative burden, but it stopped people messing up our Wiki with irrelevant links to some site in China.

    John.
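A sketch of the allowlist idea applied at display time: URLs matching an approved regexp become links, everything else is dropped from the rendered page. The patterns and function name are hypothetical, since the actual POPFile regexps are not given in the post, and this covers only the link step (the rest of the wiki text would still need its own escaping).

```python
import html
import re

# Hypothetical allowlist; the real POPFile patterns are not shown in the post.
ALLOWED_URLS = [
    re.compile(r"^https?://([a-z0-9-]+\.)*sourceforge\.net/", re.IGNORECASE),
    re.compile(r"^https?://([a-z0-9-]+\.)*example\.org/", re.IGNORECASE),
]
URL_RE = re.compile(r"https?://[^\s\]<>\"]+")


def render_links(wiki_text: str) -> str:
    """Turn allowlisted URLs into links and silently drop all others."""

    def replace(match: re.Match) -> str:
        url = match.group(0)
        if any(pattern.search(url) for pattern in ALLOWED_URLS):
            safe = html.escape(url, quote=True)
            return f'<a href="{safe}">{safe}</a>'
        return ""  # unknown URL: not shown until an admin adds a matching pattern

    return URL_RE.sub(replace, wiki_text)
```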
  • I've seen this (Score:4, Informative)

    by goon america (536413) on Monday June 07, 2004 @01:04PM (#9357559) Homepage Journal
    I just reverted some pages on my watch list on Wikipedia that had been edited with a google spam bot to link all sorts of words back to its mother site.... lots of mistakes, looked like the script they were using hadn't been tested that well yet. (Would post an example, but wikipedia is completely fuxx0red at the moment).

    This may become a big problem for sites like this. The only solution might be one of those annoying "write down the letters in this generated gif" humanity tests.

  • Something that would make a nice open source project would be to include P2P search functionality in Apache itself.
    This way all the modified web servers would make a giant distributed search engine.
    Some nice algorithms like Koorde or Kademlia could be used.
    Anyone thought about starting something like this?

    David
    • Something that would make a nice opensource project would be to include p2p search functionality in apache itself. This way all the modificed web servers would make a giant distributed search engine. Some nice algorithms like koorde or kademlia could be used. Anyone thought about starting something like this?

      We looked into something a lot like what you suggest [ibm.com] (and actually have it up and running inside our intranet with 2k or so users). The problem with doing this on the internet is that p2p technique

  • Google. (Score:4, Interesting)

    by Rick and Roll (672077) on Monday June 07, 2004 @01:05PM (#9357576)
    When I search on Google, half the time I am looking for one of the best sites in a category, like perhaps "OpenGL programming". Other times, however, I am looking for something very specific that may only be referenced about twenty times, if at all.

    When I do search in the first category, especially for things such as wallpaper, or simpsons audio clips, the sites that usually turn up are the least coherent ones with dozens of ads. I usually have to dig four or five pages to find a relevant one.

    The people with these sites are playing hardball. Google wants them on their side, though, because they often display Google text ads.

    Right now, my domain of choice is owned by a squatter that says "here are the results for your search" with a bunch of Google text ads. I was going to/may still put a site there that is very interesting, and the name was a key part of it.

    I firmly believe that advertisements are the plague of the Internet. I would like to see sites selling their own products to fund themselves. Google doesn't really help in this regard. The text ads are less annoying than banner ads, but only slightly less annoying.

    Don't get me wrong, I like Google. It's an invaluable tool when I'm doing research. I would just like to see them come out in full force against squatters.

  • by boa13 (548222) on Monday June 07, 2004 @01:05PM (#9357579) Homepage Journal
    But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank.

    The Arch Wiki [gnuarch.org] has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will require posters to be authenticated soon, which is a blow to the wiki ideal, but not such a major blow. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. an external link over a single space).

    So, this is not new, but this is now news.
    • One to look out for is <div style="display:none;"> if HTML can be posted. It hides the element from any human reader, but I doubt that any current search engine can identify the purpose of such a tag.
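The two tricks mentioned here (a link over a lone space or dot, and CSS-hidden content) lend themselves to simple heuristics. This sketch inspects rendered HTML rather than wiki markup, and the regexes are hypothetical and easy to evade; the point is to flag edits for a human on RecentChanges, not to block them outright.

```python
import re

LINK_RE = re.compile(r"""<a\s[^>]*href=["']?([^"'\s>]+)[^>]*>(.*?)</a>""", re.IGNORECASE | re.DOTALL)
HIDDEN_STYLE_RE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def suspicious_reasons(html_text: str) -> list:
    """Return reasons an edit looks like a hidden spam link (empty list means it looks fine)."""
    reasons = []
    for href, label in LINK_RE.findall(html_text):
        visible = TAG_RE.sub("", label).replace("&nbsp;", " ")
        if not visible.strip(" \t\r\n.,;:-"):
            reasons.append(f"link to {href} has no visible text")
    if HIDDEN_STYLE_RE.search(html_text):
        reasons.append("content hidden with CSS")
    return reasons
```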
  • Not a big deal (Score:5, Informative)

    by arvindn (542080) on Monday June 07, 2004 @01:06PM (#9357589) Homepage Journal
    Recently the Chinese Wikipedia suffered a spam attack, with a distributed network of bots editing articles to add links to some Chinese internet marketing site. In response, the latest version of MediaWiki (the software that runs the Wikipedias and sister projects) has a feature to block edits matching a regex (so you can prevent links to a specific domain). Wikis generally have more protection against spamming than weblogs. So I wouldn't worry.
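The same idea, sketched as a save-time check: reject the whole edit if its text links to a blocked domain. This illustrates the technique only; it is not MediaWiki's code or configuration syntax, and the domains are hypothetical placeholders.

```python
import re

# Hypothetical blocked domains; a real wiki would grow this list as spam appears.
BLOCKED_LINKS = re.compile(
    r"https?://([a-z0-9-]+\.)*(spam-marketing\.example|cheap-pills\.example)(?=[/:\s\]]|$)",
    re.IGNORECASE,
)


def accept_edit(new_text: str) -> bool:
    """Reject the whole edit if it links to any blocked domain."""
    return BLOCKED_LINKS.search(new_text) is None
```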
  • Hmm (Score:4, Interesting)

    by Julian Morrison (5575) on Monday June 07, 2004 @01:11PM (#9357628)
    Leave the links, edit the text to read something like "worthless scumbag, scamming git, googlebomb, please die, low quality, boring" - and lock the page.
  • Wait a minute - a way to spoof Google to get your page ranked better through a wiki? OMFG! Call the internet police, call Dr. Eric E. Schmidt, call out the Google Gorilla goons! I'm sure the good Dr. has a fix like the ones he used at Novell...

    The problem with the whole Google model is that it's biased to begin with. If I'm looking for Granny Smith apples, chances are some internet chimp has bought the space from Google's goons with bananas. It becomes obvious when you see a chimp site that is near the
  • True (Score:4, Funny)

    by Pan T. Hose (707794) on Monday June 07, 2004 @01:13PM (#9357644) Homepage Journal

    "Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"

    I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...

    • I think it stands to reason that Google shouldn't give ANY opening to competition. If there is a major complaint about how the system works, fix it.

      If Google just sits around then the competition will likely catch up.
  • I expect that Google will in time give drastically lower weight to easily-modified pages like "blogs" and "wikis". They're not that hard to recognize.
  • by gmuslera (3436) on Monday June 07, 2004 @01:29PM (#9357789) Homepage Journal
    If it's a test area, does it need to be stored at all? Wikis could keep it alive only for the user's current session or test, and when the user logs out or finishes editing, simply delete it or restore it to a default introductory text. It doesn't need to be some kind of collaborative blackboard or graffiti wall, or at least, if it must be, that should be the webmaster's choice (at least TikiWiki [tikiwiki.org] lets me disable the sandbox if I want).

    But if the problem is having areas on websites where visitors (even unregistered ones) can post random text and links, then even Slashdot is a potential target (maybe there should be a "Spam" mod score?), as is any site where unregistered visitors can store content one way or another, wiki or not.
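The per-session sandbox idea could look something like this sketch, which keeps each visitor's sandbox text in memory keyed by a hypothetical session ID. Since a crawler has no editing session, it only ever sees the default text, so links typed into the sandbox never reach a search engine. The class name and default text are illustrative assumptions.

```python
DEFAULT_SANDBOX_TEXT = "Welcome to the sandbox. Feel free to experiment with wiki syntax here."

class SessionSandbox:
    """Keeps one private sandbox per session; nothing is written to the shared page store."""

    def __init__(self, default_text: str = DEFAULT_SANDBOX_TEXT):
        self.default_text = default_text
        self._drafts: dict[str, str] = {}

    def view(self, session_id: str) -> str:
        # Visitors without a draft (including crawlers) see only the default text.
        return self._drafts.get(session_id, self.default_text)

    def edit(self, session_id: str, text: str) -> None:
        self._drafts[session_id] = text

    def end_session(self, session_id: str) -> None:
        # Logging out or finishing editing discards the draft.
        self._drafts.pop(session_id, None)
```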

  • "Finally"?? (Score:5, Interesting)

    by jdavidb (449077) on Monday June 07, 2004 @01:30PM (#9357802) Homepage Journal

    Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

    I take extreme issue with that statement, and I'm surprised no one else has challenged it. Google does in fact put quite a bit of work into making themselves less vulnerable to these kinds of stunts. They even have a link on every results page where you can tell them if you got results you didn't expect, so they can hunt down the cause and refine their algorithm.

    The system will never be perfect, and this is the latest issue that has not (yet) been dealt with. Quit your griping.

    • Re:"Finally"?? (Score:3, Informative)

      by jdavidb (449077)

      I checked, and I've got documented evidence of this. On April 25 last year, I reported that earthlink.net was showing up as the top search result [perl.org] for queries involving various religious words, including "Bear Valley Bible Institute." The Church of Scientology (which owns Earthlink) was clearly engaging in something to distort the page rank of Earthlink. I had noticed this for a long time before I recorded it.

      On that same day, I reported the problem to Google via their feedback mechanism. I note today

      • So at any rate, to sum up, I find the whining about Google "finally" doing something about this to be very unfair, since Google actively works on this kind of problem. It is disingenuous to dismiss their hard work and suggest that they have done nothing.

  • Simply make a distinction between "I am looking to buy something" searches and "I am looking for information about something" searches.

    They are clearly different kinds of searches, and I do both of them, yet I get the same results for both. The exception is Froogle, which is definitely a step in the right direction, but not quite there.

    The interface has gotten a little better on AltaVista (remember them??), but searches like for used condoms [altavista.com] do not make sense for retail stores at a
  • Easy solution (Score:3, Insightful)

    by lightspawn (155347) on Monday June 07, 2004 @01:34PM (#9357833) Homepage
    Edit robots.txt to let search engines know they should ignore sandbox pages.
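For illustration, assuming the sandbox lives at the hypothetical path `/wiki/SandBox`, the robots.txt entry and a check with Python's standard `urllib.robotparser` could look like this. Note that robots.txt is advisory and `Disallow` matches by path prefix; a `<meta name="robots" content="noindex,nofollow">` tag on the sandbox page is another way to ask well-behaved crawlers not to index it or follow its links.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki whose sandbox lives at /wiki/SandBox.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/SandBox
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def crawlable(url: str, agent: str = "Googlebot") -> bool:
    """True if a crawler honoring this robots.txt may fetch the URL."""
    return parser.can_fetch(agent, url)
```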

  • That's a very interesting article.

    Sig
    --
    KEY PHRASE <A HREF=www.my_website.com> KEYWORD KEYWORD KEYWORD <\A>
  • by MaximusTheGreat (248770) on Monday June 07, 2004 @01:38PM (#9357870) Homepage
    What about using random image-based spam control like the one Yahoo uses on its new mail signup?
    So every time you edit a page or post a comment, you would be presented with an image of random distorted text, which you would have to type in to be able to edit or post. That should take care of automated systems.
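A rough sketch of the server-side half of such a scheme, assuming challenges are kept in memory: generate a random code, hand the browser an opaque token, and accept the edit only if the typed code matches. Rendering the code as a distorted image would need an imaging library and is left out; the alphabet, code length, and five-minute expiry are arbitrary choices for illustration, not Yahoo's.

```python
import secrets
import time

ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"  # omits look-alikes such as 0/O and 1/I/L
CHALLENGE_TTL = 300  # seconds; hypothetical policy

_pending: dict[str, tuple[str, float]] = {}

def new_challenge(length: int = 6) -> tuple[str, str]:
    """Return (token, answer): the answer is drawn as an image, the token goes in a hidden form field."""
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = secrets.token_urlsafe(16)
    _pending[token] = (answer, time.time())
    return token, answer

def check_response(token: str, typed: str) -> bool:
    """Single-use check: the challenge is discarded whether or not the answer is right."""
    entry = _pending.pop(token, None)
    if entry is None:
        return False
    answer, issued = entry
    if time.time() - issued > CHALLENGE_TTL:
        return False
    # Compare as bytes so non-ASCII input cannot raise an error.
    return secrets.compare_digest(typed.strip().upper().encode("utf-8"), answer.encode("ascii"))
```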
    • Hear, hear. Systems (software or otherwise) that offer something of monetary value for free, and provide no mechanism whatsoever to prevent people from exploiting them, are going to get exploited. Shocking!

      Maybe it wasn't obvious to blog and wiki programmers that the ability to post a comment or edit a wiki page was worth money. It isn't worth a lot per post, but because these are online systems, they are very susceptible to bots that can post in huge volume. All of those posts together can alter a site's
    • I've always wondered why it's always distorted, hard-to-read text on a speckled background.

      Why not just show the picture of an object, like an apple or something, and ask the user to type in what it is? I mean, you could have a few hundred of these and it would be nearly impossible for an automated system to guess. (You have a few hundred different items, and like 5-10 images of each item.) I dunno, seems easier to me, but I don't write web software.
  • Clean sandbox daily. (Score:3, Informative)

    by chiph (523845) on Monday June 07, 2004 @02:24PM (#9358333)
    As any cat owner will tell you, you need to clean the sandbox out periodically. In the case of a Wiki, overnight would probably be a good idea.

    Chip H.
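One way to automate the nightly cleaning, sketched under the assumption that pages live in a simple name-to-text mapping and that some scheduler (cron, say) triggers the reset at a fixed hour. The page name, default text, and 3 a.m. hour are all hypothetical. A daily reset still leaves spam visible for up to a day, so it presumably works best alongside the other measures in this thread.

```python
from datetime import datetime, timedelta

SANDBOX_PAGE = "SandBox"  # hypothetical page name
SANDBOX_DEFAULT = "This page is for testing wiki syntax. It is reset every night."

def reset_sandbox(pages: dict[str, str]) -> bool:
    """Restore the sandbox to its default text; return True if anything was actually cleaned."""
    changed = pages.get(SANDBOX_PAGE) != SANDBOX_DEFAULT
    pages[SANDBOX_PAGE] = SANDBOX_DEFAULT
    return changed

def next_reset(now: datetime, hour: int = 3) -> datetime:
    """Next occurrence of hour:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```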
  • by wamatt (782485) on Monday June 07, 2004 @02:30PM (#9358376)
    Spammers are going there because you have a high PR. So cut off the PR supply and you're in business: http://www.site.com/~url=http://www.link.com and voila, URL rewriting. No more PR for Mr. Spammer.
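A minimal sketch of that rewriting step, assuming it runs over the rendered HTML and that the wiki provides a local redirect script at the hypothetical path `/redirect?url=` (the comment's `~url=` form would work the same way). The redirect script would presumably also need to be excluded via robots.txt so crawlers don't simply follow it. The regex handles only double-quoted `href` attributes, which is enough for a sketch but not for arbitrary HTML.

```python
import re
from urllib.parse import quote, urlsplit

SITE_HOST = "www.site.com"        # this wiki's own host, taken from the comment's example
REDIRECT_PATH = "/redirect?url="  # hypothetical local redirect script

HREF_RE = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)

def rewrite_external_links(html: str) -> str:
    """Point every off-site link at the local redirect, so the page never links to it directly."""
    def repl(m: re.Match) -> str:
        url = m.group(1)
        if urlsplit(url).hostname == SITE_HOST:
            return m.group(0)  # internal links are left alone
        return f'href="{REDIRECT_PATH}{quote(url, safe="")}"'
    return HREF_RE.sub(repl, html)
```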
