
MS Research Automates Search Engine Spam Hunt 68

Posted by Zonk
from the i'm-all-for-less-junk dept.
Barbie Dollar writes "Researchers at Microsoft are working on an ambitious new project to hunt down and neutralize large-scale search engine spammers. The project, called Strider Search Defender, automates the discovery of search spammers through non-content analysis. The project integrates technology from two previous Microsoft Research prototypes (Strider HoneyMonkey and Strider URL Tracer) and promises a new approach to removing junk results from search engine queries."

  • by Mayhem178 (920970) on Thursday July 13, 2006 @03:41PM (#15714308)
    Every anti-Microsoft blog and article in existence has been flagged as search engine spam.

    More at 11.
  • by MrNougat (927651) <ckratschNO@SPAMgmail.com> on Thursday July 13, 2006 @03:43PM (#15714319)
    Sure, preventing search engines from indexing blogspam posts is great. Maybe that's the first step, but it's not going for the root cause - the botnets that run the apps that post/email in the first place, and the compromised webservers hosting order sites.
    • by ScentCone (795499) on Thursday July 13, 2006 @03:53PM (#15714382)
      Sure, preventing search engines from indexing blogspam posts is great. Maybe that's the first step, but it's not going for the root cause - the botnets that run the apps that post/email in the first place, and the compromised webservers hosting order sites.

      These are not mutually exclusive goals. If you take away any incentive for spamalizing content (meaning, not only does it not boost your search placement, it penalizes you), then much of the pressure to run botnets and crack servers goes away.
      • It's an evil cycle.

        Much like our spam emails have adapted and (mostly) overcome spam filters, link-farm search-hogs will adapt too.

        As much as we'd like to remove the root cause, nobody's going to fix "greed" anytime soon.

        In the meantime, like spam, we can make it more difficult for them to do business.
  • By cracking down, Microsoft could effectively reduce the number of spam sites. The result would be fewer AdWords and microAds displayed and clicked, which could lower revenue for Google and Yahoo.

    A side effect is better search results, which would increase use of Google again. Where MSN Search is in all of this... I don't know. But the fewer of those crap sites, the better.
    • They don't seem to remove the actual sites, just the entries in the search results. In this way, it helps only MS search, and rightly so, as Google seems to have been sleeping on this problem for way too long.

      While google was developing online applications we weren't really waiting for, Microsoft correctly found the main spot of irritation in search results, and if they will manage to automatically remove those and provide the search results people want (not just the sponsored shit msn search has shown i

  • Not before time. (Score:2, Interesting)

    by SatanicPuppy (611928) *
    I'm all for people being allowed to try and game the system...Anything else would restrict the whole purpose of the Internet as a repository for whatever the hell someone wants to put in there.

    At the same time, I'm all for search engines blacklisting people who game the system, parked domains, crap aggregator pages, etc. It's all about building a better mousetrap.
  • by Anonymous Coward on Thursday July 13, 2006 @03:46PM (#15714344)
    ..that Strider HoneyMonkey was Arwen's pet name for Aragorn?
  • "Strider Search Defender" is just a cover name. It's really the "Aragorn Search Defender"; it just likes to remain incognito so that spam-zombies don't think to hunt it down.
    • Re:Cover-up (Score:3, Funny)

      by creimer (824291)
      I was under the impression that the final name would be the "Half-Life 2 Search Defender", considering that the product will only have a half-life of usability before the Microsoft patching system kicks in.
  • Go Microsoft! (Score:5, Insightful)

    by eebra82 (907996) on Thursday July 13, 2006 @03:56PM (#15714408) Homepage
    All major search engines have been doing this for quite some time. Google is probably the best hunter of them all, and the most recent update, which occurred on June 27, banned a large number of spammers who had billions of sites indexed. Unfortunately, the war on spam is quite difficult. The spammers are working with non-content pages, but it is a matter of time before they start generating non-gibberish content to spam with, too.

    Hopefully, Microsoft's approach will have some effect and push other operators to work harder on preventing web spam.

    Amusingly, you're most likely getting affected only if you're searching for penis pumps, pornographic content and gambling.
    • Re:Go Microsoft! (Score:1, Interesting)

      by Anonymous Coward
      Amusingly, you're most likely getting affected only if you're searching for penis pumps, pornographic content and gambling.

      And cracks, keygens, and warez.
    • The spammers are working with non-content pages, but it is a matter of time before they start generating non-gibberish content to spam with, too.
      Like Cnet?
    • It strikes me that the asymptote of this curve is "spammers" generating actual new, useful, interesting content to push their spam. In other words, the acme of un-blockable spam sites is an ad-supported nonspam site.

      M$, Google and friends might actually drag them so far around illegitimacy they come back to legitimacy. Ironic, no?
  • Human Powered? (Score:4, Interesting)

    by pembo13 (770295) on Thursday July 13, 2006 @04:02PM (#15714439) Homepage
    Seems to me that a group of 10 people could easily flag a large number of spam websites. Is this currently being done by any major engine?
    • Arms race.

      This is exactly what happens in email. You say "Oh! I can filter 99% of my spam by grabbing anything with 'Viagra' in the subject line!"

      The spammers, noticing this, start using subject lines like "Urgent! Read now!"

      You adjust your filter to watch for anything with "Urgent" in the subject line and "Viagra" in the body.

      They send you Vi.ag.ra instead. You catch that, they send you Vlagra.

      They send "Penis pills". You filter anything with "Penis". Then your friend changes their signature to "The
      • The difference being you are not creating a filter, but flagging sites manually as spam sites.

        This is different because it is more difficult to set up a web site and domain (and build links to get in the top search result pages) than shoot off an email. Thus you are flagging the sites themselves, not the particular 'trigger words'.

        Do a search for 'buy mobile phones' or some such crap.. look at the top 10 results or so. If they are obviously spam sites then flag them, and their entire domain. Have regular go
        • When you start knocking out the sites in entire domains and whois records at a time, and are getting rid of mostly the spam sites hogging the top ten spots in search results, I don't think it would take too long to clean it out.

          Are you sure? DNS is BIG, and I'm pretty sure you can automate buying domains -- they're pretty cheap, too. Also, remember that whois info can be faked, and often is (deliberately) by sites like GoDaddy to say that GoDaddy owns the domain, hiding the info of whoever really controls/

          • Well, my other comment about PR still stands. When they buy new domains or move to different sites, they drop from the search results, Google sandboxes them, etc. Google also takes the age of domains into consideration. Building PR takes a long time, and the spam sites can't just move around all the time and still get traffic.
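The domain-level flagging argued for in this subthread (flag the site and its entire domain, not trigger words) can be sketched roughly as follows. This is only an illustration under assumptions: the function names and the example domains are hypothetical, and no search engine is known to work exactly this way.

```python
from urllib.parse import urlparse

def is_flagged(url, flagged_domains):
    """Return True if the URL's host is a flagged domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in flagged_domains)

def filter_results(urls, flagged_domains):
    """Drop every result that lives on a manually flagged domain."""
    return [u for u in urls if not is_flagged(u, flagged_domains)]

# Hypothetical data: one domain flagged by a human reviewer.
flagged = {"cheap-phones.example"}
results = [
    "http://shop.cheap-phones.example/buy",
    "http://reviews.example/mobile-phones",
]
clean = filter_results(results, flagged)
```

Matching on the host rather than on page text is what makes this different from the keyword filters in the email analogy: rewording the page does not help the spammer, though buying a fresh domain still does.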
    • And if any of those 10 people happens to have a personal grudge against someone or something...
      • Well, here's hoping good management, and possibly a "weekend review" process, would help. Monday to Friday, 8 to 5, search using popular keywords (and misspellings), compile a list, and look for similar IPs, hosts, etc. At the end of the week, have a second party verify the results; Monday morning, put in the block. Rinse, repeat.
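The weekly review loop described above (compile a list, look for shared IPs, have a second party verify, then block) might look something like this. It is a minimal sketch, assuming flagged sites arrive as (domain, IP) pairs; every name and address here is made up for illustration.

```python
from collections import defaultdict

def group_by_ip(flagged):
    """Group flagged domains by hosting IP so related sites are reviewed together."""
    groups = defaultdict(set)
    for domain, ip in flagged:
        groups[ip].add(domain)
    return dict(groups)

def confirmed_blocklist(first_pass, second_pass):
    """Block only the domains that both reviewers independently flagged."""
    return sorted(set(first_pass) & set(second_pass))

# Hypothetical week of flags from the first reviewer.
week = [
    ("spam-a.example", "192.0.2.10"),
    ("spam-b.example", "192.0.2.10"),
    ("blog.example", "192.0.2.77"),
]
by_ip = group_by_ip(week)
# The second reviewer disagrees about blog.example, so it is not blocked.
blocked = confirmed_blocklist(
    [domain for domain, _ in week],
    ["spam-a.example", "spam-b.example"],
)
```

Requiring agreement between two reviewers is one simple guard against the personal-grudge problem raised in the parent comment.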
  • if it works (Score:1, Flamebait)

    by BarryLoper (928015)
    If they could make something like this work, it would be a big draw away from Google.

    Of course, with their track record of Neat Ideas vs. Actual Products, (WinFS, etc.) I'm not holding my breath.

    I am, however, wishing them luck.

  • by IO ERROR (128968) <errorNO@SPAMioerror.us> on Thursday July 13, 2006 @04:07PM (#15714469) Homepage Journal
    Microsoft forgot to mention my non-content based method of blocking comment spam entirely known as Bad Behavior [homelandstupidity.us]. And now that they seem to have swiped a few of my ideas, I'm going to have to go see what they're up to...
    • I installed Bad Behavior a few months ago on a community website... for three days.
      During those three days I logged a lot of actual (and logged-in) users being blocked... then I tried to spam my own site using Opera with a fake user-agent plus an elite proxy, and I had no problem doing that...

      So yes, I guess it has the qualities required to be a good microsoft product.
  • Good. (Score:3, Interesting)

    by ExileOnHoth (53325) on Thursday July 13, 2006 @04:46PM (#15714699)
    This *must* be one of the next battle lines in the so-called search wars.

    I remember the first time I saw google - I was blown away: "Wow. These results are exactly the web pages I was looking for!" But that's no longer the case when you search in google. They've really fallen behind in being able to separate out (or, as they say, "search for") the pages I want from the junk.

    I hope google will win this war, but maybe microsoft chucking some money at the problem will help light a fire under google to get this fixed before someone else does it better. If searching at google no longer brings me relevant results better than any other source, I'm gonna start looking for somewhere else to search. Just like I did when I switched to google from yahoo back in the twentieth century.
  • we can only hope that this research is as fruitful as their speech synthesis research, email spam blocking, multiplatform video codec, next-gen filesystem, advanced CLI shell, and portable computing.
    yay for MS research!
  • So in other words, it'll be called Aragorn when it becomes master?
  • Google could cut their spam to 1/4 if they stopped accepting websites whose domains are less than 7 days old (this would render domain kiting useless).
    • You do realise that a new domain takes exactly 7 days to become older than 7 days; in general, it takes X days to become X days old. Whatever number of days you pick, spammers will simply wait that many days before they spam with the site. In fact, Google already does this in a sense: younger sites have a lower PageRank than similar older sites.

      I do agree that extremely new sites with weird domain names should be scrutinised before entering the engine.
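The two ideas in this exchange, a hard 7-day cutoff and a softer age-based ranking penalty, can be compared in a short sketch. The cutoff comes from the parent comment; the linear weighting and the one-year `full_trust_days` value are assumptions for illustration only, not how Google computes PageRank.

```python
from datetime import date, timedelta

MIN_DOMAIN_AGE = timedelta(days=7)  # threshold suggested in the parent comment

def is_indexable(registered_on, today, min_age=MIN_DOMAIN_AGE):
    """Hard cutoff: accept a domain only once it is at least min_age old."""
    return today - registered_on >= min_age

def age_weight(registered_on, today, full_trust_days=365):
    """Soft alternative (assumed linear ramp): scale a ranking score by domain age."""
    age_days = max((today - registered_on).days, 0)
    return min(age_days / full_trust_days, 1.0)

today = date(2006, 7, 13)
new_domain = date(2006, 7, 10)   # 3 days old
old_domain = date(2004, 1, 1)    # well over a year old
```

As the reply points out, the hard cutoff only delays a patient spammer by a week, whereas a gradual weight keeps penalising a young domain for as long as the ramp lasts.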
  • This addresses a particular kind of spam page that is promoted in a particular way.

    But it does nothing to address the vast majority of the pages that contaminate search engine results. I'm referring to automatically generated pages that look like good pages and hence rank well in search engines, but really have little except links and perhaps some public domain info. E.g., there could be one each for every resort hotel in Mexico. The search engine result turns up a summary that makes it look like there a
  • by bnitsua (72438)
    non-content analysis? isn't that patented by slashdot readers?

