News for nerds, stuff that matters

MS Research Automates Search Engine Spam Hunt

Posted by Zonk on Thu Jul 13, 2006 02:35 PM
from the i'm-all-for-less-junk dept.
Barbie Dollar writes "Researchers at Microsoft are working on an ambitious new project to hunt down and neutralize large-scale search engine spammers. The project, called Strider Search Defender, automates the discovery of search spammers through non-content analysis. The project integrates technology from two previous Microsoft Research prototypes (Strider HoneyMonkey and Strider URL Tracer) and promises a new approach to removing junk results from search engine queries."

Related Stories

Microsoft 'URL Tracer' Hunts Typosquatters 124 comments
TonioSop writes "Microsoft Research has released a new tool to help pinpoint large-scale typosquatters that are known to be gaming pay-per-click domain parking services. The lightweight prototype, called Strider URL Tracer, builds on the work within Microsoft's Cybersecurity and Systems Management group to keep tabs on a sophisticated typosquatting scheme that uses multilayer URL redirection to make money from Google's AdSense for domains program. "
Microsoft's "Honeymonkey" Project 320 comments
g0bshiTe writes "Ever hear the saying, 'given enough time, a room full of monkeys could type out Shakespeare'? Well, Microsoft seems to be taking this saying to heart: taking a cue from the Honeynet project, they have created what they have dubbed 'honeymonkeys.' Security Focus has an article which describes this honeymonkey network, which is little more than a network of virtual Windows XP boxes in various patch states. These boxes are set up to crawl the seedier side of the web in search of vulnerabilities that are not being reported but are being actively exploited, in an attempt to further secure their product. Sounds like a decent idea from the Redmond crew to me."
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • This just in..... (Score:5, Funny)

    by Mayhem178 (920970) on Thursday July 13 2006, @02:41PM (#15714308)
    Every anti-Microsoft blog and article in existence has been flagged as search engine spam.

    More at 11.
  • Still seems reactive (Score:3, Insightful)

    by MrNougat (927651) <ckratsch@nOSpam.gmail.com> on Thursday July 13 2006, @02:43PM (#15714319)
    Sure, preventing search engines from indexing blogspam posts is great. Maybe that's the first step, but it's not going for the root cause - the botnets that run the apps that post/email in the first place, and the compromised webservers hosting order sites.
    • It's not an either-or situation (Score:5, Insightful)

      by ScentCone (795499) on Thursday July 13 2006, @02:53PM (#15714382)
      Sure, preventing search engines from indexing blogspam posts is great. Maybe that's the first step, but it's not going for the root cause - the botnets that run the apps that post/email in the first place, and the compromised webservers hosting order sites.

      These are not mutually exclusive goals. If you take away any incentive for spamalizing content (meaning, not only does it not boost your search placement, it penalizes you), then much of the pressure to run botnets and crack servers goes away.
  • by us7892 (655683) on Thursday July 13 2006, @02:45PM (#15714337)
    By cracking down, Microsoft could effectively reduce the number of spam sites. The result would be fewer AdWords and microAds displayed and clicked, which could lower revenue for Google and Yahoo.

    A side effect is better search results, which would increase use of Google again. Where is MSN Search in all of this? I don't know. But the fewer of those crap sites, the better.
  • Not before time. (Score:2, Interesting)

    I'm all for people being allowed to try and game the system...Anything else would restrict the whole purpose of the Internet as a repository for whatever the hell someone wants to put in there.

    At the same time, I'm all for search engines blacklisting people who game the system, parked domains, crap aggregator pages, etc. It's all about building a better mousetrap.
  • But I thought.. (Score:4, Funny)

    by Anonymous Coward on Thursday July 13 2006, @02:46PM (#15714344)
    ..that Strider HoneyMonkey was Arwen's pet name for Aragorn?
  • Cover-up (Score:2)

    by Kesch (943326) on Thursday July 13 2006, @02:51PM (#15714371)
    "Strider Search Defender" is just a cover name. It's really the "Aragorn Search Defender"; it just likes to remain incognito so that spam-zombies don't think to hunt it down.
    • Re:Cover-up by creimer (Score:3) Thursday July 13 2006, @02:57PM
  • Go Microsoft! (Score:5, Insightful)

    by eebra82 (907996) on Thursday July 13 2006, @02:56PM (#15714408)
    (http://www.insidebet.com/)
    All major search engines have been doing this for quite some time. Google is probably the best hunter of them all; its most recent update, which occurred on June 27, banned a large number of spammers who had billions of sites indexed. Unfortunately, the war on spam is a difficult one. The spammers are working with non-content pages, but it is only a matter of time before they start generating non-gibberish content to spam with, too.

    Hopefully, Microsoft's approach will have some effect and push other operators to work harder on preventing web spam.

    Amusingly, you're most likely to be affected only if you're searching for penis pumps, pornographic content and gambling.
  • Human Powered? (Score:4, Interesting)

    by pembo13 (770295) on Thursday July 13 2006, @03:02PM (#15714439)
    (http://www.pembo13.com/)
    Seems to me that a group of 10 people could easily flag a large amount of spam websites. Is this currently being done by any major engine?
  • if it works (Score:1, Flamebait)

    by BarryLoper (928015) on Thursday July 13 2006, @03:04PM (#15714447)
    If they could make something like this work, it would be a big draw away from Google.

    Of course, with their track record of Neat Ideas vs. Actual Products, (WinFS, etc.) I'm not holding my breath.

    I am, however, wishing them luck.

  • by IO ERROR (128968) <error.ioerror@us> on Thursday July 13 2006, @03:07PM (#15714469)
    (http://www.ioerror.us/ | Last Journal: Sunday May 22 2005, @06:28AM)
    Microsoft forgot to mention my non-content based method of blocking comment spam entirely known as Bad Behavior [homelandstupidity.us]. And now that they seem to have swiped a few of my ideas, I'm going to have to go see what they're up to...
  • Good. (Score:3, Interesting)

    by ExileOnHoth (53325) on Thursday July 13 2006, @03:46PM (#15714699)
    This *must* be one of the next battle lines in the so-called search wars.

    I remember the first time I saw Google - I was blown away: "Wow. These results are exactly the web pages I was looking for!" But that's no longer the case when you search in Google. They've really fallen behind in being able to separate out (or, as they say, "search for") the pages I want from the junk.

    I hope Google will win this war, but maybe Microsoft chucking some money at the problem will help light a fire under Google to get this fixed before someone else does it better. If searching at Google no longer brings me relevant results better than any other source, I'm gonna start looking for somewhere else to search. Just like I did when I switched to Google from Yahoo back in the twentieth century.
  • we can only hope (Score:2)

    by Tibor the Hun (143056) on Thursday July 13 2006, @06:08PM (#15715354)
    we can only hope that this research is as fruitful as their speech synthesis research, email spam blocking, multiplatform video codec, next-gen filesystem, advanced CLI shell, and portable computing.
    yay for MS research!
  • Experimental... (Score:1)

    by Tavor (845700) on Thursday July 13 2006, @07:47PM (#15715854)
    So in other words, it'll be called Aragorn when it becomes master?
  • by spion666 (922711) on Thursday July 13 2006, @08:52PM (#15716148)
    (http://spion.ws/)
    Google could cut their spam to 1/4 if they stopped accepting websites whose domains are less than 7 days old (that would render domain kiting useless).
  • This addresses a particular kind of spam page that is promoted in a particular way.

    But it does nothing to address the vast majority of the pages that contaminate search engine results. I'm referring to automatically generated pages that look like good pages and hence rank well in search engines, but really have little except links and perhaps some public domain info. E.g., there could be one each for every resort hotel in Mexico. The search engine result turns up a summary that makes it look like there are "reviews" there. But either the reviews section is empty, or else they reproduce something that's available on dozens of other sites as well. In one case, apparently, a single such site had 4 billion "different" pages. I'm not making that number up.

    More sophisticated kinds of link-network analysis will be needed before those bite the dust.
  • hmm... (Score:2)

    by bnitsua (72438) on Friday July 14 2006, @01:00AM (#15717091)
    Non-content analysis? Isn't that patented by Slashdot readers?
  • by dedazo (737510) on Thursday July 13 2006, @02:50PM (#15714366)
    (Last Journal: Friday August 31, @07:08PM)
    I don't know how this is relevant. Would you expect Google to share something like this with Microsoft? When was the last time you saw Google or Overture sharing proprietary search algorithms with their competitors?
  • Re:Will they share? (Score:4, Insightful)

    by idesofmarch (730937) on Thursday July 13 2006, @02:54PM (#15714392)
    First, do not be so skeptical. Have you noticed how well Outlook 2003 spam filtering works? I realize the algorithm is different, but based on results, I have to say that it is probable that Microsoft will succeed with reasonable effectiveness.

    Second, what business rationale is there to give away a competitive advantage (after spending millions to get it) in the very competitive search market, where, by the way, Microsoft is not the market leader?
  • by ScentCone (795499) on Thursday July 13 2006, @02:56PM (#15714405)
    So, if by some miracle they actually discover a way to hunt down and neutralize the search engine spammers, what are the odds that they share this information with other Search Engine companies?

    Their purpose is to make their own search engine more effective for users, thus generating more traffic for them. A nice side effect would be that Yahoo and Google, etc., would feel more pressure to integrate similar technologies into their own engines. As usual, competition produces the best results.
  • by CaymanIslandCarpedie (868408) on Thursday July 13 2006, @03:04PM (#15714452)
    (Last Journal: Sunday July 01, @08:03AM)
    what are the odds that they share this information with other Search Engine companies?

    Probably about the same odds as Google sending Yahoo and MSN detailed specs of their search algorithms, or the 2008 Republican presidential candidate going out and campaigning for the Democratic candidate, or the US shipping Iran a fully functional atomic weapon production facility..... I could go on, but you probably get the idea. Sometimes competitors want to beat their opponents, you see.
  • by Pengo (28814) on Thursday July 13 2006, @03:07PM (#15714467)
    (Last Journal: Thursday October 24 2002, @10:23AM)

    Long Live Competition!

    This is how these markets are supposed to work. Let the smartest/best company with the best product find success and enjoy the fiscal rewards.

    If MSN can out-do Google, I'd move my search traffic there in a heartbeat. Of course, Google won't let that happen, so WE THE -CONSUMERS- WIN! This isn't communism; there's no reason that a company should have to give its competition its work if it put the effort into solving a problem or finding a solution.
  • by krewemaynard (665044) <<krewemaynard> <at> <gmail.com>> on Thursday July 13 2006, @04:12PM (#15714841)
    Google will probably come up with their own, better methods. Besides, MS wants to crush Google, so no, they won't share.
  • Re:and shut down? (Score:1, Insightful)

    by Anonymous Coward on Thursday July 13 2006, @04:26PM (#15714897)
    How the fuck did a post from somebody who clearly hasn't even read the "summary", let alone the article, get modded up "Insightful"? Mods, just because a poster has a low ID doesn't mean their posts are always worth reading.

    For reference:
    (a) What does shutting down Windows boxes have to do with searching for search-engine spam?
    (b) How does search-engine spam "find" you?

    Could it possibly be that you saw the word "spam", and your brain shut off while you wrote a nonsensical post that might just have made sense in the context of an article about email-spam zombie computers, but is totally irrelevant in the context of search-engine spam?
  • Re:Strider Hiryu (Score:2)

    by kahei (466208) on Friday July 14 2006, @10:13AM (#15718965)
    (http://www.hwacha.net/)

    Not so, I'm afraid -- he will never leave Eurasia alive.