Microsoft Tracks Down Mass Fake Web Pages
Posted by Zonk on Tue Mar 20, 2007 08:38 AM
from the going-from-point-a-to-z74 dept.
An anonymous reader writes "According to an article in The New York Times, Microsoft researchers have discovered tens of thousands of junk Web pages, created only to lure search-engine users to advertisements. While most of us have run across them from time to time, the company researchers have found the pages are deliberately generated in vast numbers by a small group of shadowy operators. By following the money trail, Microsoft researchers were able to track the flow from big-name advertisers to search engine spammers. Many use Google's blogspot.com to set up spam doorway pages. 'The practice has proved to be a vexing problem for the major search companies, which struggle to prevent both spammers and companies specializing in improving legitimate clients' Web traffic -- a field known as search-engine optimization -- from undermining their page-ranking systems. Surprisingly, the researchers noted that the vast bulk of the junk listings was created from just two Web hosting companies and that as many as 68 percent of the advertisements sampled were placed by just three advertising syndicators.' The report is available at the Microsoft Strider Search Ranger project page."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
The easy way (Score:5, Interesting)
Re:The easy way (Score:5, Funny)
Re:The easiest way (Score:4, Funny)
another ripoff (Score:4, Funny)
Re:another ripoff (Score:5, Funny)
Re: (Score:3, Funny)
Re: (Score:2)
Why? (Score:5, Interesting)
Re:Why? (Score:5, Insightful)
Re:Why? (Score:5, Informative)
Once you've optimised for your keywords in "natural search" (i.e. *free* results), your investment keeps paying (you need to maintain positions of course, but this is lower cost, especially if you're in a niche), whereas in paid advertising you have to keep giving money to Google and, in competitive industries, your cost per click will be subject to significant inflation...
Re: (Score:2)
Re: (Score:2)
Re:Why? (Score:4, Insightful)
"time to time"? (Score:5, Insightful)
Time to time? For me it seems like more than 50% when I scan the search results. Maybe less, maybe more, but certainly more than "time to time". For many of my searches, I may not find anything truly relevant until the second and third page. People have learned how to play Google to the point where more and more Windows Live is starting to give better results (scary!).
Re: (Score:2)
Re:"time to time"? (Score:5, Funny)
Re: (Score:3, Insightful)
I beg your pardon... "Erotica" is a perfectly legitimate subject.
Ironically (Score:2)
Nice work (Score:5, Informative)
http://research.microsoft.com/SearchRanger/Spam_A
The cloning of popular blogs has been a scourge for a while now, both for manipulating search engines and good old-fashioned advertising - using someone else's content to draw visitors in
Nice work (Score:3, Interesting)
Google does keep up, but quietly. Anecdotally, last week I was searching for a certain spec ARM9 dev board (the VULCAN-Lite) with USD also as a search term, and all kinds of fake keyword sites and Eastern Bloc bride services were in the top 20 results.
I sent Google feedback with my search terms (VULCAN-Lite +USD), explained what spam was popping up, and as I write this comment a few days later, the Google search comes back clean (emp
Re: (Score:2, Interesting)
Then Microsoft realized... (Score:5, Funny)
It's coming from inside the building!!!
How does this help them? (Score:2)
Theories:
(1) There's a subtle way that it helps I haven't spotted yet, perhaps to do with non-PageRank elements of Google's search ordering
(2) This is all done by a very few companies because they are the few that don't understa
Re:How does this help them? (Score:4, Insightful)
Re: (Score:3, Interesting)
If there's only so much karma going into your pages, there's only so much karma they have to give, no matter how huge it is. A trillion pages pointing at my page won't increase its karma, if those trillion have no karma to give.
Re: (Score:3, Insightful)
The scummiest part of it all is that some of the pages in question will be on domains that someone let expire and someone else immediately snatched up. They get their PageRank from the sites that linked to the formerly legitimate domain. And if that was your domain name, and you only let it expire accidentally, well, sucks to be you. :(
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Every page has to start with some small, intrinsic amount of karma, otherwise there would be none to pass around. By creating enough bogus pages, you can aggregate some amount of link karma to bestow on the site of your choosing. In principle, I guess this would devalue everyone's PageRank too (kind of like printing money), but for a while it could be profitable.
The second hole is the popularity of websites with user-generated content. Lots of highly ranked websites (like /. in fact) allow anyone, or
Re: (Score:2)
There has to be a "root set", but that root set doesn't have to consist of all pages. There's some evidence that it includes all top-level pages, because the Scientologists experimented with creating zillions of top-level domains to increase their Google ranking. But ordinary pages, as I understand it, have no intrinsic karma at all.
Yeah, blog SEO spam is a great evil irritant. I do understand
Re: (Score:2)
That's not how it works. You assume it's a zero-sum game, but it's not. Every page gets some weight even if no one links to it. It's small, but it's positive. When one page links to another, the weight of the source page is reduced less than the target page gains. So, here is the business plan:
1. Make a lot of unique pages (G in the PR calculation joins identical or nearly identical pages)
2. Crosslink them in a n
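For anyone who wants to poke at the two positions in this subthread, here is a minimal sketch of the published PageRank recurrence, PR(p) = (1 - d) * t(p) + d * sum(PR(q) / outdegree(q)), run by power iteration. It is only an illustration: how Google's production ranking actually works is not public. The page names, the `build_web` helper, and the `teleport` parameter are all made up for this example. With the default uniform `teleport`, every page gets a small baseline weight, which is the "not zero-sum" view; passing a `teleport` that covers only a "root set" models the other view, where ordinary pages have no intrinsic weight.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=100):
    """Power iteration for the published PageRank recurrence.

    links:    dict mapping every page to the list of pages it links to
              (every link target must also appear as a key).
    teleport: dict page -> share of the baseline weight, summing to 1.
              Defaults to uniform over all pages (hypothetical parameter
              name; a restricted dict models a "root set").
    """
    pages = list(links)
    n = len(pages)
    if teleport is None:
        teleport = {p: 1.0 / n for p in pages}
    rank = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        # Baseline weight each page receives regardless of inbound links.
        new_rank = {p: (1.0 - damping) * teleport.get(p, 0.0) for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: hand its weight back via the teleport vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport.get(p, 0.0)
        rank = new_rank
    return rank


def build_web(farm_size):
    """Toy web: two honest pages linking to each other, plus a link farm
    of `farm_size` otherwise unlinked pages all pointing at "target"."""
    links = {"hub": ["honest"], "honest": ["hub"], "target": []}
    for i in range(farm_size):
        links[f"farm{i}"] = ["target"]
    return links


if __name__ == "__main__":
    web = build_web(50)
    uniform = pagerank(web)
    rooted = pagerank(web, teleport={"hub": 0.5, "honest": 0.5})
    print("uniform baseline, target:", uniform["target"])
    print("uniform baseline, honest:", uniform["honest"])
    print("root-set baseline, target:", rooted["target"])
```

Under the uniform baseline, the farm pages pool their small default weights and push "target" above the honest pages. Under the root-set baseline, the same farm contributes nothing, because no weight ever reaches it. Which of the two is closer to what a real search engine does is exactly what the posters here disagree about.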
Re: (Score:2)
That's not the impression I'm under - I thought that most pages were not part of Google's "root set". See my reply here:
http://slashdot.org/comments.pl?sid=227331&threshold=1&commentsort=3&mode=thread&pid=18413697#18413787
Re: (Score:2)
I understand that you have such an impression, but that's a wrong impression. Every page gets a non-zero weight by default. If you think about it you will see that your scheme just would not work: emerging subjects/sites would stay with zero PR for a long long time until links to them propagate all the way to the "roots".
Re: (Score:2)
You're mistaken about your argument against, in any case; PageRank itself is public information, so I can tell you that it does not have the property you assign to it. There's a delay between a link being made and Google spidering and discovering it, but the eigenvector calculation at the heart of PageRank will propagate
Re: (Score:2)
How do you know that?
I can tell you that it does not have the property you assign to it
The delay I mentioned is due to links being made, not links being discovered. Think about some small community of scientists making an almost closed cluster of sites about their niche research subject.
Re: (Score:2)
The delay I mentioned is due to links being made, not links being discovered. Think about some small community of scientists making an almost closed cluster of sites about their niche research subject.
There is simply no way for Google to know that those pages are any good until people start linking to them. Fortunately it doesn't take long - for example, the scientists will get karma from the links from their institution front page
Re: (Score:2)
And? (Score:3, Interesting)
On the other hand, what idiot spouts off about two hosting companies being responsible without naming them? Seriously. This isn't Fark, you can't get kicked off for calling some asshole out.
Re:And? (Score:4, Insightful)
Re: (Score:2)
so, one down, one to go. It's still a shame the offending company was not named, but I imagine it doesn't exist anymore, wound up and is now reborn as a differently n
And in other news... (Score:4, Funny)
In other news, Microsoft researchers have discovered that the sky is blue and that water is wet.
Re: (Score:3, Funny)
I live in London, you insensitive clod!
Obligatory Bill Hicks (Score:5, Funny)
Bad neighborhoods (Score:3, Interesting)
A few years ago... (Score:4, Interesting)
Microsoftie wearing a white hat? (Score:5, Insightful)
So -- from an admitted open source advocate -- here's rare kudos to the giant in Redmond for keeping a "white hat" and his group -- and letting them work.
Re: (Score:3, Interesting)
is this research reliable (Score:2)
Firefox is good. (Score:2, Informative)