Spam Sites Infesting Google Search Results
The Google Watchdog blog is reporting that "Spam and virus sites infesting the Google SERPs in several categories" and speculates, ...Google's own index has been hacked. The circumvention of a guideline normally picked up by the Googlebot quickly is worrisome. The fact that none of the sites have real content and don't appear to even be hosted anywhere is even more scary. How did millions of sites get indexed if they don't exist?
It's the Rand Corporation (Score:3, Funny)
Google index hacked? (Score:5, Funny)
Re: (Score:2)
Might be interesting to try. But I would hope that they have monitoring in place to spot a sudden surge in alternative translations.
Re: (Score:2)
SEOs (Score:5, Informative)
Using one page of information for Google's spider and then using a redirect for a non-spider user. It's an SEO tactic.
Re:SEOs (Score:5, Interesting)
Re:SEOs (Score:4, Interesting)
That's what makes this scary -- as I said, I thought Google was already on the lookout for such scams, and if they're being beaten on such a large scale it might mean a major shift in Google's strategy is in order...
Re:SEOs (Score:5, Informative)
It's more likely related to IP address than to user agent. I used to work in web site metrics, and the number of fouled-up user agents and spoofs was always staggering, but IP was a pretty good indicator of who was doing something. No doubt the bad guys have tracked the Googlebot's IP over a long period of time and perhaps made some correlations to give them a pretty good idea if the site is being revisited by Google under an assumed user agent. I'm not sure, but it would seem to me that Google would have thought of spoofing its IPs long ago, to avoid people being able to track them, though I can't say how you'd go about that.
Re: (Score:2)
Easy: Hire a relatively unknown 3rd party to perform the comparison for you.
Re: (Score:2)
Re: (Score:2)
Perhaps Google should create a browser extension -- completely voluntary, of course -- that essentially turns everybody's browsers into a distributed GoogleBot. Of course then they have to deal with malicious nodes poisoning the data, but that could be resolved by having a dozen or so random systems checking the same website and sending their res
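For what it's worth, the cross-checking idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `digest` and `consensus` and the 50% `quorum` default are made up for the example, and a real system would have to cope with pages that legitimately differ between visitors.

```python
import hashlib
from collections import Counter

def digest(page_text):
    """Fingerprint a page so reports from different nodes can be compared."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def consensus(reports, quorum=0.5):
    """Return (winning digest or None, indices of nodes that disagree).

    reports is a list of page texts, one per volunteer node.
    The winner must hold strictly more than `quorum` of the votes
    (the 0.5 default is an assumption, not a known Google setting).
    """
    if not reports:
        return None, []
    digests = [digest(r) for r in reports]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes / len(digests) <= quorum:
        # No clear majority: trust nobody.
        return None, list(range(len(digests)))
    outliers = [i for i, d in enumerate(digests) if d != winner]
    return winner, outliers
```

Nodes that keep landing in the outlier list could be given less weight over time, which is one way to blunt the poisoning problem.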
Re: (Score:2)
The fundamental problem with spoofing IPs for this kind of work is that you need to use the right IP to get any data back. You need to have real IPs which are 'disposable'. A botnet, in other words. Google could, if they were evil, create the world's largest botnet by getting JavaScript embedded in search results pages or
Re: (Score:2)
Re: (Score:2)
I'm not sure, but it would seem to me that Google would have thought of spoofing its IPs long ago, to avoid people being able to track them, though I can't say how you'd go about that.
That's so simple!
Re:SEOs (Score:5, Interesting)
Re:SEOs (Score:5, Insightful)
So medical supply or information websites shouldn't be indexed by Google?
I know what you're trying to do, but no word is 100% inappropriate. What if someone is actually looking for information on Viagra, or replica Swiss watches, or cheap stocks? What if someone is looking for information on spam?
Check for significant differences in content with different user-agents, yes, but banned words? That really doesn't seem like a good idea to me.
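A rough sketch of that "compare what different user-agents get" check, in Python. It assumes you have already fetched the same URL twice (once with a crawler user-agent, once with a browser one) and have the two HTML strings. The function names and the 0.5 threshold are invented for illustration, and nothing here claims to be how Google actually does it.

```python
import re
from difflib import SequenceMatcher

def words(html):
    """Strip tags crudely and return lowercase word tokens."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())

def similarity(html_a, html_b):
    """Ratio in [0, 1] of how alike the two word sequences are."""
    return SequenceMatcher(None, words(html_a), words(html_b)).ratio()

def looks_cloaked(bot_html, browser_html, threshold=0.5):
    """Flag a page when the two views share too little text.

    The threshold is a guess; sites that personalise content
    legitimately will also score low, so this is only a first filter.
    """
    return similarity(bot_html, browser_html) < threshold
```

Note that this compares content only. It never looks for particular words, so a pharmacy site that serves everyone the same page would not be flagged.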
Re: (Score:2)
What if someone is looking for information on spam?
Which spam would that be:
Re: (Score:2)
Re: (Score:2)
Gary sent the first unsolicited email in 1978 - to about 400 recipients. It was on a topic that - given the size and constitution of the ArpaNet community - could be reasonably assumed of some interest to the "audience".
The famous Green Card Lottery spam was sent to every available Usenet newsgroup. This was quickly termed "Spam" by the Usenet at large - in reference to its ubiquity, like the Spam in the Python sketch ("Bloody vikings!") This was the first wide-scale, flagrant dismissal of
Re:SEOs (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Context should matter, but that didn't stop Beaver College [google.com] from changing their name because of porn/child safety filters.
Re:SEOs (Score:5, Insightful)
It does. It also detects the landing pages mentioned above. Apparently it's something more subtle than what one could think of in a few minutes on Slashdot, and we'll learn soon enough.
Re:SEOs (Score:5, Funny)
It's amazing, really.
Re: (Score:2)
For example, I have a web site that displays the most recent content for returning visitors and the most popular content for visitors who are visiting my site for the very first time. It's also possible for each user to choose which page to see. This is done to increase productivity on the site and to increase the likelihood of a new visitor becoming a repeat visitor.
When googlebot visits my page I give it the page with the freshest cont
Re: (Score:2)
The theory is good, but the execution would be horribly complicated, and computationally intensive, and have a very high margin for error. (Computers don't intuit flow as well as humans, for a relatively minor example.)
-:sigma.SB
Re: (Score:2)
As for the suggestion of a different user agent, I guess it'd be simple enough to either do a reverse lookup and see if it contains "google"
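A minimal Python sketch of the reverse-lookup idea, with one change: checking that the hostname merely contains "google" is easy to fool (think google.com.evil.example), so this version checks the domain suffix and then confirms the name resolves back to the same IP. The suffix list is an assumption for the example, and `hostname_is_trusted` and `verify_crawler_ip` are made-up names.

```python
import socket

# Assumed list of crawler domains; adjust to whatever the crawler's operator documents.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_trusted(hostname):
    """True only if the name ends in a trusted domain, not if it merely contains one."""
    host = hostname.lower().rstrip(".")
    return host.endswith(TRUSTED_SUFFIXES)

def verify_crawler_ip(ip):
    """Forward-confirmed reverse DNS: IP -> name -> IPs, and the IP must be in that set."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no reverse record at all
    if not hostname_is_trusted(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in forward_ips
```

The forward lookup matters because whoever controls an IP block also controls its reverse DNS and can make it say anything.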
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Hopefully (don't know if it works), sites like this will give me the correctly indexed information.
Google hacked, sites don't exist, um ... (Score:4, Insightful)
Submitter asks: How did millions of sites get indexed if they don't exist?
Okay, I call this an idiot story. Millions of sites come into being and go out of being all the time. What does this statement have to do with anything? It seems like the submitter lacks a basic understanding of how Google and the web work, but the story has made it to Slashdot. I think the Slashdot IQ level is dropping because this is a Digg story.
Re: (Score:3, Informative)
Millions of sites come into being and go out of being all the time. What does this statement have to do with anything? It seems like the submitter lacks a basic understanding of how Google and the web work, but the story has made it to Slashdot.
If you had bothered reading the article, you would have seen:
Yes, millions of sites do come into being all the time. Had Google indexed a site, and had said site disappeared before the index was updated, you would simply hit either a landing page (if that domain was purchased but not set up) or an error message [carrotsticksareyummy.com].
The submitter was referring to instances where a fake redirector is set up to trick the Googlebot by sending it to websites with content and keywords while sending normal use
Re: (Score:2)
[some guy] [scary] At least you should be thankful it's not a Fark.com story!
Not hosted anywhere? (Score:3, Informative)
That seems a little incredible to me.
Invisible, IPless, Chinese web-servers are taking over Google! Personally, I'll just let Google worry about trying to protect its search engines.
Re:Not hosted anywhere? (Score:5, Interesting)
Where do all the calculators go when they die? (Score:4, Funny)
I'm scared...
Re: (Score:2)
Re: (Score:2)
I'm scared...
Re: (Score:2)
Re:Not hosted anywhere? (Score:5, Funny)
Re: (Score:2)
Re: (Score:2)
> wave function?
The Googlebot is not an "observer".
> That would go against all quantum theory, wouldn't it?
It would "go against" the Copenhagen interpretation.
specific phrases? (Score:5, Interesting)
I'd like to check it out myself.
Drivers (Score:2)
Re: (Score:3, Informative)
Bayesian networks and decision graphs Finn rapidshare
Google's Algorithm (Score:2)
- Sites offering one set of content to Google and another to users. This is indeed something that Google frowns on, but not something that the spider seems to be set up to test.
- Google's fame comes from their PageRank algorithm and unfortunately people now know how to game the results. If Google were to implement multiple algorithms then users could indicate which search type they wish to use. While it certainly makes things more complicated for Google, it also makes
Wait and see. (Score:5, Insightful)
Re:Wait and see. (Score:5, Insightful)
Ironic side link (Score:2)
I wonder whether some of the software lets you spam Google's listings easily? Perhaps that's how it was achieved?
Re: (Score:2)
Horrible solution... (Score:2)
Erm... no, bad idea. Maybe google.cn won't have the same spam, maybe it will, but it most certainly is censored for other reasons as well. (Unless they've stopped doing this and I've completely missed the news -- there is one tank man on the first page of a google.cn image search for "tiananmen square", compared with almost the entire first page being tank men on google.com.)
And maybe a good suggestion to
Let me tell you how it happened (Score:2)
Re: (Score:2)
Nutcase conspiracy theory adopters web2.0 version (Score:2, Insightful)
"Some searches (very specific phrases, and I won't list any of them right now - Google knows which they are) return results with a large number of
"The
"[...] the Word-Confirm on all of their sites, including the one I will have to use to post this, generate a large number of rogue responses, and the HELPDESK facilities with thousands of consoles and employees each all over the planet watch the responses and other traffic chara
Sure it's not his browser that's porked? (Score:2, Interesting)
Google is working on this ... (Score:4, Informative)
Simple way to eliminate pharmaceutical spam (Score:3, Funny)
Re: (Score:2)
Re: (Score:2)
If we get universal healthcare in the US, we're going to have to have an age-based cutoff like other countries do. Sorry, no treatment if you are over 70 or something like that. I don't see the AARP and similar groups going for that and they are a pretty substantial voting bloc.
What hijacked phrases? Not seeing this. (Score:5, Informative)
I'm not seeing any of this. I'm trying commonly spammed phrases in Google, and seeing nothing unusual.
Search Engine Pessimisation (Score:2, Insightful)
I have had an idea for a hack to WordPress, which will make all links invisible to GoogleBot (and maybe the other sear
meta refresh (Score:2)
I read the story with interest as something like this happened to me the other day. It didn't even occur to me that Google had been hacked. I figured the original site had been compromised. A hacked web site can be defaced for shits and giggles, obviously, but it could also have a meta refresh tag added to send the browser off to wherever the defacer wants. With the security hole history of most CMS systems out there, I'm surprised that doesn't happen more often.
It looks like Firefox 3 will allow disab [diveintomark.org]
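If you want to check whether one of your own pages has picked up a meta refresh like the one described above, something along these lines would do as a first pass. It is a sketch using Python's standard `html.parser`; `MetaRefreshFinder` and `find_meta_refresh` are names invented here, and it will not see redirects done in JavaScript or in HTTP headers.

```python
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    """Collect the target URLs of any <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0; url=http://example.invalid/"
        _delay, _sep, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.targets.append(rest[4:].strip("'\" "))

def find_meta_refresh(html):
    """Return a list of redirect targets found in the page (empty if none)."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets
```

Run it over the files your CMS serves and diff the result against what you expect. Any target you don't recognise is worth a look.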
lots of dead .cn domains (Score:2)
What is up with images? They being abused too? (Score:3)
I searched on Opel Manta but forgot the space. With it I got many matches and very little junk in the first 10 pages. Without a space I got weird results starting on the first page. What does a car name have to do with a naked chick with a Nokia phone? Mud wrestlers? Homer Simpson? Paris Hilton? Dozens and dozens of unrelated pictures, it seems.
Spyware is off ATM so I didn't get any further than that.
I call Bullshit!!! (Score:4, Insightful)
Google will adjust, find the method of manipulating the page ranks, and close the hole.
Re:I call Bullshit!!! (Score:5, Insightful)
It may not be a question of a single developer making changes, as much as a single developer (or group of them -- safety in numbers) divulging to certain third parties how the algorithms work in the page ranking system. It's very rare any company gives anyone production access to make changes, but then again I've seen that happen too, where something breaks, they give a developer access to patch it in a hurry before the hue and cry sets in, then forget to revoke his/her access. Of course Google is global, so any change would have to propagate through the system via source control, so tracking it wouldn't be that hard. I doubt any developer, no matter how nefarious, would take the risk.
Re: (Score:3, Interesting)
Still waiting for them to allow weighting of search terms, though
Re: (Score:3, Insightful)
There are websites strictly devoted to google ranking.
Let me add this about Google. The google corporation really isn't 100% innovative. Their search uses common links to rank. This has led to evolution of the spammers. They load their pages with links to spam. So my point to slashdot is......
If google is so damn loaded with money and that their search tech uses common user l
I Bet It's a Simpler Explanation (Score:5, Interesting)
I imagine that spammers could band together, or simply use botnets with independent IP addresses, to 'click' links that boost their page rank. That's how it worked with Bush: they simply linked his homepage as "miserable failure" and suddenly he was the number one result for that query in Google.
I find this a more likely explanation than someone changing the data or values in the database. There's going to be plenty of evidence left in the logs & it's not like nobody's going to notice. This is Google's bread & butter; no amount of money in the world could entice a worker to mess with it. They would have to be exceptionally stupid, as the lawsuits that follow would be in the billions.
Re: (Score:3, Informative)
I imagine that spammers could band together, or simply use botnets with independent IP addresses, to 'click' links that boost their page rank. That's how it worked with Bush: they simply linked his homepage as "miserable failure" and suddenly he was the number one result for that query in Google.
I like your post, but Google can't detect if you "click" a link. It doesn't need botnets to click links from different IP addresses.
It just needs the mere *presence* of those links, with the same text, to the same page
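To make the "mere presence of links" point concrete, here is a toy Python tally of how often the same anchor text points at the same page across a set of crawled documents. It is a sketch only: the class and function names are invented, and a raw count like this is in no way Google's actual ranking formula.

```python
from collections import Counter
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Record (href, anchor text) for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace and case so variants count together.
            text = " ".join("".join(self._text).split()).lower()
            self.links.append((self._href, text))
            self._href = None

def tally_anchor_text(pages):
    """Count identical (href, anchor text) pairs across many pages."""
    counts = Counter()
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        counts.update(collector.links)
    return counts
```

No clicks are involved anywhere. Enough pages carrying the same phrase linked to the same target is all the tally sees, which is why a few hundred cooperating bloggers were enough for the "miserable failure" stunt.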
Re: (Score:2)
Re: (Score:3, Informative)
Re: (Score:3, Insightful)
Re:I Bet It's a Simpler Explanation (Score:4, Interesting)
google-analytics.com (Score:3, Insightful)
Re: (Score:2)
Clearly the Hitler they're in league with is none other than the vile Space Hitler. Someone call in Good Hitler!
Re: (Score:2)
Re: (Score:2, Insightful)
I spent the better part of an afternoon about 2 weeks ago, submitting my searches to Google asking them to look at these sites.
They were under my keyword group and it was driving me nuts.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
It could be an ex-employee (either fired, quit, or possibly a contractor) who's sold the information to some black hats, or it could be any number of other things. There's money to be made by subverting Google's index, so you have to know that there are people working on ways to do so all the time.
Re: (Score:2)
The sites could show one content to Googlebot and another to normal visitors.
Or it could be tricky. Offer the same text/html content, but make part of the content User-Obvious / Bot-Invisible content (images or something thrown together with JavaScript) and downplay or hide the Bot-Obvious content with tricky style sheets or more JavaScript (or just put in a bunch of newlines so it's way down the page). Ultimately it becomes some sort of weird Turing test for Google to be able to detect this sort of stuff.
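The simplest version of the style-sheet trick (text tucked inside an element with an inline `display:none` or `visibility:hidden`) can be spotted mechanically. The Python sketch below does only that. It is an illustration with invented names, it ignores external style sheets and anything JavaScript does after load, and it assumes reasonably well-formed HTML, so the harder cases described above would sail straight past it.

```python
import re
from html.parser import HTMLParser

HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)
# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class HiddenTextFinder(HTMLParser):
    """Split a page's text into what a user sees and what inline CSS hides."""

    def __init__(self):
        super().__init__()
        self.stack = []          # one bool per open element: is it hidden?
        self.visible_text = []
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        parent_hidden = bool(self.stack) and self.stack[-1]
        self.stack.append(parent_hidden or bool(HIDDEN_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        hidden = bool(self.stack) and self.stack[-1]
        (self.hidden_text if hidden else self.visible_text).append(text)

def split_text(html):
    """Return (visible_text, hidden_text) as two lists of strings."""
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.visible_text, finder.hidden_text
```

A page where most of the words are in the hidden list is suspicious, but plenty of honest pages hide menus and tabs this way, so it is a hint rather than proof.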
Re: (Score:2)
http://www.givemebackmygoogle.com/ [givemebackmygoogle.com]
It's not perfect as you can't customise the block list, but it's a start. Even better, make your own version to run on localhost so you can have your own block list etc.
Re: (Score:2)
Most legitimate sites don't put the code to disable Google's Cache option, but most of the spam sites do for some reason...
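The "code to disable Google's Cache option" is presumably the `noarchive` robots meta directive. A small Python check for it could look like the sketch below. `RobotsMetaFinder` and `disables_cache` are made-up names, and the check only covers meta tags, not an equivalent HTTP header.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())

def disables_cache(html):
    """True if the page asks search engines not to keep a cached copy."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noarchive" in finder.directives
```

On its own this proves nothing, since some legitimate sites do opt out of caching. As one signal among several it fits the observation above.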
This Finding was Validated (Score:3, Interesting)
http://www.pcmag.com/article2/0,1895,2188281,00.asp [pcmag.com]
Also, the Reg noticed - after my Slashdot posting, for once - so they are chasing this tail!
http://www.theregister.co.uk/2007/10/01/google_spam_infiltration/ [theregister.co.uk]
Wheee!