Google Launches Google Sitemaps
Posted by
Zonk
on Fri Jun 03, 2005 09:55 AM
from the please-stop-innovating dept.
Ninwa writes "Google has launched Google Sitemaps. It seems to be a service that allows webmasters to define how often their sites' content is going to change, to give Google a better idea of what to index. It uses some basic XML as the method of submitting a sitemap. More information on the protocol is available in an FAQ. What's most interesting is that Google is licensing the idea under the Attribution/Share Alike Creative Commons license. According to the Google Blog, this is being done '...so that other search engines can do a better job as well. Eventually we hope this will be supported natively in webservers (e.g. Apache, Lotus Notes, IIS).' They even offer an open source client in Python."
great interview (Score:5, Informative)
http://blog.searchenginewatch.com/blog/050602-195
More unabashed Google loving... (Score:5, Funny)
I guess the rest of the world has a long way to go to catch up...
Cool idea (Score:5, Interesting)
(http://www.electricstate.com/ | Last Journal: Friday May 05 2006, @03:08PM)
For example, a website we launched a couple months ago is primarily images. We played nice - all of the images have legitimate alt tags, and we tried to let the site degrade properly in older browsers (although you really wouldn't get much, in those instances).
But the biggest problem we had was trying to get the site spidered by Google. It would be, and it would appear in the index, but it would be listed far below sites that linked to it. I don't believe Google likes sites that are primarily images. We populated meta tags with descriptions, but they weren't included; we even tried using hidden text - legitimate, hidden text that would serve as the site's description, but not break the design - but you know how Google feels about those sorts of things. We had to walk a fine line. This'll be nicer.
Re:Cool idea (Score:4, Informative)
(http://www.memwg.com/blog/adsense/ | Last Journal: Thursday April 06 2006, @07:25AM)
Quite right, a new site can be listed in the Google index pretty quickly -- it only took a few days for my latest site to be found by the Googlebot -- but it takes a while before any PageRank gets assigned to its pages, especially if there are no inbound links to the site. No PageRank, no top listing...
Eric
Currently at #1 for adsense tips [google.com]
Re:Cool idea (Score:5, Informative)
(http://www.herskal.com/)
Re:Cool idea (Score:4, Informative)
(http://www.tenpastmidnight.com/ | Last Journal: Tuesday June 29 2004, @04:52AM)
It's quite common to be high up for matching terms for about a week, then disappear for three months or so. This seems to be normal behaviour for new sites and is nicknamed the Google sandbox [google.com] and seems to have been confirmed by the patent application recently made public.
The sandbox is just an artificial lowering, so if you're a match for a rare term you can still be found quite easily.
fuckedgoogle.com anyone? (Score:2, Interesting)
http://www.fuckedgoogle.com/ [fuckedgoogle.com]
Stop doing that... (Score:1)
Still in Beta (Score:1)
Sitemaps abuse? (Score:3, Insightful)
(http://iolagnm.com/)
Re:Sitemaps abuse? (Score:4, Informative)
(http://www.arbitraryconstant.com/)
First, the priority is a relative priority, so if you want to set every page to 1.0 (defined as the highest priority) it'll mean nothing.
Second, if you lie about update frequency or the date of the last update they'll figure it out pretty quick.
These aren't commands, they're hints.
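To make the "hints" concrete, here is a rough sketch of what a sitemap carrying those fields might look like when generated from Python. The element names (`loc`, `lastmod`, `changefreq`, `priority`) follow the published Sitemaps protocol; the namespace is the later sitemaps.org 0.9 one rather than whatever Google served at launch, and the `build_sitemap` helper and the example pages are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: this is the later sitemaps.org 0.9 namespace; the 2005
# launch used a Google-hosted schema URL instead.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for an iterable of page dicts.

    build_sitemap is a hypothetical helper, not part of Google's client.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # The remaining fields are optional hints, not commands.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Made-up example pages. Priority is relative within one site, so
# setting every page to 1.0 would tell the crawler nothing.
pages = [
    {"loc": "http://www.example.com/", "changefreq": "daily", "priority": 0.8},
    {"loc": "http://www.example.com/archive.html",
     "lastmod": "2005-06-03", "changefreq": "yearly", "priority": 0.2},
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Since the crawler can compare `lastmod` and `changefreq` against what it actually sees on each visit, lying in these fields presumably buys very little.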
Blog related? (Score:1)
(http://www.hoei.com/tech/)
While they are at it maybe new meta tags? (Score:2)
(http://www.gemstate.net/friends | Last Journal: Tuesday September 11, @10:32AM)
Reinventing the wheel? (Score:2)
Reminds me of blog pings - what's wrong with using the Referer header? Doing some checking, then fetching the referring page and checking it for linkage?
Has the world gone XML crazy?
Google is IT's Willy Wonka (Score:5, Funny)
(http://www.themanpages.net/ | Last Journal: Tuesday September 06 2005, @03:45PM)
How does this benefit me? (Score:1)
(http://www.holdemstrategycharts.com/)
Re:How does this benefit me? (Score:5, Insightful)
(http://www.memwg.com/blog/adsense/ | Last Journal: Thursday April 06 2006, @07:25AM)
It benefits you because:
Also, you wouldn't necessarily have to maintain more than one sitemap. You could use XSLT to create the sitemap.html file for your site from the XML file you create for Google. In fact, wouldn't it be nice for Web authoring tools to do this automatically for you?
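The single-source idea can be sketched without XSLT too. What follows is a minimal Python stand-in (not an XSLT stylesheet) that reads a sitemap XML string and emits an HTML list of links. The `sitemap_to_html` name and the sample input are hypothetical, and the namespace assumes the later sitemaps.org 0.9 schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumption: the sitemap uses the sitemaps.org 0.9 namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_html(sitemap_xml):
    """Turn sitemap XML into an HTML <ul> of links (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    items = []
    for loc in root.findall("sm:url/sm:loc", NS):
        # Escape the URL so it is safe inside both attribute and text.
        href = escape(loc.text.strip(), quote=True)
        items.append('  <li><a href="{0}">{0}</a></li>'.format(href))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Made-up sitemap used only to exercise the function.
example = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/</loc></url>"
    "<url><loc>http://www.example.com/about.html</loc></url>"
    "</urlset>"
)
html_list = sitemap_to_html(example)
print(html_list)
```

An authoring tool could run something like this on every publish, so the human-readable sitemap page never drifts from the XML one.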
Eric
Make Easy Money with Google: The Blog [makeeasymo...google.com] (powered by blojsom [sf.net])
Creative Commons Meme (Score:2, Informative)
(http://www.realmeme.com/roller)
http://www.realmeme.com/miner/preinflection.php?s
Hidden jab at Yahoo? (Score:1)
I love the fact that they're saving us all a lot of time by giving Yahoo! access to this, so we don't have to wait for them to create their own version...
How is this a win-win? Here's how.... (Score:2)
(http://www.onlineconfessional.com/confess | Last Journal: Tuesday June 06 2006, @02:10PM)
Livejournal.com has had a number of problems with Google, and often just plain outright bans them from spidering the site. Part of the problem is that all the registered users have their journals at journalname.livejournal.com as well as livejournal.com/users/journalname. This means indexing the journals for registered users doubles the load on their server farm!
With something like this, livejournal would be able to define exactly how often the indexing process occurs, and could control which version of the URL is indexed.
I assume issues like this are far from unique.
This is a win-win. Google doesn't have to have its spiders crawl sites as often, server load on the various sites is reduced, and indexing frequency is in line with how often the webmaster wants the site to be indexed.
And licensing means that hopefully, the same XML file will end up being good for multiple search engines!
Very cool technology. Hopefully it's also highly abuse proof. I'd hate to see the results of something like this being used by the "Search engine optimization" firms.
No more messing around with index hacking... (Score:1)
Do you n3ed V1agra or Sialis? We have the best and af728 most potent types fo...
robots.txt (Score:2)
(http://www.isights.org/)
Google Evil Index (Score:5, Funny)
(http://planetretcon.com/)
Lotus Notes? (Score:2)
(The submitter probably meant Lotus Domino, which is still a bad webserver, but not nearly as bad as Notes would be.)
Information already available to Google? (Score:1)
(http://www.randomfield.com/ | Last Journal: Friday November 09, @04:54PM)
great idea (Score:2)
Or maybe another hidden use... (Score:4, Insightful)
Marketplace of Ideas (Score:2)
(http://slashdot.org/~Doc%20Ruby/journal | Last Journal: Thursday March 31 2005, @01:48PM)
And I'm willing to license my idea, "better search engines with better user interfaces", to Google, for a modest sum.
Next thing you know... (Score:1, Interesting)
(http://virtualkarma.blogspot.com/)
This might be marked as troll... but think about it. Isn't it possible?
Re:Next thing you know... (Score:4, Informative)
As long as everyone can freely and voluntarily use these specs without having to pay anything, how is this a bad thing?
Re:Next thing you know... (Score:5, Insightful)
(http://www.mountaininterval.org/)
Nice. Google proposes a way to help web site administrators have a bit more control over how their site is perceived by a search engine, releases this proposal under an open source license, and at least a few people on slashdot accuse them of (*pinky to corner of mouth*) taking over the internet.
Most of Google's recent actions have been good things -- sponsoring open source developers for the summer, proposing ways for site administrators to provide additional info about their site, and implementing a "nofollow" option to prevent spammers trying to increase their page ranking. However, if they constantly get criticized and second-guessed for doing good things, what incentive do they have to continue? If you give a charity $20 and they criticize you for not giving them $30, are you ever going to give anything to that charity again?
Let's give Google the benefit of the doubt. Just like a person, they'll probably make some mistakes, but like a person I'll give them the benefit of the doubt until they prove me wrong. Some corporations do actually do good things and still manage to be successful, and in those cases they should be supported, not attacked.
One line (Score:1)
(http://mailvarun.blogspot.com/ | Last Journal: Tuesday August 02 2005, @01:47PM)
what's the basis of the license? (Score:2)
In any case, patented or not, the CC license that this falls under seems acceptable for an open standard, even if it is patented, because it is transferable and because its requirements are minimal. Contrast this with the Microsoft Office XML license, which is royalty-free (for now...), but non-transferable.
Darn it (Score:2)
(http://www.pocketgamer.org/)
Will wait until I get my new server.
What does Creative Commons mean here? (Score:2)
(http://felter.org/wesley/)
More proof that Google isn't Netscape (Score:2)
(http://www.blindmindseye.com/)
The thing that seems so cool about this sort of thing is that it opens up the search service to the rest of us to help us make our content easier to find when it is updated. One thing that I have come to really respect about Google is that they don't rely on the government to beat Microsoft back down the way Netscape did. Google has managed to make a product that 47% of the US Internet users want to use, even though MSN is the default in IE. Remember Netscape 4? There's a reason that bloated POS failed; anyone who remembers the releases of it for the first six months that it went public knows EXACTLY why that was.
The only thing that Google can do at this point is continue to let some of their more biased employees run wild. They've been causing Google's Adsense and Adwords to take extremely partisan stances between the Dems and Reps, and that's drawn the ire of many [blindmindseye.com] on the right. My concern is primarily that Google will end up pissing off so many of these users that they will end up switching to MSN and helping Microsoft take Google down. Google is certainly not perfect, and I'm still wondering why Google News had the National Vanguard, a neo-nazi publication, in their news feed list, but says that some of the bigger blogs like Michelle Malkin are not up to editorial snuff. Go figure, like the neo-nazis aren't biased or anything. Then there's their tendency to run ads for Hamas on their Arabic pages. [ynetnews.com]
Oh well, in many respects they still have a lot farther to go before they have tried as much evil as Microsoft and they are still more innovative, so time will tell.
search forms (Score:2)
(http://sourcery.blogspot.com/ | Last Journal: Tuesday September 18, @11:53AM)
If you have a bunch of data in a MySQL database, ordinarily Google can't find it. You have to create a static link somewhere with a URL for the search you want to make googlable. Those take maintenance.
There may be some sites that want certain areas crawled, but not others, and those areas aren't maintained by the webmaster or only the top-level part should be hidden from search (which is awkward or impossible to handle with robots.txt). There are always user pages, maverick corporate departments, or whatever.
This offers a way to do all of that in a systematic way. Very nice way to solve several seemingly unrelated problems at once.
Cool idea: Browser utilization of this data! (Score:1)
(http://www.randomworks.com/)
Feeling the heat from google-watch and critics? (Score:2)
(Last Journal: Monday January 23 2006, @12:19PM)
You do have to wonder how much of the 'do no evil' philosophy is cover for the "let us store and index all information about everything, including you" philosophy. Not that I'm going to stop using Google until their results become less usable than Yahoo's results...
Slashdotted? (Score:1)
Google is mightier than slashdot (Score:1)
(http://mailvarun.blogspot.com/ | Last Journal: Tuesday August 02 2005, @01:47PM)
But you can't slashdot google!!!
502 Server Error! (Score:4, Funny)
(http://www.jamesward.org/)
Could be better (Score:2)
(http://pthbb.org/)
What about a robots.txt extension to define the location of the sitemap index?
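For what it's worth, the crawler side of such an extension would be tiny. A sketch, assuming the line takes the form `Sitemap: <url>` (a directive of that name was adopted by the major search engines some time after this thread; the `find_sitemaps` helper and the sample robots.txt are invented for illustration):

```python
def find_sitemaps(robots_txt):
    """Return the URLs named on 'Sitemap:' lines of a robots.txt body.

    Hypothetical helper; assumes one 'Sitemap: <url>' entry per line.
    """
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


# Made-up robots.txt body used to exercise the parser.
robots = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap_index.xml
"""
found = find_sitemaps(robots)
print(found)
```

That would save webmasters from having to submit the location by hand, since every crawler already fetches robots.txt anyway.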
Why not just use rss/atom? (Score:3, Insightful)
(http://www.samba-choro.com.br)
What About Us Python-Free Zones? (Score:2)
(http://www.darrenbarefoot.com/)
SiteMaps Generator crashed our server! (Score:1, Informative)
(http://www.xmule.ws/)
I had a very bad experience with the Python sitemap generator from SourceForge using the 'accesslog' option. I plugged in a 10MB sitelog from our corporate site Great Seats to Sold-out Events [ticketstogo.com], which has ~22,000 pages.
Within five minutes it crashed my development server, a 3200 MHz Pentium 4 with 2GB of RAM running Debian Linux. Just imagine if this had been the production server...the costs for over-utilizing the webserver
For the details, see http://www.incendiary.ws/node/94 [incendiary.ws] Please syndicate my content if you want :-)
Google site map (Score:1)
(http://www.myrealtalk.com/)
http://www.myrealtalk.com/ [myrealtalk.com]
insight into unlinked directories (Score:3, Informative)
(http://www.math.harvard.edu/~knill | Last Journal: Thursday May 29 2003, @08:11PM)
essentially using "find" and "grep" alone, but this tool is much better,
faster and easy to configure. Cool.
Note that this tool will allow google to reach files which never would be
found by spidering a site, because the files are not linked. If you
include something like
<directory path="/var/www/html" url="http://www.example.com/" />
in your config.xml and run "sitemap_gen.py" on it, you will give the world
access to a large amount of material
(like test versions of your website or source code you did not want to
make accessible). We might see a lot more material which had been
'hidden'.
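To see why a directory walk exposes unlinked files, here is a small sketch of the general technique. It is not sitemap_gen.py's actual code; `urls_for_directory` and the temporary file layout are made up for the demonstration. Every file under the root becomes a URL whether or not any page links to it.

```python
import os
import tempfile


def urls_for_directory(path, base_url):
    """Map every file under path to a URL under base_url.

    Hypothetical helper illustrating a filesystem walk; note that it
    never checks whether anything links to the file.
    """
    urls = []
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), path)
            urls.append(base_url + rel.replace(os.sep, "/"))
    return urls


# Build a throwaway document root with one unlinked test directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "old-test"))
    for rel in ("index.html", os.path.join("old-test", "draft.html")):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("placeholder")
    exposed = urls_for_directory(root, "http://www.example.com/")

print(exposed)
```

Running a walk like this over your own document root before submitting anything is a cheap way to find out what you are about to publish.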
Ignore section for WordPress (Score:1)
That would be a lot of non-essential crud for Google to spider.
Using filters is very simple, so the following filters removed most rubbish:
<filter action="drop" type="wildcard" pattern="*index.htm*" />
<filter action="drop" type="wildcard" pattern="*awstats*" />
<filter action="drop" type="wildcard" pattern="*wp-admin*" />
<filter action="drop" type="wildcard" pattern="*wp-includes*" />
<filter action="drop" type="wildcard" pattern="*wp-content*" />
<filter action="drop" type="wildcard" pattern="*wp-images*" />
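If you want to check what a set of drop filters like these would remove before running the generator, shell-style wildcard matching is easy to imitate. A sketch using Python's `fnmatch` module, assuming the generator's wildcard type behaves like ordinary glob matching against the full URL (I have not verified that); `keep_url` and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# The same patterns as the drop filters in the config.
DROP_PATTERNS = [
    "*index.htm*",
    "*awstats*",
    "*wp-admin*",
    "*wp-includes*",
    "*wp-content*",
    "*wp-images*",
]


def keep_url(url, drop_patterns=DROP_PATTERNS):
    """Return True if no drop pattern matches the URL (hypothetical helper)."""
    return not any(fnmatchcase(url, pattern) for pattern in drop_patterns)


# Made-up URLs of the kind a WordPress install would produce.
urls = [
    "http://www.example.com/",
    "http://www.example.com/2005/06/sitemaps/",
    "http://www.example.com/wp-admin/post.php",
    "http://www.example.com/wp-content/themes/style.css",
    "http://www.example.com/index.html",
]
kept = [url for url in urls if keep_url(url)]
print(kept)
```

Only the front page and the post permalink survive here, which is about what you want Google to spider.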
Failure to find sites (Score:2)
(http://www.hawknest.com/ | Last Journal: Tuesday October 05 2004, @04:11PM)
Re:IIS? (Score:2)
Remember that MS doesn't have a monopoly on web servers, so they can't be dicks about it like they can with everything else.
Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:3, Insightful)
(http://www.mants-lair.org.uk/)
Well, maybe if Google stop doing stuff for a while?
Lots of slashdotters seem interested in what Google does, either because it tends to be neat, or so they can worry about privacy and the info Google potentially has access to.
Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:1)
Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:1)
(http://www.gaijingamers.com/)
The day Slashdot starts editing its news output to appease the petty whining of individuals who don't like seeing anyone get positive press, will be a sad day for us all.
Google isn't perfect, but they're doing a lot of pretty good things right now. We're all ready to jump on companies when they screw up, so why shouldn't we give them credit when it's due?
Re:IIS? (Score:1)
(http://www.nam37.com/)
Re:Google Bitching (Score:1)
(http://searchthefreakingweb.com/)
Why should Google be different than any other topic?
Re:SHUT UP SHUT UP SHUT UP (Score:2)
(http://www.ausedcar.com/ | Last Journal: Monday August 22 2005, @10:29PM)
Re:Google Bitching (Score:2)
You are right.
No other company has ever launched an Internet Search function.
No other company has ever launched web-based email.
No other company has ever provided online maps.
No other company has ever offered the contents of usenet via the web.
No other company has ever offered navigable satellite photos of the planet.
No other company has ever offered realtime webcaching and compression to "speed up" one's access.
No company has ever cached websites for access when they are down or no longer available.
No company has ever offered a price-checking website.
Oh, hang on, wait a minute...
When you look at it, I mean actually take a step back and LOOK, Google is a highly derivative company, with not much in the way of true innovation.
They take existing ideas and functions, and tweak them. Coupled with their "geek coolness" and hero-worship, they are simply riding the hype wave.
Re:Google Bitching (Score:2)
(http://www.ausedcar.com/ | Last Journal: Monday August 22 2005, @10:29PM)
This is such bullshit. Some of the stuff they put out is very cool and newsworthy like Google Maps, Gmail, etc... But so much crap that is either not ready yet, not unique, or just plain boring gets posted here. It's literally a direct feed of the Google blog half of the time.
It also annoys people like me how people on slashdot treat Google like it's the second coming of Christ. If anyone says anything negative they get bombarded with posts saying they suck or their ideas are crazy, and any criticism of google is given either a "ITS IN BETA!!!" or "YOUR A MICROSOFT ASTROTURFER!!!" and people also waste space giving google a handjob for an idea that has already existed for years. The personalized google portal and the google satellite pictures are two examples that come to the top of my head. Both of these things had been done by other sites for years, and then google comes out with them, and in the case of satellite pics did not improve upon the existing sites out there, and in the case of the portal made an inferior product. Yet when Yahoo or MSN come out with a product that is an attempt at improving something existing, those same people say "COPY CATS!". It also doesn't help that the customized Google Portal only allowed you to add two news sites and one of them was slashdot. Furthermore many Google people post here, and they mention slashdot in the Google blog, so I think another thing that annoys me is there is a large amount of suspected Google astroturfing here.
Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:2)
(http://www.imaginaryrobots.net/)
Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:2)
Re:WE BROKE GOOGLE! Woohoo! (Score:2)
Re:ermm (Score:1)
(http://www.mattbaron.net/)
(Now I'm waiting for my 19 seconds to be up)
Re:IIS? (Score:2)