Google Launches Google Sitemaps 223

Posted by Zonk on Friday June 03, 2005 @10:55AM from the please-stop-innovating dept.

Ninwa writes "Google has launched Google Sitemaps. It seems to be a service that allows webmasters to define how often their sites' content is going to change, to give Google a better idea of what to index. It uses some basic XML as the method of submitting a sitemap. More information on the protocol is available in an FAQ. What's most interesting is that Google is licensing the idea under the Attribution/Share Alike Creative Commons license. According to the Google Blog, this is being done '...so that other search engines can do a better job as well. Eventually we hope this will be supported natively in webservers (e.g. Apache, Lotus Notes, IIS).' They even offer an open source client in Python."
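
For readers who have not seen the format, here is a rough sketch of what such a file might look like, built with Python's standard library. The element names (urlset, url, loc, lastmod, changefreq, priority) follow the published protocol; the namespace URI, the example.com URLs and the `build_sitemap` helper are assumptions made for illustration, and the namespace in particular differs between schema versions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the schema version your search engine expects.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for an iterable of page dicts.

    Each dict needs "loc"; "lastmod", "changefreq" and "priority" are
    optional hints and are simply omitted when absent.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
example_pages = [
    {"loc": "http://www.example.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/about.html", "lastmod": "2005-06-03",
     "changefreq": "yearly", "priority": 0.3},
]

if __name__ == "__main__":
    print(build_sitemap(example_pages))
```

Everything except `loc` is a hint to the crawler, which is why the sketch lets callers leave the other fields out.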

This discussion has been archived. No new comments can be posted.


great interview (Score:5, Informative)

by professorhojo ( 686761 ) * writes: on Friday June 03, 2005 @10:56AM (#12713975)

for more crunchy detail, here's a great Q&A interview i found with Shiva Shivakumar, engineering director and the technical lead for Google Sitemaps:

http://blog.searchenginewatch.com/blog/050602-195224 [searchenginewatch.com]

More unabashed Google loving... (Score:5, Funny)

by sachmet ( 10423 ) writes: on Friday June 03, 2005 @10:56AM (#12713976)

Everyone else defines a protocol. But apparently Google defines protocools.

I guess the rest of the world has a long way to go to catch up...

- Search Engine (Score:3, Funny)
  
  by Pac ( 9516 ) writes:
  
  [To ELP's "Lucky Man"]
  
  They had white pages
  And hits by the score
  All the people's queries
  Waiting by the door
  
  Ooooh, what a search engine it was
  Ooooh, what a search engine it was
  
  Many geeks and hackers
  They made up its core
  Everybody's dearest
  A daily stop for more
  
  Ooooh, what a search engine it was
  Ooooh, what a search engine it was
  
  It went to the market
  Of the engines it was king
  Of his honor and his glory
  Slashdot would sing
  
  Ooooh, what a search engine it was
  Ooooh, what a search engine it was
  
  A burst had found it
  - Re:Search Engine (Score:2)
    
    by Dr Tall ( 685787 ) writes:
    
    I find your lack of faith disturbing.
  - Re:Search Engine (Score:2)
    
    by daviddennis ( 10926 ) writes:
    
    You know, that was pretty clever, but a hint as to how this burst happened would be helpful.
    
    Especially since Google really does have a great idea here. I know Slashdotters on the whole love Google, and I know there's a bit of a backlash, but for the sake of the integrity of the argument, let's have that backlash be for some legitimate reason, not just because Google's too popular because, well, it really is great.
    
    D
    - Re:Search Engine (Score:2)
      
      by Pac ( 9516 ) writes:
      
      Please, I was just making a quick joke - no predictions, nothing so serious. I love Google dearly (I have even installed the Accelerator for a day or two until it bothered me with that "23 minutes saved" message) and I think it is popular because of its merits and the hard work of its people, not because they got lucky or something. But even great companies can eventually disappear for one reason or another.
Cool idea (Score:5, Interesting)

by aftk2 ( 556992 ) writes: on Friday June 03, 2005 @11:00AM (#12714006) Homepage Journal

This is a cool idea, because I've often wondered about being able to "talk" to search engines at a slightly higher level than robots.txt allows.

For example, a website we launched a couple months ago is primarily images. We played nice - all of the images have legitimate alt tags, and we tried to let the site degrade properly in older browsers (although you really wouldn't get much, in those instances).

But the biggest problem we had was trying to get the site spidered by Google. It would be, and it would appear in the index, but it would be listed far below sites that linked to it. I don't believe Google likes sites that are primarily images. We populated meta tags with descriptions, but they weren't included; we even tried using hidden text - legitimate, hidden text that would serve as the site's description, but not break the design - but you know how Google feels about those sorts of things. We had to walk a fine line. This'll be nicer.

- Re:Cool idea (Score:2, Interesting)
  
  by RealityMogul ( 663835 ) writes:
  
  I think Google doesn't like NEW sites. I run a high school alumni website, and it was at least 6 months before you could type in the title of the homepage (which was "[Town nobody has heard of] Alumni") into Google and have it listed in the top 10. Once it did start appearing in the top ten, it was still below sites that linked to it. Most of the higher results simply had "Alumni" in them and nothing with the town name. After about 9 months, my site now has the #1 slot for that search string.
  - Re:Cool idea (Score:4, Informative)
    
    by Eric Giguere ( 42863 ) writes: on Friday June 03, 2005 @11:12AM (#12714118) Homepage Journal
    
    Quite right, a new site can be listed in the Google index pretty quickly -- it only took a few days for my latest site to be found by the Googlebot -- but it takes a while before any PageRank gets assigned to its pages, especially if there are no inbound links to the site. No PageRank, no top listing...
    Eric
    Currently at #1 for adsense tips [google.com]
    
  - Re:Cool idea (Score:5, Informative)
    
    by rehannan ( 98364 ) writes: on Friday June 03, 2005 @11:25AM (#12714229) Homepage
    
    I just put a new site online. About 4 or 5 days after submitting it to google, it was the number one hit when searching for the title of the site.
    
    - Re:Cool idea (Score:2)
      
      by KillerDeathRobot ( 818062 ) writes:
      
      That's pretty strange, because Google definitely has a sandbox that they keep sites in for 6-8 months.
      
      Maybe you had few competitors and those competitors (for the search result) were also new.
      - Re:Cool idea (Score:2)
        
        by DrSkwid ( 118965 ) writes:
        
        I've had PageRank after 3 weeks on new sites
      - Re:Cool idea (Score:2)
        
        by Eric Giguere ( 42863 ) writes:
        
        It depends on how many pages there are that match those keywords. If your title is unique enough, then sure, your site will show up first. But as soon as there's contention for the keywords, don't expect to stay up top.
        Eric
        View your HTTP headers here [ericgiguere.com]
    - Re:Cool idea (Score:4, Informative)
      
      by singleantler ( 212067 ) writes: on Friday June 03, 2005 @12:16PM (#12714690) Homepage Journal
      
      It's quite common to be high up for matching terms for about a week, then disappear for three months or so. This seems to be normal behaviour for new sites and is nicknamed the Google sandbox [google.com] and seems to have been confirmed by the patent application recently made public.
      
      The sandbox is just an artificial lowering, so if you're a match for a rare term you can still be found quite easily.
      
    - Re:Cool idea (Score:2)
      
      by IGnatius T Foobar ( 4328 ) writes:
      
      I just put a new site online. About 4 or 5 days after submitting it to google, it was the number one hit when searching for the title of the site.
      
      So you're the one who came up with "DISCREET ONLINE PHARMACY" ?? :)
      
      Seriously though, if there aren't a lot of other sites containing your title, that's easy. If you're one among a dozen or so, not so easy.
    - Re:Cool idea (Score:3, Informative)
      
      by mgbaron ( 457884 ) writes:
      
      I think I can shed a little light on this situation as I have had both of the above cases happen to me.
      
      This is how the system works. Google can index your site very quickly (within a couple of days), if you have an incoming link or submit to their crawler. If your site is well keyword optimized for a fairly rare keyword, it is entirely plausible that it would come up number one fairly quickly.
      
      What takes a long time is for google to update their pagerank index. This is where your site will sit in the Go
  - Re:Cool idea (Score:2)
    
    by caluml ( 551744 ) writes:
    
    I'm still waiting for my site, Calum [calum.org] to get indexed. The bots come regularly, but nothing in there. If people could just paste the following link onto their pages, Calum [calum.org], I'm sure everything would be right with Calum [calum.org] and Google. I'm sure Google doesn't hate Calum [calum.org], and that there is just some misunderstanding.
    :)
    - Re:Cool idea (Score:2)
      
      by daviddennis ( 10926 ) writes:
      
      From what I understand, Google ignores links placed on Slashdot comment pages for exactly this reason :-(.
      
      Sorry.
      
      D
      - Re:Cool idea (Score:2)
        
        by caluml ( 551744 ) writes:
        
        Really? (It was intended as a joke btw). Can Google really be that analytical that it has a list of forums that people can post to?
- Re:Cool idea (they stole my idea?) (Score:2, Interesting)
  
  by neanderlander ( 637187 ) writes:
  
  On february 16th i sent google the following email to suggestions@google.com: Hi,
  This is a suggestion for the people who take care of indexing web sites.
  Because Google is the first search engine of choice it has enough of influence to point noses into the same direction.
  So, i propose a new element to be added to websites: a sitemap file. Similar to the favicon file, every site could have an (xml?) file containing information about the info and the info-topography on the site.
  Google has already a 'si
fuckedgoogle.com anyone? (Score:2, Interesting)

by Anonymous Coward writes:

Had to say it:

http://www.fuckedgoogle.com/ [fuckedgoogle.com]
- - Re:fuckedgoogle.com anyone? (Score:2)
    
    by grandmofftarkin ( 49366 ) * writes:
    
    On my initial glance it appears to be safe for work. Apart from the word fucked in the URL there is nothing really bad.
Sitemaps abuse? (Score:3, Insightful)

by iolagnm ( 645827 ) writes: <iolagnm AT gmail DOT com> on Friday June 03, 2005 @11:03AM (#12714033) Homepage

It will take a company with enough influence like Google to really promote XML sitemaps, which could lead to a great thing... but what is to stop them from becoming like MetaTags where companies will just flood them with useless keywords and entries in an attempt to get better search rankings?

- Re:Sitemaps abuse? (Score:2)
  
  by Sancho ( 17056 ) writes:
  
  I'd really like to see a site-influenced system like this that defines areas of news and areas of non-news. I'm tired of searching for multiple terms and getting main articles devoted to one of the terms and sidebar links to one of the others. For example, [insert notebook model] and Linux.. you might get a site like Slashdot where there's an article about the new notebook and many, many sidebar items about Linux.
  - Re:Sitemaps abuse? (Score:3, Interesting)
    
    by Jellybob ( 597204 ) writes:
    
    Using XHTML this shouldn't be too hard - something along the lines of:
    
    <goog:index> Stuff that actually matters </goog:index> Advertising crap which people don't care about.
    
    It's not going to fix the problem on sites which are doing this deliberately, but for those of us who actually care about getting indexed relevantly it would be great.
- Re:Sitemaps abuse? (Score:2)
  
  by Mant ( 578427 ) writes:
  
  I've not seen anything to suggest sitemaps will improve your ranking, just get you indexed more often.
  
  If you claim pages update every day, but they don't, it will be pretty easy for the spider to tell. So you could stop the frequent scans if they aren't really needed, if after say a month the supposed daily updates never happened.
- Re:Sitemaps abuse? (Score:2, Interesting)
  
  by drnlm ( 533500 ) writes:
  
  That's really up to the search engine implementation, isn't it.
  Anyway, a brief look at the proposed format gives very little scope for abuse - you can specify location, change frequency, last modified and a priority, and that's it. The priority is specified as only applying to urls from the same site, so what you can do with it is fairly limited. Overall, it looks written as a set of additional hints to spiders crawling the site.
- Re:Sitemaps abuse? (Score:4, Informative)
  
  by ArbitraryConstant ( 763964 ) writes: on Friday June 03, 2005 @11:54AM (#12714488) Homepage
  
  Well, I noticed two things about it...
  
  First, the priority is a relative priority, so if you want to set every page to 1.0 (defined as the highest priority) it'll mean nothing.
  
  Second, if you lie about update frequency or the date of the last update they'll figure it out pretty quick.
  
  These aren't commands, they're hints.
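
To make the "hints, not commands" point concrete, here is a minimal sketch of how a crawler could check a declared change frequency against what it actually observes. Nothing here describes Google's real behaviour: the interval table, the 0.5 threshold, the back-off by doubling and all function names are assumptions for illustration.

```python
import hashlib

# Hypothetical mapping from a declared change frequency to an interval in days.
DECLARED_INTERVAL_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def content_fingerprint(body):
    """Hash page content so two crawls can be compared cheaply."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def observed_change_rate(fingerprints):
    """Fraction of consecutive crawls in which the content actually changed."""
    if len(fingerprints) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(fingerprints, fingerprints[1:]) if a != b)
    return changes / (len(fingerprints) - 1)


def next_crawl_interval(declared, fingerprints, min_rate=0.5):
    """Honour the declared frequency only while observations back it up.

    If the page changed in fewer than `min_rate` of the visits made so far,
    double the interval instead of trusting the hint. With fewer than two
    observations there is nothing to compare, so the hint is taken as given.
    """
    interval = DECLARED_INTERVAL_DAYS[declared]
    if len(fingerprints) >= 2 and observed_change_rate(fingerprints) < min_rate:
        return interval * 2
    return interval
```

A page that claims "daily" but returns identical content on five visits would be revisited every two days under this sketch, so lying about the frequency buys nothing.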
  
While they are at it maybe new meta tags? (Score:2)

by LWATCDR ( 28044 ) writes:

I would love to see a new meta tag for address to become common. Could make things like Google local even more useful.
- Re:While they are at it maybe new meta tags? (Score:2)
  
  by Enrico Pulatzo ( 536675 ) writes:
  
  I'd look into using RSS (Really Simple Syndication) with DC (Dublin Core) metadata.
  
  I think the "coverage" tag would be probably what you're looking for.
  - Re:While they are at it maybe new meta tags? (Score:2)
    
    by LWATCDR ( 28044 ) writes:
    
    Okay.. Anyone use it?
    Most static sites do not use RSS.
    - Re:While they are at it maybe new meta tags? (Score:2)
      
      by Enrico Pulatzo ( 536675 ) writes:
      
      "Okay..Anyone use it?"
      Sure. Lots of people use it. Does Google grok it? I dunno.
      
      "Most static sites..."
      So. Be a trend setter. I encourage you to not think in terms of static and dynamic but in terms of modern and outdated.
      
      Do modern search engines even care about meta tags anymore? It seems that technorati-style tags may be a more modern equivalent, but they do tend to influence the way you display information on your site. There's a lot of discussion on whether or not these tags are primed to spiral
      - Re:While they are at it maybe new meta tags? (Score:2)
        
        by LWATCDR ( 28044 ) writes:
        
        If Google local does not use them then they are useless.
        I have to admit that I wonder how a location tag could be abused.
        modern vs static?
        Not every page needs to be dynamic. A page for a restaurant does not tend to need any dynamic content, so RSS seems like overkill.
Reinventing the wheel? (Score:2)

by baadger ( 764884 ) writes:

Ermm this is all well and good and such but isn't a large chunk of this information already made available via Cache-Control and Last-Modified HTTP headers?

Reminds me of blog pings - what's wrong with using the Referer header? Doing some checking and then fetching the referring page and checking for linkage?

Has the world gone XML crazy?
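
For context on the headers mentioned above, here is a minimal sketch of a conditional request, which is how a client can use Last-Modified information to avoid re-downloading unchanged pages. The function names and the example URL are hypothetical; the sketch uses only Python's standard library and is not a description of how any particular crawler works.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, last_crawl_epoch):
    """Build a GET request that asks the server to skip unchanged content."""
    request = urllib.request.Request(url)
    # usegmt=True yields the "... GMT" date format that HTTP expects.
    request.add_header("If-Modified-Since",
                       formatdate(last_crawl_epoch, usegmt=True))
    return request


def fetch_if_changed(url, last_crawl_epoch):
    """Return the new body, or None if the server answers 304 Not Modified."""
    try:
        with urllib.request.urlopen(conditional_request(url, last_crawl_epoch)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None
        raise
```

Note that this still costs one round trip per page, which is the objection raised further down the thread.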
- Has the world gone XML crazy? (Score:2)
  
  by aug24 ( 38229 ) writes:
  
  Actually, no, I don't think it has. Precisely as you observe, only a large chunk is available. Now the fact that the vanilla aspects you mention can already be achieved is not a good enough reason to avoid implementing some kind of value-added extensible version of anything that is useful. This is the net evolving to serve humans better, right in front of us.
  Just think of this sort of thing as inter-linking web services sitting on top of the http protocol.
  
  Justin.
  - Re:Has the world gone XML crazy? (Score:2)
    
    by baadger ( 764884 ) writes:
    
    But as it stands this XML Sitemap index doesn't provide any new information that HTTP headers don't (assuming dynamic pages handle them well) except for the priority weighting... which should be derived from update frequency.
    
    I don't see how centralising all this header information serves webmasters better. Only Google.
- Re:Reinventing the wheel? (Score:2)
  
  by iabervon ( 1971 ) writes:
  
  Cache-Control only works on a per-request basis, and Last-Modified only works if you decide to check again. They're designed for clients like web browsers, where you only care about whether there have been changes when the user is checking on the site; they're not good for trying to schedule spidering, because many things specify "no-cache" (if the user wants to look at the page, just get a new one) and doing HEAD requests on the whole web for the Last-Modified dates is going to be slow.
  - Re:Reinventing the wheel? (Score:2)
    
    by baadger ( 764884 ) writes:
    
    "Cache-Control only works on a per-request basis"
    
    I believe proxies cache the headers as well, unless must-revalidate is specified, in which case it must do an If-Modified-Since or similar request which will return fresh headers. How is it not Google's responsibility to remember when to crawl your page anyway? That's exactly what they intend to do.
    
    "They're designed for clients like web browsers, where you only care about whether there have been changes when the user is checking on the site"
    
    Why is Google any d
Google is IT's Willy Wonka (Score:5, Funny)

by stlhawkeye ( 868951 ) writes: on Friday June 03, 2005 @11:07AM (#12714067) Homepage Journal

I envision the interior of Google as this huge warehouse full of oversized transistors, data streams with paddleboats, waterfalls of caffeinated beer, chairs contoured like a keyboard key, where diminutive men in green hair sing songs about electrons and logic gates and if you wander into the room where Duke Nukem 3D is being tested you'll be thrown out.

- Re:Google is IT's Willy Wonka (Score:2)
  
  by novakreo ( 598689 ) writes:
  
  if you wander into the room where Duke Nukem 3D is being tested you'll be thrown out.
  
  I think you mean Duke Nukem Forever.
  Duke Nukem 3D is nearly ten years old, I remember playing it at high school on my Pentium-100 laptop.
- Re:Google is IT's Willy Wonka (Score:3, Informative)
  
  by Wee ( 17189 ) writes:
  
  I never saw any paddleboats, but they did have a keg of beer outside the cafe yesterday. And there's no shortage of caffeinated drinks in the mini-kitchens.
  I can neither confirm nor deny the existence of any secret video game testing rooms.
  
  -B
- Re:Google is IT's Willy Wonka (Score:2)
  
  by ajs ( 35943 ) writes:
  
  Yeah, you've mostly got the description of the public tour, but if you step off the boat and go searching around, you'll find a room with this 3-story-tall slug, spewing out search results from its back-side! It's a disturbing sight, but I still can't get myself to stop using Google!
Creative Commons Meme (Score:2, Informative)

by broward ( 416376 ) writes:

It's not surprising that Google is using a Creative Commons license. The meme has been steadily gaining strength for over a year.

http://www.realmeme.com/miner/preinflection.php?startup=/miner/preinflection/creativecommonscontentDejanews.png [realmeme.com]
How is this a win-win? Here's how.... (Score:2)

by doublem ( 118724 ) writes:

This sounds like a really cool idea.

Livejournal.com has had a number of problems with Google, and often just plain outright bans them from spidering the site. Part of the problem is that all the registered users have their journals at journalname.livejournal.com as well as livejournal.com/users/journalname. This means indexing the journals for registered users doubles the load on their server farm!

With something like this, livejournal would be able to define exactly how often the indexing process occurs,
- Re:How is this a win-win? Here's how.... (Score:2)
  
  by Sancho ( 17056 ) writes:
  
  Seems to me that a better solution would be EITHER disallowing indexing of the registered users ljname.livejournal.com pages OR disallowing everything BUT ljname.livejournal.com, granting more benefit for registration.
  - Re:How is this a win-win? Here's how.... (Score:2)
    
    by doublem ( 118724 ) writes:
    
    Last I heard they had blocked ALL Google indexing. robots.txt is somewhat restrictive.
- - Re:How is this a win-win? Here's how.... (Score:2)
    
    by jrumney ( 197329 ) writes:
    
    <AOL>Me too</AOL>. I host a couple of sites on my home ADSL line, and my usage is about 6GB/month, mostly MSN, Google and Yahoo's crawlers indexing and reindexing the same pages over and over. MSN especially I would like to slow down.
robots.txt (Score:2)

by shmlco ( 594907 ) writes:

It's too bad they couldn't figure out a way to add additional keywords to robots.txt (w/o breaking it). Now one needs to create both files for a site to index properly.
- Re:robots.txt (Score:2)
  
  by Seanasy ( 21730 ) writes:
  
  Google wants this sitemap functionality to make it into the web server itself. So, it looks like they're opting for the long-term solution.
Google Evil Index (Score:5, Funny)

by yotto ( 590067 ) writes: on Friday June 03, 2005 @11:11AM (#12714108) Homepage

In other news, the Google Evil Index went down 3.2 points today, and is currently at 13.8, the lowest it's been since right before the beta rollout of Google Web Accelerator.

Lotus Notes? (Score:2)

by Blakey Rat ( 99501 ) writes:

Somebody's using Lotus Notes as a webserver? May God have mercy on their souls.

(The submitter probably meant Lotus Domino, which is still a bad webserver, but not nearly as bad as Notes would be.)
- Re:Lotus Notes? (Score:2)
  
  by circusboy ( 580130 ) writes:
  
  They may have really meant 'notes,' this has been seen in the wild... the person who I knew who worked there lamented about it quite a lot. but it is done... sad to say.
great idea (Score:2)

by utexaspunk ( 527541 ) writes:

If people use this, it will likely remove much redundancy from google's indexing processes, possibly freeing up bandwidth and processing power in their datacenters for other projects like more web-based applications...
Or maybe another hidden use... (Score:4, Insightful)

by 823723423 ( 826403 ) writes: on Friday June 03, 2005 @11:22AM (#12714201)

Navigation is sometimes the hardest part of the internet. A tree structure is sometimes the second easiest way of searching or browsing for information (the first being keyword searching). So maybe if more web designers set up server-side solutions, it will lower the burden on web designers. More importantly, it will move navigation away from web designers to users, just as Google moved content from web designers to searchers. So instead of overburdening web servers the way this Firefox extension with screenshot [extensionsmirror.nl] does, by automatically generating a sitemap from crawling a site, sites can expose a sitemap using a favicon.ico-like convention or a link rel="sitemap.rdf or sitemap.xml" protocol, just as Netscape NAVIGATOR originally proposed a while back. I think web designers should pay attention - at least those that don't use Flash for their whole site. The web is slowly becoming a database of content rather than style. See the Webmonkey/Wired article on the Netscape sitemap feature, Sitemap rdf [wired.com], or the sitemap slide here: Slide from seminar [ukoln.ac.uk]

Marketplace of Ideas (Score:2)

by Doc Ruby ( 173196 ) writes:

"Google is licensing the idea under the Attribution/Share Alike Creative Commons license. "

And I'm willing to license my idea, "better search engines with better user interfaces", to Google, for a modest sum.
what's the basis of the license? (Score:2)

by cahiha ( 873942 ) writes:

I'm wondering: why do you need a license to implement this? Did Google patent this?

In any case, patented or not, the CC license that this falls under seems acceptable for an open standard, even if it is patented, because it is transferable and because its requirements are minimal. Contrast this with the Microsoft Office XML license, which is royalty-free (for now...), but non-transferable.
Darn it (Score:2)

by David Horn ( 772985 ) writes:

It needs Python 2.2, and I only have 1.5 running. Unfortunately, so many things depend on it (*cough* Ensim *cough*) that attempting to upgrade is a death wish.

Will wait until I get my new server. :)
- Re:Darn it (Score:2)
  
  by ArbitraryConstant ( 763964 ) writes:
  
  Multiple versions of Python can coexist on a machine...
What does Creative Commons mean here? (Score:2)

by Wesley Felter ( 138342 ) writes:

An idea cannot be copyrighted, and thus cannot be licensed under a copyright license like Creative Commons. File formats, being facts, shouldn't be copyrightable either. If the text of the spec is licensed as Attribution-ShareAlike, then all this allows is people to fork the spec, causing confusion.
- - Re:What does Creative Commons mean here? (Score:2)
    
    by Wesley Felter ( 138342 ) writes:
    
    ...Unisys's (now expired) .gif patent...
    
    There was never a patent on GIF; Unisys had a patent on the LZW algorithm which could be used in GIFs. Uncompressed GIFs were not covered by the patent.
    
    But if you look at a simple XML format like Google Sitemaps, there is no novel algorithm involved in reading or writing the format and thus no basis for patent.
More proof that Google isn't Netscape (Score:2)

by ShatteredDream ( 636520 ) writes:

The thing that seems so cool about this sort of thing is that it opens up the search service to the rest of us to help us make our content easier to find when it is updated. One thing that I have come to really respect about Google is that they don't rely on the government to beat Microsoft back down the way Netscape did. Google has managed to make a product that 47% of the US Internet users want to use, even though MSN is the default in IE. Remember Netscape 4? There's a reason that bloated POS failed, any
search forms (Score:2)

by RealProgrammer ( 723725 ) writes:

any site where certain pages are only accessible via a search form would benefit from creating a Sitemap and submitting it to search engines.
If you have a bunch of data in a MySQL database, ordinarily Google can't find it. You have to create a static link somewhere with a URL for the search you want to make googlable. Those take maintenance.

There may be some sites that want certain areas crawled, but not others, and those areas aren't maintained by the webmaster or only the top-level part should be h
Feeling the heat from google-watch and critics? (Score:2)

by javaxman ( 705658 ) writes:

Someone alerted me to google watch [google-watch.org] the other day. It's definitely an interesting take on the company, I have to say.
You do have to wonder how much of the 'do no evil' philosophy is cover for the "let us store and index all information about everything, including you" philosophy. Not that I'm going to stop using Google until their results become less usable than Yahoo's results...
- - google watch watch ? wow. (Score:2)
    
    by javaxman ( 705658 ) writes:
    
    But who watches the watchers?
    That's a thing of beauty. [google-watch-watch.org] Well, not really, it's a damn shame to waste a domain name on a nearly plain-text page, but it's still pretty funny. Does anyone really love google enough to host a page like that on their own? Wow, if so. I mean, I've always liked google, but would I rent out a domain to host an anti-anti-google website? I doubt it. Thanks for that, though. Definitely a +1 interesting from an AC.
502 Server Error! (Score:4, Funny)

by md17 ( 68506 ) writes: <james@@@jamesward...org> on Friday June 03, 2005 @12:21PM (#12714733) Homepage

OMG!!! We finally /.'d Google!

Could be better (Score:2)

by belg4mit ( 152620 ) writes:

Instead of having to notify search engines (blech)
What about a robots.txt extension to define the
location of the sitemap index?
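
A minimal sketch of what that suggestion could look like from the crawler's side, assuming a robots.txt line of the form `Sitemap: <url>`. The directive name, the helper `sitemap_locations` and the example file are all hypothetical here; the announcement itself describes no such extension.

```python
def sitemap_locations(robots_txt):
    """Collect URLs from 'Sitemap:' lines in the text of a robots.txt file."""
    locations = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            locations.append(value.strip())
    return locations


# Hypothetical robots.txt carrying the assumed directive.
example_robots = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap_index.xml  # assumed directive
"""
```

Because parsers of the existing format are expected to skip lines they do not recognise, an extra line like this should not break older robots, which is the property the parent comment is asking for.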
Why not just use rss/atom? (Score:3, Insightful)

by neves ( 324086 ) writes: on Friday June 03, 2005 @01:00PM (#12715378) Homepage

My RSS feeds already publish my newest/freshest pages. Why didn't they just extend it with some additional attributes/tags instead of forcing me to implement another XML format?
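
To illustrate how much of a sitemap entry a feed already carries, here is a small sketch that pulls (URL, last-modified date) pairs out of an RSS 2.0 document using Python's standard library. The helper name `entries_from_rss` and the example feed are made up for illustration; this is not how any search engine is known to process feeds.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def entries_from_rss(rss_xml):
    """Yield (url, last_modified_date) pairs from the items of an RSS 2.0 feed.

    The date is an ISO "YYYY-MM-DD" string, or None when an item has no pubDate.
    """
    channel = ET.fromstring(rss_xml).find("channel")
    for item in channel.findall("item"):
        link = item.findtext("link")
        pub_date = item.findtext("pubDate")
        if not link:
            continue  # an item without a link cannot become a sitemap entry
        lastmod = (parsedate_to_datetime(pub_date).date().isoformat()
                   if pub_date else None)
        yield link.strip(), lastmod


# Hypothetical feed, for illustration only.
example_feed = """<rss version="2.0"><channel>
  <title>Example</title>
  <item><link>http://www.example.com/a.html</link>
        <pubDate>Fri, 03 Jun 2005 13:00:00 GMT</pubDate></item>
  <item><link>http://www.example.com/b.html</link></item>
</channel></rss>"""
```

What a feed does not carry is any notion of change frequency or relative priority, and it usually lists only recent pages.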

- - Re:Why not just use rss/atom? (Score:3, Interesting)
    
    by neves ( 324086 ) writes:
    
    Silly me! Just found in their FAQ: you can use RSS/atom [google.com] as your sitemap format!
What About Us Python-Free Zones? (Score:2)

by Zastrossi ( 603991 ) writes:

I see an execution gap here, though. My blog is, what, 2600 pages? I'm obviously not going to build that XML file manually (with one node for each page). Google does provide a Sitemap Generator, but it's Python code meant to be run on my web server. My Python skills are nil, so that route isn't viable for me either. I expect that there's a good many 'webmasters' (as in, people who design and run websites) who don't know Python from perl. Given the CC license, though, maybe somebody will grab the code and b
insight into unlinked directories (Score:3, Informative)

by e**(i pi)-1 ( 462311 ) writes: on Friday June 03, 2005 @02:47PM (#12716546) Homepage Journal

I had been writing a primitive sitemap generator myself using shellscripts essentially using "find" and "grep" alone, but this tool is much better, faster and easy to configure. Cool. Note that this tool will allow google to reach files which never would be found by spidering a site, because the files are not linked. If you include something like <directory path="/var/www/html" url="http://www.example.com/" /> in your config.xml and run "sitemap_gen.py" on it, you will give the world access to a large amount of material (like test versions of your website or source code you did not want to make accessible). We might see a lot more material which had been 'hidden'.
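
The risk described above is easy to see in a sketch. The following is not how sitemap_gen.py works internally; it is a hypothetical helper, `urls_for_directory`, that walks a directory the same naive way and shows one possible guard, a list of glob patterns for names that should never be published. The default patterns are assumptions, not a recommendation.

```python
import os
from fnmatch import fnmatch
from urllib.parse import quote


def urls_for_directory(root, base_url, exclude=("*.bak", "test*", ".*")):
    """Map files under `root` to URLs, skipping names that match `exclude`."""
    urls = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch(d, p) for p in exclude)]
        for name in filenames:
            if any(fnmatch(name, p) for p in exclude):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Use forward slashes and percent-encode unsafe characters.
            urls.append(base_url + quote(rel.replace(os.sep, "/")))
    return sorted(urls)
```

Calling it with `exclude=()` reproduces the "publish everything on disk" behaviour the comment warns about, unlinked test copies included.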

- Re:IIS? (Score:2)
  
  by rpozz ( 249652 ) writes:
  
  It appears that you simply have an xml file on your webserver, and you point google at it. Nothing special and certainly possible with IIS.
  
  Remember that MS doesn't have a monopoly on web servers, so they can't be dicks about it like they can with everything else.
- Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:3, Insightful)
  
  by Mant ( 578427 ) writes:
  
  Well, maybe if Google stop doing stuff for a while?
  
  Lots of slashdotters seem interested in what Google does, either because it tends to be neat, or so they can worry about privacy and the info Google potentially has access to.
- Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:2)
  
  by Doctor Crumb ( 737936 ) writes:
  
  New web technology is certainly "Stuff that matters" to webmasters. I am happy that inventions like this are covered on my primary geek news site.
  - Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:2)
    
    by Momoru ( 837801 ) writes:
    
    This is hardly new web technology... it's not something like BitTorrent or VOIP or anything like that that is a truly innovative cool technology.
    
    This is just an idea that allows a corporation to do its job better. It would be like if the Census Bureau asked everyone to mail a list of all of the members of their household to them, and update it when they have kids, so they didn't have to take the time to count door to door.
- Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:2)
  
  by EastCoastSurfer ( 310758 ) writes:
  
  Funny you should say that. I was just doing some research this morning to figure out when I want to short their stock.
- Re:Still in Beta (Score:2)
  
  by Mant ( 578427 ) writes:
  
  Just about all of Google seems to be in beta. While it is nice to get the stuff early, "beta" is a pretty meaningless term as far as Google stuff is concerned.
- Re:Still in Beta (Score:2)
  
  by timeOday ( 582209 ) writes:
  
  Google never has a final release, they just leave everything in beta, forever - see groups.google.com, maps.google.com, gmail.google.com, froogle.google.com, news.google.com, and who knows what else.
- Re:How does this benefit me? (Score:5, Insightful)
  
  by Eric Giguere ( 42863 ) writes: on Friday June 03, 2005 @11:20AM (#12714186) Homepage Journal
  It benefits you because:
  
  Google will hopefully crawl your frequently-changing pages more often
  
  Conversely, Google won't crawl other pages as often, saving your bandwidth
  
  Google will find pages that it wouldn't normally find just by following links
  
  Also, you wouldn't necessarily have to maintain more than one sitemap. You could use XSLT to create the sitemap.html file for your site from the XML file you create for Google. In fact, wouldn't it be nice for Web authoring tools to do this automatically for you?
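
  As a rough illustration of the single-source idea, here is a sketch that derives an HTML link list from the XML file. It swaps the XSLT suggested above for Python's standard ElementTree parser, and the function name sitemap_to_html is made up for this example.

  ```python
  import xml.etree.ElementTree as ET
  from html import escape


  def sitemap_to_html(xml_text):
      """Turn every <loc> element in a sitemap into an HTML list item."""
      root = ET.fromstring(xml_text)
      items = []
      for elem in root.iter():
          # Compare on the local name so any namespace on the element is ignored.
          if elem.tag.rsplit("}", 1)[-1] == "loc" and elem.text:
              url = escape(elem.text.strip(), quote=True)
              items.append('<li><a href="%s">%s</a></li>' % (url, url))
      return "<ul>\n" + "\n".join(items) + "\n</ul>"
  ```

  The result is only a bare list of links; a real sitemap.html would want page titles, which the XML file does not carry.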
  Eric
  Make Easy Money with Google: The Blog [makeeasymo...google.com] (powered by blojsom [sf.net])
  - Eh? (Score:2)
    
    by baadger ( 764884 ) writes:
    Google should index my website as often as possible. It should use algorithms to detect update frequency and content type and assign its own indexing priorities to meet the needs of people who are actually searching for information and present them with a fresh result set. Caching mechanisms do this all the time - the Last-Modified and Cache-Control headers tell you how likely content is to be updated and how often.
    
    It won't save bandwidth if the algorithms mentioned above work correctly. HEAD reque
    - Re:Eh? (Score:2)
      
      by Eric Giguere ( 42863 ) writes:
      
      You're right, but I bet a lot of (most?) sites don't do it right and Google figures this is the next best way... Easier to ask people to put up sitemaps than tell them to fix their pages/servers.
      Eric
      Why the Vioxx recall reduced spam [ericgiguere.com] (humor)
- Re:How does this benefit me? (Score:3, Interesting)
  
  by DigitalRaptor ( 815681 ) writes:
  
  Because when you launch a new site, or new section of your site, you create the site map and notify Google, rather than hoping some day they'll follow a link somewhere and come spider your site.
  
  Google immediately knows that the site exists, immediately knows how many pages there are, how often they are supposed to change, AND what priority I place on them, so out of my 150 pages, the 10 I want spidered first are labeled as higher priority.
  
  This makes total sense to me.
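
  To make the per-page hints concrete, here is a minimal sketch of building entries that carry a change frequency and a priority. The loc, changefreq and priority element names and the 0.0 to 1.0 priority range come from the sitemap protocol; the helper names url_entry and build_sitemap are invented for this example, and the namespace declaration is omitted.

  ```python
  from xml.sax.saxutils import escape

  VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


  def url_entry(loc, changefreq, priority):
      """Render one <url> entry after checking the hint values."""
      if changefreq not in VALID_CHANGEFREQ:
          raise ValueError("unknown changefreq: %r" % changefreq)
      if not 0.0 <= priority <= 1.0:
          raise ValueError("priority must be between 0.0 and 1.0")
      return "<url><loc>%s</loc><changefreq>%s</changefreq><priority>%.1f</priority></url>" % (
          escape(loc),
          changefreq,
          priority,
      )


  def build_sitemap(pages):
      """pages is an iterable of (loc, changefreq, priority) tuples."""
      # Sorting is only for human readers; the hint itself is the <priority> value.
      ordered = sorted(pages, key=lambda page: page[2], reverse=True)
      body = "\n".join("  " + url_entry(*page) for page in ordered)
      return "<urlset>\n%s\n</urlset>" % body
  ```

  Keep in mind these values are hints to the crawler, so how much weight a search engine gives them is up to the search engine.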
- - Re:SHUT UP SHUT UP SHUT UP (Score:2)
    
    by Momoru ( 837801 ) writes:
    
    And your contributions to the discussion are so helpful too.
- Re:Google Bitching (Score:2)
  
  by ColdGrits ( 204506 ) writes:
  
  "Like it or not Google is an inovative company."
  
  You are right.
  
  No other company has ever launched an Internet Search function.
  No other company has ever launched web-based email.
  No other company has ever provided online maps.
  No other company has ever offered the contents of usenet via the web.
  No other company has ever offered navigable satellite photos of the planet.
  No other company has ever offered realtime webcaching and compression to "speed up" one's access.
  No company has ever cached websites for access wh
  - Re:Google Bitching (Score:2)
    
    by Fareq ( 688769 ) writes:
    
    Google creates innovative technologies and technological implementations of other people's models.
    
    Frequently the Google implementation is vastly superior in some purely technological manner. In this way they are innovative.
    
    Frequently, Google is able to turn the superior technology into a superior user experience as well. In this way, too, they are innovative.
    
    Frequently, Google's cool creations are hyped beyond all belief. In this manner, they are... erm... the recipients of a geek love-affair, and not
- Re:Google Bitching (Score:2)
  
  by Momoru ( 837801 ) writes:
  
  Every single product they put out is slashdot worthy.
  
  This is such bullshit. Some of the stuff they put out is very cool and newsworthy, like Google Maps, Gmail, etc... But so much crap that is either not ready yet, not unique, or just plain boring gets posted here. It's literally a direct feed of the Google blog half of the time.
  
  It also annoys people like me how people on slashdot treat Google like it's the second coming of Christ. If anyone says anything negative they get bombarded with posts saying
- Re:Next thing you know... (Score:2)
  
  by Seanasy ( 21730 ) writes:
  
  This might be marked as troll...
  Which is a pretty good clue that it is a troll.
  but think about it. Isn't it possible?
  No, no it isn't possible.
- Re:Next thing you know... (Score:2)
  
  by Jellybob ( 597204 ) writes:
  
  No, it's not really possible.
  
  If Google starts doing crap like that, designers and developers aren't going to go along with it, at which point Google's usefulness drops dramatically, and all their users go to their competitors.
  - Re:Next thing you know... (Score:2)
    
    by Momoru ( 837801 ) writes:
    
    What motivation would designers have to NOT do it? If Google is the most popular search engine, and you want to keep being searched, you have to keep up with what they want. It's like this: just because I may not like how MS embeds its browser in the OS, or its anticompetitive practices, doesn't mean my company can stop producing software for Windows (and still make a profit).
- It's possible, but then... (Score:2)
  
  by ShatteredDream ( 636520 ) writes:
  
  It's also possible that Google's CEO could go on a murderous rampage tomorrow at Microsoft's Redmond campus. 0.000000000001% is still a possibility you know. Do you realize what would happen to Google if they did that? They'd be dumped by most website owners faster than they could count the drop in their search and ad hits.
  
  Then again, Google coming up with detailed design guidelines for their pages for public consumption would be incredibly useful for designers. They use a lot of cutting edge JS tricks lik
- Re:Next thing you know... (Score:4, Informative)
  
  by TuringTest ( 533084 ) writes: on Friday June 03, 2005 @12:14PM (#12714675) Journal
  
  And the next thing you know will be Google launching specs on web design and then content.
  
  As long as everyone can freely and voluntarily use these specs without having to pay anything, how is this a bad thing?
  
- Re:Next thing you know... (Score:2)
  
  by caluml ( 551744 ) writes:
  
  I've often thought that Apache could do similar things. If, after a big security hole was found in Apache, they put out a non-vulnerable version that broke IE browsing but didn't affect Firefox, it could force a lot of people over. Of course, it would have to be done carefully, like other companies do it (the Opera/MSN debacle [opera.com]), so people didn't notice.
  
  And what is this about?:
  Slashdot requires you to wait 2 minutes between each successful posting of a comment to allow everyone a fair chance at posti
- Re:Next thing you know... (Score:2)
  
  by That's Unpossible! ( 722232 ) * writes:
  
  And the next thing you know will be Google launching specs on web design and then content. Who will comply? well.. anybody who wishes to be indexed by Google. That is 100% of the website owners. And thus Google will control the design, content and other things... HELP... they are taking over the internet
  
  This might be marked as troll... but think about it. Isn't it possible?
  
  No. Because Google needs our pages indexed by its robots more than we need Google to index us.
  
  Sonny, let me tell you 'bout a time be
- Re:Next thing you know... (Score:2)
  
  by Nimey ( 114278 ) writes:
  
  More like paranoid ranting. This was only modded up because of the current "Google is an evil corporation" slashthink.
- Re:Next thing you know... (Score:5, Insightful)
  
  by phidipides ( 59938 ) writes: on Friday June 03, 2005 @12:34PM (#12715002) Homepage
  
  And thus Google will control the design, content and other things... HELP... they are taking over the internet
  
  Nice. Google proposes a way to help web site administrators have a bit more control over how their site is perceived by a search engine, releases this proposal under an open source license, and at least a few people on slashdot accuse them of (*pinky to corner of mouth*) taking over the internet.
  
  Most of Google's recent actions have been good things -- sponsoring open source developers for the summer, proposing ways for site administrators to provide additional info about their site, and implementing a "nofollow" option to thwart spammers trying to increase their page ranking. However, if they constantly get criticized and second-guessed for doing good things, what incentive do they have to continue? If you give a charity $20 and they criticize you for not giving them $30, are you ever going to give anything to that charity again?
  
  Let's give Google the benefit of the doubt. Just like a person, they'll probably make some mistakes, but like a person I'll give them the benefit of the doubt until they prove me wrong. Some corporations do actually do good things and still manage to be successful, and in those cases they should be supported, not attacked.
  
- Re:Cool idea: Browser utilization of this data! (Score:2)
  
  by shmlco ( 594907 ) writes:
  
  That would be nice, if Google had made the map hierarchical, which they didn't, and if they allowed for directories, which they don't seem to have done, and if people included every page in their site in the map, which they also don't have to do.
- Re:Google is mightier than slashdot (Score:2, Informative)
  
  by Gnascher ( 645346 ) writes:
  
  Au contraire ... Google is returning a 502 on the provided link. Slashdot killed Google. Too bad too ... I wanted to read about this stuff...
- Re:Slashdotted? (Score:2)
  
  by lux55 ( 532736 ) writes:
  
  Is there a Google cache of it? Ahhh, circular references!!!!!
  
  Their blog is holding up at least.
- Re:Information already available to Google? (Score:2)
  
  by ArbitraryConstant ( 763964 ) writes:
  
  It seems there are two goals:
  
  -Get preferences from admins (priority, approximate update frequency).
  -Get metadata (time of last update)
  
  They can tell most of that just by downloading the page regularly, but with 8 billion pages it's probably pretty hard to do every one of them with any frequency, and most of them probably change a few times a year, if that.
  
  Now they can tell if a page has changed and whether it's likely to change in the future with a few KB of gzipped XML instead of megabytes of HTML.
  
  They've op
- Re:WE BROKE GOOGLE! Woohoo! (Score:2)
  
  by BillsPetMonkey ( 654200 ) writes:
  
  What's more interesting is that if you do just that - try again 30 seconds later ... it's still broken.
- Python is Cool (Score:2)
  
  by codepunk ( 167897 ) writes:
  
  nuff said
