Google's Technology Explored

Slashdot is powered by your submissions, so send in your scoop

Google's Technology Explored 294

Posted by Zonk on Thursday March 03, 2005 @01:51PM from the googling-google dept.

RobotWisdom writes "Internetnews offers a moderately detailed peek at Google's technology. For example, they use stripped-down Red Hat on a massively redundant network, and they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page." Additional analysis on InformationWeek and C|Net. From the article: "As a search query comes into the system, it hits a Web server, then is split into chunks of service. One set of index servers contains the index; one set of machines contains one full index. To actually answer a query, Google has to use one complete set of servers. Since that set is replicated as a fail-safe, it also increases throughput, because if one set is busy, a new query can be routed to the next set, which drives down search time per box."

This discussion has been archived. No new comments can be posted.

Google's Technology Explored

Load All Comments

Search 294 Comments Log In/Create an Account

Comments Filter:

PigeonRank(TM) (Score:5, Funny)

by Kimos ( 859729 ) writes: <`moc.liamg' `ta' `todhsals.somik'> on Thursday March 03, 2005 @01:53PM (#11835807) Homepage

That's now how google does it! This is their REAL secret:
http://www.google.com/technology/pigeonrank.html [google.com]

Share
twitter facebook
- Re:PigeonRank(TM) (Score:5, Funny)
  
  by Tackhead ( 54550 ) writes: on Thursday March 03, 2005 @01:58PM (#11835847)
  
  > That's now how google does it! This is their REAL secret: http://www.google.com/technology/pigeonrank.html
  That was pre-IPO.
  We'd like you to meet Bubba [dailymail.co.uk]. Bubba's fully vested, and as this article [dailymail.co.uk] says, he's, uh... he's grown somewhat.
  
  Parent Share
  twitter facebook
  - Not Bubba! (Score:2)
    
    by antdude ( 79039 ) writes:
    
    His name is Hercule. :P
- Re:PigeonRank(TM) (Score:5, Funny)
  
  by eric_brissette ( 778634 ) writes: on Thursday March 03, 2005 @01:58PM (#11835848)
  
  Their technology for waste management alone must be revolutionary.
  
  Parent Share
  twitter facebook
- Google Lunar (Score:4, Funny)
  
  by Barryke ( 772876 ) writes: on Thursday March 03, 2005 @02:05PM (#11835930) Homepage
  
  They're hiring.
  http://www.google.com/jobs/lunar_job.html [google.com]
  a snippet:
  
  Google Copernicus Center is hiring
  Google is interviewing candidates for engineering positions at our lunar hosting and research center, opening late in the spring of 2007. This unique opportunity is available only to highly-qualified individuals who are willing to relocate for an extended period of time, are in top physical condition and are capable of surviving with limited access to such modern conveniences as soy low-fat lattes, The Sopranos and a steady supply of oxygen.
  
  Parent Share
  twitter facebook
/. effect (Score:4, Funny)

by Anonymous Coward writes: on Thursday March 03, 2005 @01:55PM (#11835820)

If we could /. google, that would impress me

Share
twitter facebook
- Re:/. effect (Score:5, Interesting)
  
  by SmokeHalo ( 783772 ) writes: on Thursday March 03, 2005 @02:01PM (#11835889)
  
  It's been tried. From TFA:
  
  One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.
  
  Parent Share
  twitter facebook
  - Re:/. effect (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    The undisclosed location was Santa Clara. I won't get more specific than that, sorry. They had a room jam packed with gear that was improperly cabled and spaced, and they didn't want to pay for redundant cooling. Then again, it wasn't a production site. Someone was almost overcome by the heat just walking between rows of cabinets.
  - Re:/. effect (Score:3, Funny)
    
    by Dolly_Llama ( 267016 ) writes:
    
    a datacenter in an undisclosed location
    
    Is Dick Cheney in the IT business now?
- considering.... (Score:3, Insightful)
  
  by WindBourne ( 631190 ) writes:
  
  that the virus which used google could not do it with 10's of thousand of computers, it is not likely that /. can do it.
- - Re:/. effect (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    Perl is a great language, and I love it, but that does not mean that you have to use it for everything.
    
    while true; do wget www.google.com; done
    
    seems better to me.
    - Re:/. effect (Score:5, Funny)
      
      by Anonymous Coward writes: on Thursday March 03, 2005 @02:16PM (#11836074)
      
      Computer programming languages are great, and I love them, but that does not mean that you have to use them for everything
      
      open browser at www.google.com
      get a drinking duck thing that bobs up and down hitting F5 every second
      
      seems better to me.
      
      Parent Share
      twitter facebook
      - Re:/. effect (Score:2)
        
        by NoOneInParticular ( 221808 ) writes:
        
        Better still, use the avian carrier protocol [faqs.org] to transmit packets to google. If you select carriers attractive enough, I'm sure it will distort google's search technology [google.com].
        
        Re:/. effect (Score:2)
        
        by NoOneInParticular ( 221808 ) writes:
        
        yeah, that might work as well.
Truly Amazing. (Score:5, Interesting)

by iibbmm ( 723967 ) writes: on Thursday March 03, 2005 @01:55PM (#11835822)

It really is amazing to think of the amount of information and data that we can access so quickly these days. When I stop and think about what my little search query goes through to bring me an almost instant response, it almost seems impossible. Of course the search engine side of this is only one example, but it's a nifty insight into how powerfull our infrastructure is these days. Bravo, mankind.

Share
twitter facebook
- Also Amazing: How much we miss (Score:5, Interesting)
  
  by Ieshan ( 409693 ) writes: <ieshan@g[ ]l.com ['mai' in gap]> on Thursday March 03, 2005 @02:01PM (#11835886) Homepage Journal
  
  It's also amazing how much of the general knowledge of the world we *can't* access, because it's unconnected or unpublished.
  
  Just think about how vast and extensive Google's search is, and then think about how little of the World's knowledge and creative achievement it actually can access.
  
  The quantity and breadth of human knowledge is breathtaking, no?
  
  Parent Share
  twitter facebook
  - Re:Also Amazing: How much we miss (Score:5, Insightful)
    
    by iibbmm ( 723967 ) writes: on Thursday March 03, 2005 @02:05PM (#11835932)
    
    That's why projects like wikipedia are so important, and so impressive.
    
    Only a few years ago it could take forever to find any kind of decent information on some topics online or even in libraries. Today, I go to wiki and I'm almost assured to have a FAIRLY reliable source for information, as it's cross checked by peers who have some kind of a personal interest in the subject.
    
    However, there's a downside.
    
    Back when I was in school, researching a subject typically meant going through encyclopedia after encyclopedia, which wasn't a bad thing. I learned quite a bit by being FORCED to over-research topics. Today, I can generally straight-shoot to whatever I need to find, giving my brain a good set of blinders to everything else along the way.
    
    Parent Share
    twitter facebook
    - Re:Also Amazing: How much we miss (Score:3, Interesting)
      
      by Skim123 ( 3322 ) writes:
      
      Also with computers there's the whole cut and paste thing... at least with a printed encyclopedia you had to read the content when writing your report.
      Technology has the ability to improve everyone's collective IQ, but also has the ability to dumb down the populace. Kind of like TV. I remember tutoring an elementary student when I was a high school student back in '95 or so, and he couldn't do simple math (addition, subtraction, etc.) without his calculator. Sad...
    - - Re:Also Amazing: How much we miss (Score:3, Insightful)
        
        by Jugalator ( 259273 ) writes:
        
        > FAIRLY reliable source for information
        
        That's the problem. It isn't reliable. For example, one local journalist got burned badly by using that piece of crap to do research during the election.
        
        Correction: It's "often" reliable.
        
        You want a better source?
        
        Sorry, you won't find one. Not a single one at least.
        
        What you're speaking of is not a problem with Wikipedia, that's a problem with a journalist who doesn't know how to properly research a subject. If a journalist relies on any single source to be per
  - Re:Also Amazing: How much we miss (Score:3, Funny)
    
    by garcia ( 6573 ) * writes:
    
    Oh come now! You can always do a site:slashdot.org and search Google. All the knowledge about ANYTHING is right there at your fingertips. Sometimes in duplicate and triplicate!
    
    What more could you need?
  - Re:Also Amazing: How much we miss (Score:2, Insightful)
    
    by natedubbya ( 645990 ) writes:
    
    The quantity and breadth of human knowledge is breathtaking, no?
    Well, I think you haven't studied enough if you think this. When you start to realize we actually know very little, then you're getting somewhere.
  - Re:Also Amazing: How much we miss (Score:3, Interesting)
    
    by jon787 ( 512497 ) writes:
    
    Not only that, but all the information we index and then can't retrieve!
    
    "We have an embarrassment of riches in that we're able to store more than we can access. Capacities continue to double each year, while access times are improving at 10 percent per year. So, we have a vastly larger storage pool, with a relatively narrow pipeline into it." -- Jim Gray, Microsoft Research.
  - Re:Also Amazing: How much we miss (Score:3, Interesting)
    
    by Kazoo the Clown ( 644526 ) writes:
    
    I think it might be pretty amazing to find out what we can't easily access, even that which is published on the net. A simple example: you can't differentiate "net" from ".net" on google, and net is an extremely common word so it is next to useless as a qualifier if your searching for info on the ".net" equivalent to anything common. Or try searching for the smiley face: ":-)". While those may be trivial and uninteresting specific examples, they illustrate at least one area where "you can't find it throu
- Laziness, ignorance or (Score:2, Insightful)
  
  by sporty ( 27564 ) writes:
  
  I think the only reason other companies don't do as well as google is due to either laziness or ignorance to some basic things and some advanced things. An index is not the most ground breaking thing in the world. Job delegation and breaking up work is not that ground breaking either. Clustering has been around in concept since forever. Now I ask you, the public, not just you iibbmm, how many applications have you done that use these concepts? Most biz concepts are very simple. They don't try to impl
  - Re:Laziness, ignorance or (Score:5, Insightful)
    
    by Kashif Shaikh ( 575991 ) writes: on Thursday March 03, 2005 @04:19PM (#11837400)
    
    None of the concepts of computer science are new, but what is ground breaking is Google touching all aspects of computer science to solve a problem. Distributed Databases, Replicated Filesystems, Clustering, Learning algorithms, job scheduling, map/reduce languages, etc. are not new. But they applied each of these sub-domains to 'searching' and 'lots of data'. Using old ideas is _new_ ways is ground breaking. That what everyone does(like Carmack and DOOM3).
    
    Parent Share
    twitter facebook
  - Re:Laziness, ignorance or (Score:3, Insightful)
    
    by akirchhoff ( 95640 ) writes:
    
    In my experience, you can add, "don't want to pay for". Some of the places I have worked for aren't lazy, ignorant of the possibilities; they have made a deliberate decision to work cheap. They will accept the downtime from a quick and dirty design, rather than pay for better design. It's all in the numbers, how much will we lose if we are down.
- - Re:Truly Amazing. (Score:2)
    
    by HD Webdev ( 247266 ) writes:
    
    Brings up an interesting point... Last night we were offered "free" Guinness pint glasses with each Guinness purchased. We were told they were 20oz glasses. Pints aren't 20oz we said.
    
    I was in a resteraunt and my girlfriend ordered a small orange juice while I ordered a large one.
    
    She drank hers rather quickly and for some reason, I thought the glasses looked odd even though mine was larger. So, I poured my large orange juice in her empty glass, and (go figure) it fit.
    
    Even after demonstrating this
  - That's an English pint, you yob. (Score:2)
    
    by aristus ( 779174 ) writes:
    
    Bloody Imperial, not your wimpy pints.
interesting (Score:3, Funny)

by slapout ( 93640 ) writes: on Thursday March 03, 2005 @01:56PM (#11835834)

and they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page

So that's why I can search on the result page for my orginally query and find nothing. And all this time I was blaming Internet Explorer!

Share
twitter facebook
- Re:interesting (Score:3, Interesting)
  
  by InfiniteWisdom ( 530090 ) writes:
  
  What's interesting is that the notice "Google is not affiliated with the authors of this page nor responsible for its content." goes away when you look at the cache of Google.com! That's a change from the last time I looked at Google's cache of Google a couple of years or so ago.
Comment removed (Score:5, Funny)

by account_deleted ( 4530225 ) writes: on Thursday March 03, 2005 @02:00PM (#11835880)

Comment removed based on user account deletion

Share
twitter facebook
- Re:Whats really impressive (Score:5, Funny)
  
  by MillionthMonkey ( 240664 ) writes: on Thursday March 03, 2005 @02:33PM (#11836259)
  
  I don't see what's astounding about this.
  
  Reminds me of a radio interview I once heard with the Google founders. The host was curious about what the "I'm feeling lucky!" button was about. She claimed she typed in "Google" into the search box and clicked "I'm feeling lucky!", and nothing happened, so it didn't work!
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Informative)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
  - did for me (Score:3, Informative)
    
    by goon ( 2774 ) writes:
    ... The "I'm Feeling LuckyTM" button automatically takes you to the first web page returned for your query.
    
    An "I'm Feeling Lucky" search means less time searching for web pages and more time looking at them. ...
    
    from the "I'm Feeling LuckyTM" button [google.com]. Guess they changed it.
- Re:Whats really impressive (Score:2)
  
  by Rufus88 ( 748752 ) writes:
  
  What if Google decided to cache only those sites that don't cache themselves. Would google cache itself then?
  - - Re:Whats really impressive (Score:2)
      
      by Rufus88 ( 748752 ) writes:
      
      You may not find it interesting, but it's closely related to Russell's Paradox [stanford.edu], which was of serious concern to set-theoreticians. This, in turn, is closely related to Godel's Incompleteness Theorem and also to the Halting Problem [netaxs.com], which place fundamental limits on mathematics and computability.
      - Re:Whats really impressive (Score:3, Interesting)
        
        by lgw ( 121541 ) writes:
        
        I've done a more studying in that area than most. There has been a lot of over-reacting to paradoxes such as this. Godel's Incompleteness theorem is only narrowly interesting: as soon as you start talking about physical things, these paradoxes are much less imporant.
        
        A set which contains all sets which do not contain themselves may be a conundrum, but a catalog that lists all catalogs that do not list themselves is merely impossible (trivially impossible, in fact). There are plenty of things that can be
Picked up a Microsoftie (Score:2, Informative)

by solomonrex ( 848655 ) writes:

This article explained to me why they would pick up a Microsoft guy who worked on NT. Yes, I'm sure Google's OS and NT have nothing in common, but all the same, this guy seems motivated and smart. And if they have their own custom OS, I'm sure they're not going to make their own distribution, they just need to work in house.

http://news.yahoo.com/news?tmpl=story&u=/zd/200 5 03 03/tc_zd/146950

blog:

http://mark-lucovsky.blogspot.com/2005/02/shippi ng -software.html
Meltdown? (Score:4, Interesting)

by Ironsides ( 739422 ) writes: on Thursday March 03, 2005 @02:01PM (#11835885) Homepage Journal

Google's redundancy theory works on a meta level, as well, according to Hoelzle. One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.

Gee.. I wish our /.ing could do this. On the other hand, they have a level of redundancy and up time many businesses would kill for.

Share
twitter facebook
- Re:Meltdown? (Score:4, Funny)
  
  by Ignignot ( 782335 ) writes: on Thursday March 03, 2005 @02:22PM (#11836156) Journal
  
  One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.
  
  Gee.. I wish our /.ing could do this.
  
  It is my belief that data center fires are caused by slashdot every day!
  
  Parent Share
  twitter facebook
More useless search results? (Score:4, Insightful)

by SerialEx13 ( 605554 ) writes: on Thursday March 03, 2005 @02:01PM (#11835897)

so that pages can match even if none of the words in your query actually appear on the page.

Even pages that come up in my search results now that contain my query don't even have anything to do with what I am looking for. Isn't this just adding to the problem?

How about a Did you mean? option that doesn't compare against spelling, but related topics instead?

Share
twitter facebook
- Re:More useless search results? (Score:5, Informative)
  
  by InfiniteWisdom ( 530090 ) writes: on Thursday March 03, 2005 @02:18PM (#11836096) Homepage
  
  It says they're using clustering, so it might help eliminate pages that contain the words you're looking for but aren't relevant to your current query, in addition to including pages that are relevant but don't contain the words. For example,
  
  the word "tree" may either refer to a data structure (binary, B-,red-black etc.) or to the stuff forests are made of. If my query is "search tree", the words search and tree may show up on a page about people searching for some kind of a tree and on pages about search trees. Assuming they're both popular classes of pages, you're going to end up with some mishmash of results from both classes.
  
  Instead, the clustering algorithm might notice (based on other words that appear on the pages, for example) that pages with 'search' and 'tree' in them fall into two classes. That doesn't help if "search tree" is all it has to go by. But now if I add the words "data structure" to the query, it knows which class of pages I'm interested in, because many pages about binary trees contain the words "data structure" whereas almost none about the quest for trees do. Now it can return pages from the right cluester that it knows are relevant, even if they don't contain the word "data structure" in them.
  
  Parent Share
  twitter facebook
no AND needed (Score:5, Interesting)

by tehshen ( 794722 ) writes: <tehshen@gmail.com> on Thursday March 03, 2005 @02:01PM (#11835898)

From the summary:

they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.

From the help guide [google.co.uk]:

By default, Google only returns pages that include all of your search terms.

Which of these is correct? If it's the summary, is there any way to turn this behaviour off? I find it immensely annoying.

Share
twitter facebook
- Re:no AND needed (Score:5, Informative)
  
  by Ironsides ( 739422 ) writes: on Thursday March 03, 2005 @02:09PM (#11835979) Homepage Journal
  
  they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.
  
  I think what they mean is that they are working on search algorithms that will implement this. Not that they have already made it publicly available. They want it to work first, and be released second. The problem the you have cropping up most likely occurs with pages that put info in the metadata, and hence don't show up in the page itself.
  
  Parent Share
  twitter facebook
  - Re:no AND needed (Score:3, Insightful)
    
    by M00TP01NT ( 596278 ) writes:
    
    I don't know if this is what TFA was getting at, but in a google cache page you may from time to time see the phrase "These terms only appear in links pointing to this page: ...".
    
    For example, try searching for "miserable failure" on Google. The first result is George Bush's biography on www.whitehouse.gov.
    
    However, the term "miserable failure" doesn't actually show up (yet) in the biography. But, pages that POINT to the biography do include those terms.
    
    As a result, pages can match your search quer
- - Re:no AND needed (Score:2)
    
    by tehshen ( 794722 ) writes:
    
    Appending "+" just ensures the word is seen, as Google rejects 'I' and 'from' etc. It doesn't affect things like this [google.com].
    
    "allintext:" before everything works. Thanks to a helpful AC.
Oops (Score:5, Funny)

by Daedala ( 819156 ) writes: on Thursday March 03, 2005 @02:02PM (#11835904)

Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cooking" is a good match even though it contains none of the query words.

One word: cooking.

I'm sure the principle is sound. I just think the example is a leetle bit flawed.

Share
twitter facebook
- Re:Oops (Score:2)
  
  by swimmar132 ( 302744 ) writes:
  
  You're missing a few other connections it needs.
  
  1) The connection between "courses" and "class".
  
  2) The connection between "Bay Area" and "Berkeley".
- Re:Oops (Score:4, Interesting)
  
  by ahem ( 174666 ) writes: on Thursday March 03, 2005 @02:56PM (#11836500) Homepage Journal
  
  The actual quote from the article that I saw was:
  The company also is applying machine learning to its system to give better results. Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cuisine" is a good match even though it contains none of the query words.
  FYI.
  
  Parent Share
  twitter facebook
  - Re:Oops (Score:3, Interesting)
    
    by Daedala ( 819156 ) writes:
    
    Hmm. It must have been corrected; I did a direct copy/paste for my quote.
Too celver for their own good? (Score:3, Insightful)

by Mirk ( 184717 ) writes: <slashdotNO@SPAMmiketaylor.org.uk> on Thursday March 03, 2005 @02:03PM (#11835910) Homepage

From the article summary:

They're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.

I hate that. Don't you hate that? When you type in a search keyword, isn't it because you want that keyword to appear in the documents you find?

This "find tangentially related documents" feature will be fine so long as they make it optional and set it to be off by default. Otherwise, I don't want their idea of what pages I should be looking at polluting my results list.

I call "innovation for the sake of innovation".

Share
twitter facebook
- "Celver"? Did I say "celver"? (Score:2)
  
  by Mirk ( 184717 ) writes:
  
  Looks like I was too dmub for my own good.
  - Re:"Celver"? Did I say "celver"? (Score:2)
    
    by tehshen ( 794722 ) writes:
    
    You should have used Google [google.com] ;)
- Re:Too celver for their own good? (Score:3, Insightful)
  
  by mythosaz ( 572040 ) writes:
  
  The entire point of a search engine like Google is that they do give you their idea of what pages your query should return.
  That's how it works...
- Re:Too celver for their own good? (Score:2)
  
  by Halo- ( 175936 ) writes:
  
  Well, I agree that if Google starts ranking these derived results more highly than ones which actually contain the search terms it could be annoying. If I do a search for say "Roofer Austin" and there are no roofers in Austin found, I'd rather have something related like "Contractors Central Texas" returned than nothing at all. If for no other reason that it helps me reword my search.
  I do agree it should be an option though. Google (in my opinion) has been pretty good about not being obtrusive, so I su
Yeah, I noticed that (Score:4, Informative)

by rde ( 17364 ) * writes: on Thursday March 03, 2005 @02:03PM (#11835914)

I've been putting movie reviews on my web page for a while now, and I've increasingly noticed that google will point people at them even though they search for stuff that isn't on the page. For example, I've had a number of hits where people search for 'AvP review' (or suchlike) and even though I never include the phrase 'AvP' in my review [robertelliott.org] of Aliens vs Predator.

I was mightily impressed, and not just because it means more people read my stuff. Or at least surf to it.

Share
twitter facebook
- Re:Yeah, I noticed that (Score:3, Informative)
  
  by CdBee ( 742846 ) writes:
  
  I bet you'll find someones linked to you and put the phrase AvP in the link. Google references that as well...
- Re:Yeah, I noticed that (Score:2, Interesting)
  
  by generic-man ( 33649 ) writes:
  
  Try doing a search for a Macintosh software product [google.com]. Even though "Mac OS X" was not one of your search terms, Google boldfaces it as though it were!
  
  I can't reproduce this with another term. I wonder whether this was a manual fix by Google programmers.
  - Re:Yeah, I noticed that (Score:2)
    
    by arkanes ( 521690 ) writes:
    
    Ahem. No it doesn't. It bolds "macintosh" and "opera".
Video about some of the backend stuff (Score:5, Interesting)

by otisg ( 92803 ) writes: on Thursday March 03, 2005 @02:04PM (#11835922) Homepage Journal

Here it is, from one of the Google guys:
Google: A Behind-the-Scenes Look [uwtv.org].

Share
twitter facebook
- Re:Video about some of the backend stuff (Score:2)
  
  by otisg ( 92803 ) writes:
  
  Before /. effect kills that server, consider using Coral (www.coralcdn.org): here [nyud.net].
  - Re:Video about some of the backend stuff (Score:2)
    
    by adpowers ( 153922 ) writes:
    
    Somehow I don't think that is going to happen. Also, I don't think Coral works with streaming video (which this is). That said, I'd recommend people watch this video. I attended the lecture and it was really interesting. If you read a lot about Google and are observant, not much is new, but it is all put together nicely. However, you do get to see some interesting idea clustering stuff they have in their backend. I had never seen this before. Also, Jeffrey Dean is a funny guy.
- Re:Video about some of the backend stuff (Score:2)
  
  by ad0gg ( 594412 ) writes:
  
  I find it quite funny they are doing a power point presentation with screen shots of IE.
Want the dean.pdf without a USENIX account? (Score:2, Informative)

by robmandu ( 196790 ) writes:

http://www.cs.wisc.edu/~dusseau/Classes/CS739/Pape rs/dean.pdf [wisc.edu]
Question... (Score:4, Interesting)

by kryogen1x ( 838672 ) writes: on Thursday March 03, 2005 @02:07PM (#11835958)

Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.

Do they share these patches with everyone else?

Share
twitter facebook
- Re:Question... (Score:2, Insightful)
  
  by TreeHead ( 553584 ) writes:
  
  ;i was wondering the same thing. do modifications of this sort fall under the GPL? if so, isn't google required to share them with the public [gnu.org], or are "patches" not considered "modifications" to the software?
  
  ;treehead
  - Re:Question... (Score:2)
    
    by Jussi K. Kojootti ( 646145 ) writes:
    
    Did you read the paragraph you linked to? As long as they're just using the modifications themselves they are under no obligation to publish them.
  - Re:Question... (Score:5, Informative)
    
    by limbostar ( 116177 ) writes: <stephen&awdang,com> on Thursday March 03, 2005 @02:42PM (#11836331) Homepage
    
    They're not obligated to share unless they are planning on redistributing the software. They are perfectly free to patch their own software and use the patched versions for their servers without sharing those modifications.
    
    The GPL does not force them to do anything unless they wish to redistribute the software.
    
    Parent Share
    twitter facebook
  - Re:Question... (Score:2)
    
    by TuringTest ( 533084 ) writes:
    
    "patches" are considered "modifications", but since they're not distributing the code they're not forced to provide the source.
  - Re:Question... (Score:2)
    
    by GeckoX ( 259575 ) writes:
    
    If they are only using the modified software internally why would they be required to share them?
- Re:Question... (Score:2, Funny)
  
  by generic-man ( 33649 ) writes:
  
  They will, once the patches are out of beta.
Sure? (Score:5, Funny)

by ferar ( 64373 ) writes: on Thursday March 03, 2005 @02:08PM (#11835967)

I always thougth that they used NT + Access Database.

Share
twitter facebook
- Re:Sure? (Score:2)
  
  by BearJ ( 783382 ) writes:
  
  I thought it was just a big Excel file, then they have people reading the requests coming in off the web, hitting CTRL+F in Excel, and away you go!
  Or...maybe not
gCluster (Score:5, Informative)

by RobiOne ( 226066 ) writes: on Thursday March 03, 2005 @02:08PM (#11835973) Homepage Journal

They should make a googleCluster Live CD.. ala clusterKnoppix.. ..or perhaps use more of clusterKnoppix features or openmosix..share cpu/mem..
sourceforge is begging for something like this..

Their engineer desktops have special google builds of linux which help them compile things insanely fast with g4, ie hacked p4 (Perforce).

They also have one of the best intranet sites I've seen. Lots of info and services the employees can use, apart from email.

The internal blogs really help with keeping track of projects you're not working on, and what others are doing. Their mailing lists are often usefull too, for example there's a lost and found, for sale, and biking partners list. All kinds of usefull little stuff, taking care of the people with little nice things. Lots of reading too.

-- Robi

Share
twitter facebook
kernel patches? (Score:5, Insightful)

by alphan ( 774661 ) writes: on Thursday March 03, 2005 @02:08PM (#11835976) Homepage

Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.
and the obvious question:
where are the patches?
Anybody knows? This is not a GPL question just an ethical one.

Share
twitter facebook
- Re:kernel patches? (Score:2)
  
  by ornil ( 33732 ) writes:
  
  If they make custom patches for their own use and don't sell them, I don't see what's the problem. They have a legal and moral right to do that, it seems to me.
  - Re:kernel patches? (Score:3, Interesting)
    
    by rk ( 6314 ) writes:
    
    On their own servers, then they're obeying the rules.
    
    The question is: Do they use these patches on the search appliances they sell, and does that count as "distribution"? I honestly don't know the answer to that question, and I'd like to think Google has sharp legal advisors to go with their sharp technical people.
    - Re:kernel patches? (Score:2)
      
      by alphan ( 774661 ) writes:
      
      On their own servers, then they're obeying the rules.
      You mean GPL. The other "rules" are pretty subjective. I for one, would like to see Google act in favor of the community assuming the kernel patches are not the core of their technology.
      The question is: Do they use these patches on the search appliances they sell, and does that count as "distribution"? I honestly don't know the answer to that question, and I'd like to think Google has sharp legal advisors to go with their sharp technical people.
- Re:kernel patches? (Score:2, Insightful)
  
  by DeKO ( 671377 ) writes:
  
  If you consider the "freedom" involved in Free Software, you'll notice that they use their modified software for their own purposes. They are free to use the software in any way, they are free to modify it. And they aren't distributing it, so they aren't distributing the source code of their changes. I don't see any problem with it.
- Re:kernel patches? (Score:3, Insightful)
  
  by The Bungi ( 221687 ) writes:
  
  where are the patches?
  They'll tell you as soon as you point out where or how they are distributing them (yes, that's why it wasn't a GPL question).
  Why should Google be "ethical"? Likely these modifications are part of their IP trove, which keeps them ahead of the (already heated up) competition.
  If you don't like the way someone uses the software you're giving away then perhaps you shouldn't give it away, or maybe it's just that the license is flawed. It's dumb to expect people who run billion-dollar
  - - Re:kernel patches? (Score:3, Insightful)
      
      by digidave ( 259925 ) writes:
      
      Stop trying to make it a semantic argument. Distributing according to the GPL is not the same as patching your own systems and I'm sure you know that.
      
      The only question is whether or not Google is selling these patches as part of their appliances.
"The text you entered was not found." (Score:5, Interesting)

by Doc Ruby ( 173196 ) writes: on Thursday March 03, 2005 @02:10PM (#11836001) Homepage Journal

" pages can match even if none of the words in your query actually appear on the page"

The main flaw I've found in Google's results has been when it returns pages without one of my query words, which doesn't respond to the sense of my query. Sometimes it's changed page content at the same URL, so I go back and get the "cached" page, if it exists. The cached pages reveal in their headings whether the page matched only because the query word was found only in another page linking to the returned page. I'd like their immediate results to show that distinction, and to have links in the results to click around those pages related by my complete query. The current click/back/"cache" combinations are frustratingly disconnected, conflicting with Google's otherwise smooth immediacy.

Share
twitter facebook
- Re:"The text you entered was not found." (Score:2)
  
  by nkh ( 750837 ) writes:
  
  The other thing I hate is when Google returns different results based on the page you came from: Google is trying to autodetect the page of my country (Google.uk, Google.se, Google.fr, Google.de... I hate this) and then gives me results in the language of the detected page.
  
  When I ask for Google.com, I'd like Google.com, not Google.random_country! (but I know, it's not a bug, it's a feature)
  - Re:"The text you entered was not found." (Score:2)
    
    by omahajim ( 723760 ) writes:
    
    Isn't it simply doing this by geolocation of your IP address?
Google Maps - Designed to protect data centres (Score:5, Funny)

by Matt Clare ( 692178 ) writes: on Thursday March 03, 2005 @02:14PM (#11836050) Homepage

Google's redundancy theory works on a meta level, as well, according to Hoelzle. One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.

"You don't have just one data center," he said, "you have multiples."

The real idea behind Google Maps is so that as the server catches fire it use it's last cycles to send an eMail to the nearest fire cheif and include a map. I think it would also throw in a GMail invite for incentive.

Share
twitter facebook
Question -- Is any of this considered P2P? (Score:2, Interesting)

by Didion Sprague ( 615213 ) * writes:

Question -- and this may be a dumb one, but I'm going to ask it anyway:

How much of what Google is doing -- the clustering, the redundancy, the sub-categorization -- how much of this (if any) could be described -- could fit under the mantle of "Peer-to-Peer"? Is anything that Google is doing here remotely considered P2P? (Even if the P2P is what's going on on their own, in-house servers?)

Obviously, I ask this because of the upcoming supreme court case. And I ask because it struck me as I read the article t
- Re:Question -- Is any of this considered P2P? (Score:2, Insightful)
  
  by MrAnnoyanceToYou ( 654053 ) writes:
  
  Interesting addendum to that question - Is Google infringing upon copyrighted information by caching EVERY page they run across? That seems like pulling massive amounts of copyrighted Java code or design code or images or etc. into their server for 'personal' use...? Does this break any laws?
no matches, yet matches (Score:2)

by pedantic bore ( 740196 ) writes:

... pages can match even if none of the words in your query actually appear...
Let me guess... the pages that match just happen to point to advertisers?
Frugal Google (Score:3, Insightful)

by Sundroid ( 777083 ) writes: on Thursday March 03, 2005 @02:22PM (#11836155) Homepage

The word, "cheap", is used 4 times in the C/Net article that describes Google's "secret of success" -- "buying relatively cheap machines", "cheap commodity PCs", "(Power) becomes a factor in running cheaper operations", "not just buying cheaper components".

They say being frugal is a virtue, which Google has, evidently. What is the lesson here? Holding down the cost and being innovative never fail. I guess.

Share
twitter facebook
MapReduce (Score:2, Informative)

by iluvcapra ( 782887 ) writes:

Alot of this stuff is application of SAN/RAID/Failover technology, which is cool (and we've never seen it so pervasively implemented), but not horribly revolutionary. I think the slickest thing they've developed, but might not get the most attention is their MapReduce [google.com] framework. The abstract from their paper:
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a _map_ function that processes a key/value pair to generate a set of i
Google and it's 1980's search literal-mindedness (Score:3, Insightful)

by Theovon ( 109752 ) writes: on Thursday March 03, 2005 @03:05PM (#11836602)

My wife is studying Library Information Science. In one class, she studied information retrieval. Here's what's interesting: It appears that although Google has much success with determining relevance by using PageRank, it's still very literal about the words you pick. Although it appears to do stemming (ie. 'runner' matches 'running'), it doesn't do anything about synonyms. Now, here, I'll point out that the the textbook for my wife's class was written in like 1995. In the SECOND CHAPTER, they talk about basic query techniques that make use of patterns in documents and AUTOMATICALLY derive what words are synonyms or in some way semantically related. These are long-solved problems. Some search engines employ human-generated lists of synonmyns, and there are whole databases you can download that contain semantic networks.

So, WHY, I ask, is google only now getting around to using these techniques?

Share
twitter facebook
- Re:Google and it's 1980's search literal-mindednes (Score:3, Insightful)
  
  by shish ( 588640 ) writes:
  
  *cough [google.co.uk]*
  It's not a great example, but my mind seems to have gone temporarily blank of words that have many synonyms :(
"we can't crawl as fast as we would like" (Score:3, Interesting)

by SnprBoB86 ( 576143 ) writes: on Thursday March 03, 2005 @03:08PM (#11836636) Homepage

Why not enhance the robots.txt format to include a max crawl rate variable? Let the webmaster specify how often a robot is allowed to crawl a page.

Share
twitter facebook
- Re:"we can't crawl as fast as we would like" (Score:2)
  
  by pe1chl ( 90186 ) writes:
  
  Why not?
  Because this already can be specified in html metadata:
Crawling rate... (Score:2)

by advocate_one ( 662832 ) writes:

"In parallel, clusters of document servers contain copies of Web pages that Google has cached. Hoelzle said that the refresh rate is from one to seven days, with an average of two days. That's mostly dependent on the needs of the Web publishers.
"One surprising limitation is we can't crawl as fast as we would like, because [smaller] webmasters complain," he said. "
well, we could introduce a setting into robots.txt where we can tell google how often they can spider your site...
And Debian in Gmail servers? (Score:2)

by stm2 ( 141831 ) writes:

When I tried to log into my gmail accout at the begining of the beta program, I got a Debian welcome screen.

Posted in my blog [bioinformatica.info].
- Re:Impressive technology but the algorithms aren't (Score:3, Interesting)
  
  by TheAwfulTruth ( 325623 ) writes:
  
  Heh, well they could NEVER do that :)
  
  Here's another great idea you inspired that they could also never do (being a commercial company themselves and all).
  
  When I am searching I virtually always want to do one of two distinct things:
  
  1) Sarch only commercial sites for a product to purchase.
  
  2) Search everything but commercial sites for information.
  
  There really should be a "$" flag that you could add (or at least a "!$" flag) to control wheather you see commercial or non-commercial sites in the results list
  - Re:Impressive technology but the algorithms aren't (Score:2)
    
    by WormholeFiend ( 674934 ) writes:
    
    you mean there are websites on the Internet that don't try to get any money from anyone?
- hardware (Score:2, Interesting)
  
  by r00t ( 33219 ) writes:
  
  Google really slaps together a pile of junk.
  Parts fail left and right, and nobody bothers
  to fix them. The software hides all this from
  the users.
  
  Google even checksums the data, on the assumption
  that it is frequently getting corrupted by all the
  junk hardware they buy.
- Re:define: cheap machines (Score:5, Interesting)
  
  by canadiangoose ( 606308 ) writes: <(moc.liamg) (ta) (mahargjd)> on Thursday March 03, 2005 @03:06PM (#11836619)
  
  I read somewhere that early Google datacentres were built by filling their racks with plywood shelves, then filling each shelf with one power supply running four motherboards each with one HDD. They didn't even use cases. This allowed them to build massively dense datacentres very cheaply. At one point they decided it wasn't worth it to replace dead hardware, so they started placing the racks too close together to be accessible. Why dig through and replace things when you can just keep adding more?
  Anyhow, the article mentioned that in these early datacentres they experienced something like a 25% hardware failure rate, but that it didn't matter because the software worked around it and the hardware was cheap.
  Here's a link to the page [topix.net] where I read all this neat stuff. It's probably mostly about the same stuff as the article we've all just slashdotted, but I won't be albe to tell for a while....
  
  Parent Share
  twitter facebook

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

PigeonRank(TM) (Score:5, Funny)

Re:PigeonRank(TM) (Score:5, Funny)

Not Bubba! (Score:2)

Re:PigeonRank(TM) (Score:5, Funny)

Google Lunar (Score:4, Funny)

/. effect (Score:4, Funny)

Re:/. effect (Score:5, Interesting)

Re:/. effect (Score:2, Informative)

Re:/. effect (Score:3, Funny)

considering.... (Score:3, Insightful)

Re:/. effect (Score:2, Insightful)

Re:/. effect (Score:5, Funny)

Re:/. effect (Score:2)

Re:/. effect (Score:2)

Truly Amazing. (Score:5, Interesting)

Also Amazing: How much we miss (Score:5, Interesting)

Re:Also Amazing: How much we miss (Score:5, Insightful)

Re:Also Amazing: How much we miss (Score:3, Interesting)

Re:Also Amazing: How much we miss (Score:3, Insightful)

Re:Also Amazing: How much we miss (Score:3, Funny)

Re:Also Amazing: How much we miss (Score:2, Insightful)

Re:Also Amazing: How much we miss (Score:3, Interesting)

Re:Also Amazing: How much we miss (Score:3, Interesting)

Laziness, ignorance or (Score:2, Insightful)

Re:Laziness, ignorance or (Score:5, Insightful)

Re:Laziness, ignorance or (Score:3, Insightful)

Re:Truly Amazing. (Score:2)

That's an English pint, you yob. (Score:2)

interesting (Score:3, Funny)

Re:interesting (Score:3, Interesting)

Comment removed (Score:5, Funny)

Re:Whats really impressive (Score:5, Funny)

Re: (Score:3, Informative)

did for me (Score:3, Informative)

Re:Whats really impressive (Score:2)

Re:Whats really impressive (Score:2)

Re:Whats really impressive (Score:3, Interesting)

Picked up a Microsoftie (Score:2, Informative)

Meltdown? (Score:4, Interesting)

Re:Meltdown? (Score:4, Funny)

More useless search results? (Score:4, Insightful)

Re:More useless search results? (Score:5, Informative)

no AND needed (Score:5, Interesting)

Re:no AND needed (Score:5, Informative)

Re:no AND needed (Score:3, Insightful)

Re:no AND needed (Score:2)

Oops (Score:5, Funny)

Re:Oops (Score:2)

Re:Oops (Score:4, Interesting)

Re:Oops (Score:3, Interesting)

Too celver for their own good? (Score:3, Insightful)

"Celver"? Did I say "celver"? (Score:2)

Re:"Celver"? Did I say "celver"? (Score:2)

Re:Too celver for their own good? (Score:3, Insightful)

Re:Too celver for their own good? (Score:2)

Yeah, I noticed that (Score:4, Informative)

Re:Yeah, I noticed that (Score:3, Informative)

Re:Yeah, I noticed that (Score:2, Interesting)

Re:Yeah, I noticed that (Score:2)

Video about some of the backend stuff (Score:5, Interesting)

Re:Video about some of the backend stuff (Score:2)

Re:Video about some of the backend stuff (Score:2)

Re:Video about some of the backend stuff (Score:2)

Want the dean.pdf without a USENIX account? (Score:2, Informative)

Question... (Score:4, Interesting)

Re:Question... (Score:2, Insightful)

Re:Question... (Score:2)

Re:Question... (Score:5, Informative)

Re:Question... (Score:2)

Re:Question... (Score:2)

Re:Question... (Score:2, Funny)

Sure? (Score:5, Funny)

Re:Sure? (Score:2)

gCluster (Score:5, Informative)

kernel patches? (Score:5, Insightful)

Re:kernel patches? (Score:2)

Re:kernel patches? (Score:3, Interesting)

Re:kernel patches? (Score:2)

Re:kernel patches? (Score:2, Insightful)

Re:kernel patches? (Score:3, Insightful)