Friday, October 1, 2010

How Does Google Find What You Want?

Most people do not understand how Google finds what they want.  Here's how.
First, it must be understood that Google was NOT the first search engine for the web.  Here is a list of the 57 different search engines since the beginning of the Internet, with such strange names as Dogpile, GenieKnows and ChaCha.  Google started in 1998.  Today people search it about 91,000,000 times per day, which is about 1,053 searches a second.

When you search on Google, it does not search the entire internet.  It searches its index of pages and lists them according to what is called "PageRank."  PageRank was named after Larry Page, one of the founders of Google.  PageRank is a trademark of Google and is a "link analysis algorithm."  An algorithm is a step-by-step problem-solving procedure programmed for a computer.  Link analysis looks for associations between objects; here, the objects are web pages and the associations are the links between them.  I am sure this diagram will clear things right up for those who are not programmers or math majors:
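
For readers who do read code, the link analysis idea can be sketched in a few lines of Python.  This is only a toy version of the publicly described PageRank idea, not Google's actual formula, which is closely guarded.  The function name, the damping value of 0.85, the number of iterations and the four-page "web" are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, used only for illustration.
toy_web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web the "home" page comes out on top because every other page links to it, while "orphan" comes out last because nothing links to it.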

For those who MIGHT want a slightly simpler explanation of the process, there is this video:
So the steps that happen in seconds are the following (I have taken this explanation and the accompanying diagram from this excellent site):
1.  Google's web crawler, named Googlebot, finds and retrieves the pages on the web.
2.  Google's Indexer sorts every word on those pages alphabetically by search term.  Each term gets an entry that stores the list of documents where the term appears and the location of the term within the text.
3.  Google's Query Processor evaluates your search and matches it to the relevant documents.  PageRank then orders the matching pages by importance, using over 100 factors to determine the rank, including the popularity of the page (i.e., how many others have clicked on it doing the same search).  This is an oversimplified explanation.  The formulas that Google uses for this calculation are closely guarded and constantly improved.
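
Steps 2 and 3 above can be sketched in Python as a tiny index and query processor.  This is a minimal illustration and not how Google actually implements them.  The page addresses, page text, rank scores and the names build_index and search are all made up for the example.

```python
from collections import defaultdict


def build_index(pages):
    """Step 2: map each term to the pages it appears in and its positions there."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word][url].append(position)
    return index


def search(index, query, rank):
    """Step 3: find pages containing every query term, most important first."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = None
    for term in terms:
        urls = set(index.get(term, {}))
        matches = urls if matches is None else matches & urls
    # Order the matching pages by their (made-up) importance scores.
    return sorted(matches, key=lambda url: rank.get(url, 0.0), reverse=True)


# Hypothetical crawled pages and importance scores, for illustration only.
pages = {
    "a.example": "google indexes the web",
    "b.example": "the web crawler fetches pages",
    "c.example": "search the web with google",
}
rank = {"a.example": 0.2, "b.example": 0.5, "c.example": 0.3}

index = build_index(pages)
print(sorted(index))                        # the terms in alphabetical order
print(index["web"]["a.example"])            # where "web" sits on that page
print(search(index, "google web", rank))    # matching pages, best first
```

Note that the query never touches the pages themselves.  It only looks up terms in the index, which is why a search can finish in a fraction of a second.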

Google uses what are called "machine-learning" techniques to automatically improve its understanding of the relationships and associations within the stored information in the pages it has indexed.  This is how it learned to correct your spelling when you mistype a word in a search.
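
To give a feel for how statistics can fix spelling, here is a small Python sketch.  It is a well-known textbook approach (try every word one typo away and pick the one seen most often) and is NOT Google's actual method, which is not public.  The word counts come from a made-up list of past searches, and the function names are assumptions for the example.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Every string that is one typo away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequently seen known word within one typo of `word`."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing better known; leave the word alone
    return max(candidates, key=lambda w: counts[w])


# Hypothetical log of past searches, for illustration only.
past_queries = "google search engine google search google maps".split()
counts = Counter(past_queries)

print(correct("gogle", counts))
print(correct("serach", counts))
```

The more searches a system like this sees, the better its counts become, which is the "learning" part.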

Although this might appear to resemble artificial intelligence, it isn't.  It is simply very clever programming combined with exhaustive trial and error by Google's computers until they statistically get it right, whether this be spelling or even language translation.  This might explain why Google is such a wealthy company.

1 comment:

Jeremiah Bilas said...

I actually remember Dogpile. I remember using one called HotBot a lot also, until I realized it was terrible. But let me come to the defense of Google: you end this article denying that Google's search abilities are AI, and I think that's wrong. It took a lot of trial and error to evolve our intelligence, and the process sounds very much like our neural network. Certain neurons are strengthened when repeatedly accessed, while others are left to weaken or die when not accessed. Just because we understand how it works doesn't mean we should deny its intelligence. One day we will also understand all the workings of our own intelligence.