Friday, October 1, 2010

How many web pages are on the Internet?

You would think that, with computers being so exact, we would know the exact size of the Internet.  In truth, it is a mystery and may always remain so.
Many people assume that this number is well known.  Some might even think that when they type a search into Google, it searches every page on the Internet within seconds and ranks them all in some hierarchical order.  This simply is not true.

The truth is that we are not sure just how many web pages there are on this amazing network called the Internet.  The Internet is truly like a flowing, ever-changing river.  Of one thing we can be certain: it is NOT the same today as it was yesterday, or even a second ago.  How do we know even this much about it?  We know it through our web crawlers.  These programs search the Internet continually, looking for new pages and for changes to existing pages, in a never-ending, thankless job of cataloguing and indexing.  As Kevin Kelly has said, the Internet, though seemingly inanimate, behaves like a vast biological organism, or perhaps even more like a vast biological ecosystem.  If you could graphically visualize a crawler checking all the links and indexing the key words of each web page, it might look something like this:

Here is another visualization of one busy website as it functions through its day, getting more and more complex as time passes.  Does it not resemble a living organism in a way?

The textual way of seeing information has been dominant only because of our reliance on books, which are text based.  But there may be a better way of searching the Internet and seeing the "big picture."  Watch this.  It is a web search done with 3D software:

All of this complexity makes the number of web pages on the Internet a question whose answer can change from one second to the next.  So the answer is that no one really knows for sure, not even Google, but there are estimates.  In 2008 a Google engineer estimated that there were 1,000,000,000,000 (1 trillion) pages on the Internet. Here is a description of how Google counted them:
We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links. In fact, we found even more than 1 trillion individual links, but not all of them lead to unique web pages. Many pages have multiple URLs with exactly the same content or URLs that are auto-generated copies of each other. Even after removing those exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day.
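The counting procedure in that description (start from seed pages, follow links outward, then discard URLs whose content duplicates a page already seen) can be sketched in a few lines of Python. This is only a toy illustration, not Google's actual crawler: the `TOY_WEB` dictionary, its URLs, and the `crawl` function are all made up for the example, and page content is compared directly where a real system would more likely compare hashes.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to (page content, outgoing links).
# All of these URLs are invented for illustration.
TOY_WEB = {
    "a.example/": ("home", ["a.example/news", "a.example/index.html", "b.example/"]),
    "a.example/index.html": ("home", ["a.example/news"]),  # same content as a.example/
    "a.example/news": ("news", ["b.example/"]),
    "b.example/": ("other site", ["a.example/"]),
    "c.example/hidden": ("secret", ["a.example/"]),  # no page links to this one
}


def crawl(seeds, web):
    """Breadth-first crawl: follow links from the seeds, then drop exact duplicates."""
    seen_urls = set(seeds)      # every URL discovered so far
    seen_content = set()        # content already counted (a real crawler might store hashes)
    unique_pages = []           # first URL found for each distinct piece of content
    queue = deque(seeds)

    while queue:
        url = queue.popleft()
        if url not in web:      # dead link: nothing to fetch
            continue
        content, links = web[url]
        if content not in seen_content:
            seen_content.add(content)
            unique_pages.append(url)
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)

    return seen_urls, unique_pages


urls, pages = crawl(["a.example/"], TOY_WEB)
print(len(urls), "URLs found,", len(pages), "unique pages")
print("hidden page found:", "c.example/hidden" in urls)
```

In this toy web the crawl finds four URLs but only three unique pages, because two URLs serve the same content. The page that nothing links to is never discovered at all, which is one reason a link-following count can only ever be a lower bound.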
So several billion pages per day were being added to the Internet, and this survey was done in 2008.  If we assume 2 billion new pages per day since then, the roughly 650 to 1,000 days that have passed add somewhere between 1.3 and 2 trillion pages, putting the present total at roughly 2 to 3 trillion web pages.  Of course this assumes that none of the already existing pages have been deleted and that the growth rate has stayed constant, so it is just a very rough guess.
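The back-of-the-envelope arithmetic is easy to check, and doing so shows how much the answer depends on when in 2008 the count was taken. The sketch below assumes the same 2 billion pages per day and uses this post's date, October 1, 2010, as "today". The three start dates are assumptions chosen to bracket the year, and the function and constant names are invented for the example.

```python
from datetime import date

BASE_PAGES = 1_000_000_000_000   # the 2008 estimate: 1 trillion unique URLs
PAGES_PER_DAY = 2_000_000_000    # assumed growth rate ("several billion" taken as 2 billion)


def estimate_total(start, today, base=BASE_PAGES, per_day=PAGES_PER_DAY):
    """Linear extrapolation: base count plus a constant number of new pages per day."""
    days = (today - start).days
    return base + days * per_day


today = date(2010, 10, 1)  # the date of this post

# Assumed start dates: the beginning, middle, and end of 2008.
for start in (date(2008, 1, 1), date(2008, 7, 25), date(2008, 12, 31)):
    total = estimate_total(start, today)
    print(start, "->", round(total / 1e12, 2), "trillion pages")
```

Under these assumptions the total comes out between about 2.3 and 3 trillion pages, and it ignores deleted pages and any change in the growth rate, so it is an order-of-magnitude guess at best.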

So this labyrinth called the Internet, this mystery universe of almost infinite capacity, has not been measured and will not be accurately measured in the foreseeable future.  This means it is quite possible to hide a web page somewhere in that labyrinth where no one will find it except the owner and whoever is given directions to it.  In another post we will look at how Google manages to find the information people are looking for so well.

