Note: I am going to use the word Google a lot in this post. Whenever I do, I am referring to the search part of Google and not the whole tech conglomerate that it is today, okay? Clear? Great!
Google started as a search engine. It is the most popular search engine, with a market share of 70.23% (August 2015). The job of a search engine is to look up webpages across the WWW that match a search query, gather them and then rank them by usefulness. There are many other search engines out there, like Yahoo!, DuckDuckGo and Ask, but Google is the king of search engines. It uses more than 200 ranking factors, or signals, to rank webpages, which is a big part of why its results are so accurate and satisfying. It has been in development for over 15 years now, and new features are still rolled out every now and then.
“The perfect search engine would understand exactly what you mean and give you back exactly what you want.” — Larry Page Co-founder & CEO
As a geeky fact, the word google is now also part of the Oxford Learner’s Dictionary (among others). As per Oxford, google is a verb that means to type words into the search engine Google® in order to find information about somebody/something. Almost everyone knows about Google, right? Today is the time to learn how it works: the mechanics, the intelligence, the search ranking factors, the index and everything else that makes Google as accurate as it is. Remember, this post addresses the search part of Google, so we will learn how search crawls webpages, indexes them and ranks them (the most important part). Are you ready?
First of all comes the INDEX. What’s the index? Building a search engine, even a tiny one that only pulls results from a particular region, requires lots of resources. Resources? That’s right. This business is really expensive. The WWW is incredibly huge. It contains more than 60 trillion individual pages, and the number keeps increasing (!!). So when you make a search query, what really happens? Does Google search the whole WWW for you? No, it only searches through the webpages that it has already gathered in its ledger. This ledger, or register, is called the index.
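To make the "ledger" idea concrete, here is a minimal sketch of an inverted index, a common textbook way to build such a lookup. It is not Google's actual implementation. The page URLs, the `PAGES` dictionary and the function names are all made up for illustration. The point it shows is that a query is answered from the pre-built index, not by visiting the web.

```python
from collections import defaultdict

# Hypothetical stand-in for pages that have already been gathered.
PAGES = {
    "example.com/a": "how to install windows 10",
    "example.com/b": "install linux on old laptops",
    "example.com/c": "windows 10 tips and tricks",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query.

    Only the index is consulted; the pages themselves are never re-read.
    """
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "install windows")))  # ['example.com/a']
```

A real index stores far more than words (positions, freshness, link data and so on), but the lookup-instead-of-search principle is the same.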
Google Search Engine Works Problem for Indexing
The index is over 100 billion gigabytes (!) and growing. Of course, this index has to be stored somewhere. That’s where Google’s data centres join the game. Google operates a number of data centres. Google doesn’t share much about them, because secrecy gives it a competitive advantage. The company believes that details about the size and power of its data centres could be useful to its competitors. These data centres are huge. No one outside the company knows the exact number of data centres, but we do know that there are more than a dozen. Data centres mainly house servers, which are up and running 24/7. Google is probably one of the largest hardware manufacturers. It’s not just the index that these data centres store. There’s other stuff too.
Fun fact: Every single search result from Google and other search engines is filtered. That means webpages are not indexed just because they are on the web. All search results are filtered, intentionally or unintentionally. Only the webpages that search engines consider appropriate, useful and effective are indexed. Adult material, hate speech, webpages involving criminal activities and websites selling drugs or weapons illegally are prevented from being indexed. Why, you ask? Well, because they are illegal. Pure and simple. The part of the web that is indexed by major search engines is called the surface web.
Crawling in Google Search Engine
OK, you have got the WWW. But there’s no way (that I know of) for Google to know in advance about all the webpages that the WWW contains. No one can type in web addresses from their head and “discover” unindexed webpages, right? So how does Google, or any other search engine, “discover” webpages that are not yet indexed, so that it can index them? That’s where crawling comes in.
Crawling, by technical definition, is the process of following links from one page to another. Links are ubiquitous. It is through links that most of us use or explore the web, right? The link is among the fundamentals of the web. Links lead from one page to another. So if you come to know about Dekho Geeko, this site, via some friend, you can find out about Data Centre Knowledge by clicking the link. You get the picture, right?
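Since crawling is all about following links, the first thing a crawler needs is a way to pull the links out of a page. Here is a minimal sketch using only Python's standard library. The `LinkExtractor` class, the `extract_links` helper and the sample HTML are hypothetical names made up for this example, and a real crawler would be considerably more careful.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the targets of all <a href="..."> tags on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links like "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))

def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

html = (
    '<p>Read <a href="/about">about us</a> or visit '
    '<a href="https://example.org/">a friend</a>.</p>'
)
print(extract_links("https://example.com/post", html))
# ['https://example.com/about', 'https://example.org/']
```

Every link found this way is a candidate for the next page to visit, which is exactly how one page leads to another.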
Indexing in Google Search Engine
But then… millions of webpages are being published every day. Unless you’re Superman, you can’t just come across all the published webpages via links. Web crawlers are what rescue Google here. Web crawlers are servers or robots that are programmed to do this for Google. Googlebots are on 24/7, clicking or crawling from link to link. They work for the index: they gather pages and store them in it.
Indexing is the process of saving webpages in the index. Googlebot is the most commonly known web crawler. Googlebot is also a kind of indexer: it crawls and it indexes. Googlebot works hard 24 hours a day, 7 days a week, 365 days a year, without taking a break, just crawling through webpages via links and then downloading those webpages to store in the index.
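The crawl-then-store loop described above can be sketched in a few lines. This is a toy version under loud assumptions: the `WEB` dictionary is a made-up, in-memory stand-in for the real web (no network requests are made), the index is just a dictionary of page text, and the breadth-first visiting order is my choice for the example, not something Google has confirmed.

```python
from collections import deque

# Hypothetical tiny web: URL -> (page text, outgoing links).
WEB = {
    "site/home": ("welcome home", ["site/blog", "site/about"]),
    "site/blog": ("latest posts", ["site/home", "site/post-1"]),
    "site/about": ("about this site", []),
    "site/post-1": ("first post", ["site/blog"]),
    "site/orphan": ("nobody links here", []),
}

def crawl(seed, web):
    """Follow links breadth-first from the seed, storing each page visited."""
    index = {}
    queue = deque([seed])
    seen = {seed}
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to download
        text, links = web[url]
        index[url] = text  # "indexing": save the downloaded page
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return index

index = crawl("site/home", WEB)
print(sorted(index))
# ['site/about', 'site/blog', 'site/home', 'site/post-1']
```

Notice that `site/orphan` never makes it into the index: no page links to it, so a crawler that only follows links has no way to discover it.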
Ranking in Google Search Engine
Ranking is the most important and crucial part of the whole thing. Ranking means presenting webpages in order, from the highest-quality pages to the lowest. OK, you just googled “install windows 10” (without quotes). Google has already crawled and indexed all or most of the pages with the relevant keywords. The crawling and indexing part is done, so what next?
Next, Google has to automatically rank the pages that match the search query. Ranking is really important, because you can’t just have pages thrown at you in any order, can you? Not all webpages are equal. Some are far better results than others. This was the basic idea behind Google as a search engine: not all webpages are equal.
That’s why ranking is the most important part. It is also the part that search engines were not good at in the pre-Google era, because, just like any system, a computer can be tricked. Pre-Google search engines would rank results based upon the number of times the search query (or its keywords) appeared on a page. This gave birth to SEO, because anyone could add more and unnecessary keywords to their pages and get them ranked higher than they deserved. Trick successful, right? Fact: The practice of stuffing a page with more and unnecessary keywords is called keyword stuffing, whereas the ratio of keywords to the total number of words on a page is called keyword density.
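To see why counting keywords is so easy to trick, here is a minimal sketch of that kind of naive ranking. The two pages and all function names are invented for the example, and keyword density is computed here as keyword occurrences divided by the total word count, which is the usual convention and an assumption on my part.

```python
def keyword_count(text, keyword):
    """Number of times the keyword appears as a whole word."""
    return text.lower().split().count(keyword.lower())

def keyword_density(text, keyword):
    """Keyword occurrences divided by the total number of words."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def naive_rank(pages, keyword):
    """Rank URLs by raw keyword count, highest first."""
    return sorted(
        pages,
        key=lambda url: keyword_count(pages[url], keyword),
        reverse=True,
    )

# Hypothetical pages: one genuine guide, one stuffed with the keyword.
PAGES = {
    "honest.example": "a step by step guide to install windows 10 on a new laptop",
    "stuffed.example": "install install install install windows install now install",
}

print(naive_rank(PAGES, "install"))
# ['stuffed.example', 'honest.example']
print(keyword_density(PAGES["stuffed.example"], "install"))
# 0.75
```

The useless page wins simply because it repeats the word more often, which is exactly the weakness described above.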
In the beginning, Google had only one algorithm, PageRank, to better rank webpages. Now there are more than 200 search ranking factors that are considered when pages are ranked.
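For the curious, here is a minimal sketch of the textbook formulation of PageRank, where a page is important if important pages link to it. This is not how Google runs it in production. The four-page link graph, the damping value of 0.85 and the fixed number of iterations are assumptions picked for the example, and the sketch assumes every link target also appears as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: a, b and d all link to c; c links back to a.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(LINKS)
print(max(ranks, key=ranks.get))  # c
```

Unlike keyword counting, a page cannot raise this score just by editing its own text; it needs other pages to link to it.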