Launched in June 2010 across all data centers, regions, and languages, Google Caffeine was a revamp of Google’s indexing infrastructure – not the same thing as a change to Google’s ranking algorithms. The Google blog called it a “whole new web indexing system” that’s “more than 50 percent fresher than our last index and it’s the largest collection of web content we’ve offered”.
Caffeine benefits both searchers and content owners because all content (and not just content deemed ‘real time’) becomes searchable within seconds of being crawled.
Basically, content is available to searchers more quickly.
It’s important to realise that Caffeine is only a change to Google’s indexing architecture. What’s exciting about Caffeine, though, is that it makes it easier to annotate the information stored with documents, which can in turn unlock better ranking in the future by using those additional signals.
Previously, Google’s crawling and indexing systems worked as batch processes. Googlebot would crawl a set of pages, then process those pages (extracting content from them, associating data about them, such as anchor text and external links, determining what those pages were about), and finally add them to the index. While this system was continuous, all the documents in the batch had to wait until the whole batch was processed to be pushed live. With Google Caffeine, when Google crawls a page, it processes that page through the entire indexing pipeline and pushes it live nearly instantly; this change resulted in a 50 percent fresher index than before.
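To make the difference concrete, here is a minimal sketch that compares how long a page waits between being crawled and going live under a batch model and under a per-page model. It is a toy simulation, not Google’s actual pipeline: the `Page` class, the fixed `PROCESS_SECONDS` cost, the batch size and the crawl times are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    crawled_at: float  # seconds since the start of the simulation


# Assumed, purely illustrative per-page processing cost.
PROCESS_SECONDS = 1.0


def batch_live_times(pages, batch_size):
    """Batch model: a page goes live only once its whole batch is processed."""
    live = {}
    for start in range(0, len(pages), batch_size):
        batch = pages[start:start + batch_size]
        # Processing begins after the last page in the batch has been crawled.
        done = max(p.crawled_at for p in batch) + PROCESS_SECONDS * len(batch)
        for p in batch:
            live[p.url] = done
    return live


def incremental_live_times(pages):
    """Per-page model: each page is processed and pushed live on its own."""
    return {p.url: p.crawled_at + PROCESS_SECONDS for p in pages}


def mean_delay(pages, live):
    """Average time between a page being crawled and becoming searchable."""
    return sum(live[p.url] - p.crawled_at for p in pages) / len(pages)


# Six hypothetical pages, crawled ten seconds apart.
pages = [Page(f"/page-{i}", crawled_at=i * 10.0) for i in range(6)]
batch = batch_live_times(pages, batch_size=3)
incremental = incremental_live_times(pages)

print(mean_delay(pages, batch))        # 13.0
print(mean_delay(pages, incremental))  # 1.0
```

In this toy setup, the first page crawled in each batch waits longest, because it cannot go live until the last page of its batch has been crawled and processed. The per-page model removes that wait, which is the effect the paragraph above describes.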
Note that the introduction of Caffeine didn’t necessarily mean that pages would be crawled on a faster schedule than before – remember, you can estimate how often your pages are crawled by taking a look at your server logs or checking the cache dates in Google.
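As a rough illustration of the server-log approach, the sketch below counts Googlebot requests per URL and estimates the average gap between visits. It assumes your server writes the common “combined” access log format and that matching the string `Googlebot` in the user agent is good enough for an estimate (user agents can be spoofed). The sample log lines and function names are made up for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: Apache/Nginx "combined" log format.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_visits(lines):
    """Map each requested path to the timestamps of its Googlebot hits."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            visits[match.group("path")].append(ts)
    return visits


def average_gap_hours(timestamps):
    """Average time between consecutive visits, or None if fewer than two."""
    if len(timestamps) < 2:
        return None
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) / 3600


# Hypothetical sample lines; in practice, iterate over your access log file.
sample = [
    '66.249.66.1 - - [01/Jun/2010:06:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '203.0.113.9 - - [01/Jun/2010:09:30:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [02/Jun/2010:06:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
]

visits = googlebot_visits(sample)
print(len(visits["/index.html"]))                # 2
print(average_gap_hours(visits["/index.html"]))  # 24.0
```

To run this against a real log, pass an open file object to `googlebot_visits` instead of the sample list. The regular expression may need adjusting if your server uses a custom log format.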
The introduction of Google Caffeine simply meant that once web pages were crawled, they were made available to searchers much more quickly. That infrastructure set the stage for an algorithmic layer built on top of it: Google Freshness.