Jacob from Denmark posted an interesting question regarding Google’s spiders. He wanted to know exactly how many of them are currently deployed across the Web, crawling site after site. Matt Cutts clarifies what the concept of a search engine spider actually means and how it translates to the real world.
Essentially, spiders are abstract computing concepts. They do not exist in real life as tangible things that traverse the World Wide Web. Rather, Google sets up banks of machines in data centers and programs them to make HTTP requests that fetch web pages. This all happens at high speed, and even a small number of machines can accomplish a lot when working in parallel. The company does its best to cover each portion of the Web every few days to get the freshest content.
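To make the parallelism concrete, here is a minimal sketch in Python of a small bank of workers fetching pages over HTTP. It only illustrates the idea Matt describes, not Google's actual crawler; the URLs and worker count are invented for the example.

```python
# A minimal sketch of parallel page fetching -- not Google's crawler.
# The URL list and worker count are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

URLS = [
    "https://example.com/",
    "https://example.org/",
    "https://example.net/",
]

def fetch(url):
    """Issue a plain HTTP GET request and return the URL with its byte count."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return url, len(resp.read())

# A handful of workers making requests concurrently can cover many
# pages quickly -- the point Matt makes about working in parallel.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch, url) for url in URLS]
    for future in as_completed(futures):
        url, size = future.result()
        print(f"{url}: {size} bytes")
```

Scaled up across banks of machines in data centers, the same pattern lets a modest fleet re-fetch large portions of the Web every few days.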
While Google does not give out specific numbers, Matt hints that a fairly small group of machines does the job (“more than 25, less than a thousand”). The real challenge is in sorting through all of the content, organizing it, and figuring out which pages are truly reputable.