<networking, World-Wide Web, search> A program (or "bot") forming the part of a search engine that discovers content on the Internet or another hypertext network. Web crawlers are also known as "robots" or "spiders" (because they crawl over a web). Google's web crawler is known as Googlebot.
A web crawler runs on a search provider's servers and connects to remote web sites to retrieve web pages and other documents, often keeping a local copy of each. It then scans these documents to extract any hypertext links pointing to other documents, crawls those in turn, and so on until it has recursively discovered every reachable document. The whole process is repeated periodically to pick up changes to documents, including links to new ones.
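The core loop is a breadth-first traversal of the link graph. The following is a minimal sketch in Python, assuming the third-party "requests" and "beautifulsoup4" libraries; the page limit is an illustrative choice, and a real crawler would add politeness delays, robots.txt checks and far more robust error handling.

    # Minimal breadth-first crawl loop: fetch a page, keep a local
    # copy, extract its links and queue them for crawling in turn.
    from collections import deque
    from urllib.parse import urljoin, urldefrag

    import requests
    from bs4 import BeautifulSoup

    def crawl(seed, max_pages=100):
        queue = deque([seed])
        pages = {}                      # local copy of each document
        while queue and len(pages) < max_pages:
            url = queue.popleft()
            if url in pages:
                continue                # already discovered
            try:
                resp = requests.get(url, timeout=10)
            except requests.RequestException:
                continue                # unreachable; skip it
            pages[url] = resp.text      # keep a local copy
            soup = BeautifulSoup(resp.text, "html.parser")
            for a in soup.find_all("a", href=True):
                # Resolve relative links and drop #fragments.
                link, _ = urldefrag(urljoin(url, a["href"]))
                if link.startswith("http") and link not in pages:
                    queue.append(link)  # crawl this document next
        return pages

A call such as crawl("http://example.com/") would return a dictionary mapping each discovered URL to its retrieved text.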
Typically, web crawlers are run to build an index of the content they retrieve. The search provider uses this index to identify quickly which documents match a user's query.
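A common form of such an index is an inverted index, which maps each term to the set of documents containing it. A toy sketch in Python, taking the pages dictionary produced by the crawl sketch above (real engines add ranking, stemming, phrase queries and so on):

    # Build an inverted index: word -> set of URLs containing it.
    from collections import defaultdict

    def build_index(pages):
        index = defaultdict(set)
        for url, text in pages.items():
            for word in text.lower().split():
                index[word].add(url)
        return index

    def search(index, query):
        # Return the URLs containing every word of the query.
        words = query.lower().split()
        if not words:
            return set()
        results = set(index.get(words[0], set()))
        for word in words[1:]:
            results &= index.get(word, set())
        return results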
The standard for robot exclusion ("robots.txt") allows website owners to specify how they would like the site to be crawled, though not all crawlers follow it.
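For example, a hypothetical robots.txt file placed at a site's root might read:

    User-agent: *
    Disallow: /private/

    User-agent: Googlebot
    Disallow: /drafts/

A compliant crawler obeys the most specific User-agent group matching its name, so here Googlebot avoids /drafts/ (but may fetch /private/) while all other crawlers avoid /private/; nothing technically prevents an ill-behaved crawler from ignoring the file altogether.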
Early examples were Lycos and "WebCrawler".
Last updated: 2014-02-16