“A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner”
That's how Wikipedia defines a crawler, and that's how search engines index millions of pages. The crawler scans each page and sends the raw, unprocessed data to the search engine's database. The data is then processed by complex algorithms, and the crawled pages are added to the search results.
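To make the crawl-and-store step concrete, here is a minimal sketch in Python. It is not how any real search engine works: the `crawl` function, the `fetch` callable and the in-memory `FAKE_WEB` are all made up for illustration, and the "database" is just a dictionary holding the raw HTML.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: raw_html} for every page visited.

    `fetch` is a hypothetical callable that maps a URL to its HTML,
    or returns None when the page cannot be retrieved.
    """
    raw_store = {}  # stands in for the search engine database
    queue = deque([start_url])
    while queue and len(raw_store) < max_pages:
        url = queue.popleft()
        if url in raw_store:
            continue  # already crawled
        html = fetch(url)
        if html is None:
            continue  # broken link, skip it
        raw_store[url] = html  # stored raw; processing happens later
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(link for link in parser.links if link not in raw_store)
    return raw_store


# A toy three-page "web" so the sketch runs without network access.
FAKE_WEB = {
    "a": '<a href="b">B</a> <a href="c">C</a>',
    "b": '<a href="a">A</a>',
    "c": "no links here",
}
pages = crawl("a", FAKE_WEB.get)
```

Note that the crawler only follows links and copies markup. Nothing in it understands what a page is about, which is the limitation the rest of this post is concerned with.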
This approach works fine as long as search queries are simple and direct. A Google search for “Indian Railways” yields relevant results, but a more natural-language query like “Which railway minister left the railways bankrupt” goes unanswered.
That's when we need intelligent crawlers: ‘the users’. Users will scan the web pages they visit, give a meaningful description of each page and report updates. This in turn will lead to semantic crawling and thus to better search results. It needs to be aided by community moderation in order to avoid misuse and improve the engine.
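One way the moderation step could look, as a rough sketch only: users submit descriptions, the community votes on them, and the engine indexes just the best-scoring description per page. The `Annotation` class, the `accepted_descriptions` function and the `min_score` threshold are assumptions made up for this example, not part of any existing system.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A user-written description of a page, plus community votes."""
    url: str
    description: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self):
        return self.upvotes - self.downvotes


def accepted_descriptions(annotations, min_score=2):
    """Keep, per URL, the highest-scoring description that passes moderation.

    `min_score` is an arbitrary threshold chosen for illustration.
    """
    best = {}
    for ann in annotations:
        if ann.score < min_score:
            continue  # rejected or not yet vetted by the community
        current = best.get(ann.url)
        if current is None or ann.score > current.score:
            best[ann.url] = ann
    return {url: ann.description for url, ann in best.items()}


submitted = [
    Annotation("example.org/rail", "History of railway budgets", 5, 1),
    Annotation("example.org/rail", "Buy cheap tickets!!!", 0, 7),
    Annotation("example.org/rail", "Railway finances by minister", 9, 2),
    Annotation("example.org/new", "Unreviewed page", 1, 0),
]
index = accepted_descriptions(submitted)
```

Here the spam entry is voted out, the unreviewed page stays unindexed until it collects enough votes, and the description with the strongest community backing is the one that feeds the search index.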
