| Author |
Message
|
| wienertakesall |
Posted: Sun Jul 13, 2008 6:10 am Post subject: search robots |
|
|
Forum Freshman

Joined: 30 Jun 2008 Posts: 38 Location: Poland
|
Do search engines' robots search the entire web each time somebody clicks the search button, or do they search indexes stored on, say, Google's servers?
My guess is that it's impossible to search the whole internet in a few seconds.
How does it work in reality? |
|
|
 |
| JaneBennet |
Posted: Sun Jul 13, 2008 6:38 am Post subject: |
|
|
 Forum Ph.D.

Joined: 06 Apr 2008 Posts: 801
|
What happens is that the search bots “crawl” through the Web, taking snapshots of as many Web pages as they can find. The info they gather is then cached and indexed. When someone makes a Web search, the engine will go through the cache of Web pages and display results that match the search terms.
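To make the "cache and index" step concrete, here is a minimal sketch of an inverted index built from stored snapshots. The URLs, the page texts, and the names `build_index` and `search` are all made up for illustration; a real engine's crawler and index are far more elaborate.

```python
from collections import defaultdict

# Hypothetical snapshots a crawler might have stored (URL -> page text).
cache = {
    "example.org/a": "search engines crawl the web",
    "example.org/b": "robots index cached pages",
    "example.org/c": "the web is large",
}

def build_index(pages):
    """Map each word to the set of URLs whose snapshot contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs whose snapshot contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(cache)
print(sorted(search(index, "the web")))
```

The point of the sketch is that `search` never touches the live pages: it only looks words up in the index that was built ahead of time, which is why a query can be answered quickly.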
This is a simplified account of what basically happens; what actually happens may be much more complicated. For example, some engines can recognize typos or spelling errors and display close-matching results rather than exact-matching results.
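As a rough illustration of the typo handling mentioned above, the sketch below uses Python's standard `difflib` module to find the closest known word to a misspelled search term. The vocabulary, the cutoff value, and the helper name `suggest` are assumptions for the example; it is not a description of how any particular engine does it.

```python
import difflib

# Hypothetical vocabulary of words that appear in the engine's index.
vocabulary = ["search", "engine", "robot", "index", "cache"]

def suggest(term, vocab, cutoff=0.7):
    """Return the closest known word to `term`, or None if nothing is close enough."""
    matches = difflib.get_close_matches(term.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest("serach", vocabulary))
```

With this vocabulary, the misspelling "serach" is mapped to "search", while a term that resembles nothing in the vocabulary gets no suggestion.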
You may notice that when Google displays a search result, it includes a brief quote from the Web page showing some or all of your search terms; when you click the link to go to the Web page, however, you may find that the contents of the page are different from the brief quote accompanying the search result. This shows that Google does not search the Web directly, but only its cache of stored pages. Between the time when the crawler took a snapshot of the Web page and the time you made your search, the Web page might have been updated.
Cached pages are also not stored permanently by the search engine. Each cached page has a limit on how long it’s stored; when that time limit expires, the page is deleted from the cache. This prevents the engine’s servers from becoming overloaded with outdated cached pages.
_________________
A problem worthy of attack
Proves its worth by fighting back.
(Piet Hein) |
|
|
 |
| wienertakesall |
Posted: Sun Jul 13, 2008 7:35 am Post subject: thank you for your clarification |
|
|
Forum Freshman

Joined: 30 Jun 2008 Posts: 38 Location: Poland
|
|
|
 |
| CelticMadScientist |
Posted: Sun Jul 13, 2008 11:49 am Post subject: |
|
|
 Forum Freshman

Joined: 12 Jul 2008 Posts: 19 Location: U.S.A.
|
Well, for something on the scale of Google, I haven't had the chance to learn yet. But to give a flavor of the direction you might head in, say you have a search engine for a small website. You build a matrix in which each column is a normalized vector of keyword frequencies for one page (rows correspond to keywords). The search query is turned into a normalized keyword vector as well. You then multiply the transpose of the matrix by the query vector, that is, you take the dot product of each column with the query. Because all the vectors have unit length, the result is a vector holding the cosine of the angle between each column and the search vector. Order the pages by cosine, largest first (smallest angle), to see which pages match the keyword query best.
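A minimal sketch of that ranking in plain Python, assuming three made-up keywords and three made-up pages with invented keyword counts (the names `normalize` and `rank` are also just for illustration):

```python
import math

# Hypothetical keyword list (matrix rows) and raw keyword counts per page (matrix columns).
keywords = ["robot", "search", "index"]
page_counts = {
    "page1": [3, 0, 1],
    "page2": [0, 2, 2],
    "page3": [1, 1, 0],
}

def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def rank(query_counts, pages):
    """Return (page, cosine) pairs, best match first."""
    q = normalize(query_counts)
    scores = []
    for name, counts in pages.items():
        col = normalize(counts)
        # Dot product of two unit vectors = cosine of the angle between them.
        cosine = sum(a * b for a, b in zip(col, q))
        scores.append((name, cosine))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Query containing the keywords "search" and "index".
for name, cosine in rank([0, 1, 1], page_counts):
    print(f"{name}: {cosine:.3f}")
```

For this toy data the query vector points in exactly the same direction as the column for page2, so page2 comes first with a cosine of 1, followed by page3 and then page1.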
References:
Steven J. Leon, Linear Algebra with Applications, 2002, pp. 230-232.
_________________
Celtic Mad Scientist
Celtic Mad Scientist's MetaCafe Channel - Science & Fun How-to Videos
celticmadscientist.com |
|
|
 |
|
|