The Science Forum - Scientific Discussion and Debate  
 
search robots
wienertakesall
Posted: Sun Jul 13, 2008 6:10 am    Post subject: search robots

Forum Freshman

Joined: 30 Jun 2008
Posts: 38
Location: Poland

Do search engines' robots search the entire web each time somebody clicks the search button, or do they search indexes stored on, say, Google's servers?
My guess is that it's impossible to search the whole internet in a few seconds.
How does it work in reality?
JaneBennet
Posted: Sun Jul 13, 2008 6:38 am

Forum Ph.D.

Joined: 06 Apr 2008
Posts: 801

What happens is that the search bots “crawl” through the Web, taking snapshots of as many Web pages as they can find. The info they gather is then cached and indexed. When someone makes a Web search, the engine looks up the search terms in its index of cached pages and displays the results that match.
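To make the crawl, index, and search steps concrete, here is a minimal sketch in Python. It is only an illustration, not how any real engine is built: the `snapshots` dictionary stands in for pages a crawler has already fetched, and the URLs, `build_index`, and `search` are made-up names for this example.

```python
from collections import defaultdict

# Hypothetical snapshots a crawler might have stored: URL -> page text.
snapshots = {
    "http://example.com/a": "search engines crawl the web",
    "http://example.com/b": "robots index web pages",
    "http://example.com/c": "cooking recipes for pasta",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose snapshot contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs whose snapshot contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(snapshots)
print(sorted(search(index, "web")))
```

The point is that a query only touches the prebuilt index, never the live Web, which is why an answer can come back in a fraction of a second.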

This is a simplified account of what basically happens; what actually happens may be much more complicated. For example, some engines can recognize typos or spelling errors and display close-matching results rather than exact-matching results.
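As a rough idea of how a typo might be matched to a known word, here is a small sketch using `difflib.get_close_matches` from Python's standard library. The `vocabulary` list, the `suggest` helper, and the 0.7 cutoff are assumptions for illustration; real engines very likely use more elaborate methods.

```python
import difflib

# Hypothetical vocabulary of words that appear in the index.
vocabulary = ["search", "engine", "robot", "crawler", "index", "cache"]

def suggest(word, vocab, cutoff=0.7):
    """Return the closest known word, or the word itself if nothing is close."""
    matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(suggest("serch", vocabulary))
```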

You may notice that when Google displays a search result, it includes a brief quote from the Web page showing some or all of your search terms; when you click the link to go to the Web page, however, you may find that the contents of the page are different from the brief quote accompanying the search result. This shows that Google does not search the Web directly, but only its cache of stored pages. Between the time when the crawler took a snapshot of the Web page and the time you made your search, the Web page might have been updated.

Cached pages are also not stored permanently by the search engine. Each cached page has a limit on how long it is stored; when the time limit has expired, the page is deleted from the cache. This prevents the engine’s servers from becoming overloaded with outdated cached pages.
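The expiry idea can be sketched as a small time-limited cache. This is a toy under stated assumptions: the class name `ExpiringCache`, the `ttl_seconds` parameter, and the injectable `clock` are all invented for this example and say nothing about how a real search engine manages its storage.

```python
import time

class ExpiringCache:
    """Store page snapshots and drop them once they are ttl_seconds old."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self._entries = {}      # url -> (stored_at, snapshot)

    def put(self, url, snapshot):
        """Store a snapshot together with the time it was taken."""
        self._entries[url] = (self.clock(), snapshot)

    def get(self, url):
        """Return the snapshot, or None if it is missing or has expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, snapshot = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[url]  # expired: evict it from the cache
            return None
        return snapshot
```

In this sketch an expired page is only removed when somebody asks for it; a background sweep over all entries would be another reasonable design.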
_________________
 
A problem worthy of attack
Proves its worth by fighting back.
(Piet Hein)

wienertakesall
Posted: Sun Jul 13, 2008 7:35 am    Post subject: thank you for your clarification

Forum Freshman

Joined: 30 Jun 2008
Posts: 38
Location: Poland

As above
CelticMadScientist
Posted: Sun Jul 13, 2008 11:49 am

Forum Freshman

Joined: 12 Jul 2008
Posts: 19
Location: U.S.A.

Well, I haven't had the chance yet to learn how it works at a large scale like Google's. But for a flavor of the direction you might head in, say you have a search engine for a small website. You build a matrix in which each column is a normalized vector of the keyword frequencies for one page (the rows correspond to keywords). The search query is likewise turned into a normalized vector of keywords. Multiplying the transpose of the matrix by the query vector gives a vector whose entries are the cosines of the angles between each column and the query vector. Sort the pages by angle, smallest first (that is, largest cosine first), to see which pages match the keyword query best.

References:
Steven J. Leon, Linear Algebra with Applications, 2002, pp. 230-232.
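The ranking described above can be sketched in plain Python. The keywords, page names, and counts below are made up for illustration, and `normalize` and `rank` are hypothetical helper names; the loop computes one dot product per column, which is the same as multiplying the transposed matrix by the query vector.

```python
import math

keywords = ["search", "robot", "index", "cache"]  # rows of the matrix

# Hypothetical keyword counts for three pages (columns of the matrix).
pages = {
    "page1": [3, 0, 1, 0],
    "page2": [0, 2, 0, 2],
    "page3": [1, 1, 1, 1],
}

def normalize(vec):
    """Scale a vector to unit length; leave the zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def rank(query_counts, page_counts):
    """Return (page, cosine) pairs, best match first."""
    q = normalize(query_counts)
    scores = []
    for name, counts in page_counts.items():
        col = normalize(counts)
        cosine = sum(a * b for a, b in zip(col, q))  # dot product of unit vectors
        scores.append((name, cosine))
    return sorted(scores, key=lambda item: item[1], reverse=True)

query = [1, 0, 1, 0]  # the query "search index" as keyword counts
for name, cosine in rank(query, pages):
    print(name, round(cosine, 3))
```

Because every vector has unit length, the dot product is exactly the cosine of the angle, so the largest value corresponds to the smallest angle and the best match.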
_________________
Celtic Mad Scientist
Celtic Mad Scientist's MetaCafe Channel - Science & Fun How-to Videos
celticmadscientist.com
© 2004-2008 Thescienceforum.com
