Google Patent On Detecting Duplicate Files - Conclusions


Google Patents: Conclusions For Detecting duplicate and near-duplicate files


United States Patent 6,658,423


Conclusions

As can be appreciated from the foregoing, improved near-duplicate detection techniques are disclosed. These near-duplicate detection techniques are robust, and reduce processing and storage requirements. Such reduced processing and storage requirements are particularly important when processing large document collections.

The near-duplicate detection techniques have a number of important practical applications. In the context of a search engine, for example, these techniques can be used during a crawling operation to speed up the crawling and to save bandwidth by not crawling near-duplicate Web pages or sites, as determined from documents uncovered in a previous crawl. Further, by reducing the number of Web pages or sites crawled, these techniques can be used to reduce the storage requirements of a repository and, therefore, of other downstream stored data structures. Alternatively, these techniques can be used later, in response to a query, so that a user is not annoyed by near-duplicate search results. These techniques may also be used to "fix" broken links. That is, if a document (e.g., a Web page) no longer exists at a particular location or URL, a link to a near-duplicate page can be provided.
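The applications listed above can be illustrated with a minimal Python sketch. It is not the method claimed in the patent: this page does not describe the detection technique itself, so the sketch substitutes a simple Jaccard similarity over word shingles as a stand-in. All names here (`shingles`, `similarity`, `is_near_duplicate`, `filter_near_duplicates`, `fix_broken_link`) and the threshold of 0.6 are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles of a text (stand-in feature, an assumption)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, in the range [0, 1]."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.6) -> bool:
    """Hypothetical near-duplicate test; the threshold is an arbitrary assumption."""
    return similarity(a, b) >= threshold


def filter_near_duplicates(docs: Dict[str, str], threshold: float = 0.6) -> List[str]:
    """Keep the first URL of each near-duplicate group, preserving input order.

    The same filter could run over crawl candidates (using text from a
    previous crawl) or over the results returned for a query.
    """
    kept: List[str] = []
    for url, text in docs.items():
        if not any(is_near_duplicate(text, docs[k], threshold) for k in kept):
            kept.append(url)
    return kept


def fix_broken_link(dead_url: str,
                    previous_crawl: Dict[str, str],
                    live_urls: Iterable[str],
                    threshold: float = 0.6) -> Optional[str]:
    """Suggest a live URL whose stored text is the closest near-duplicate of the dead page."""
    dead_text = previous_crawl.get(dead_url)
    if dead_text is None:
        return None
    best_url, best_score = None, threshold
    for url in sorted(live_urls):  # sorted for a deterministic choice on ties
        if url == dead_url or url not in previous_crawl:
            continue
        score = similarity(dead_text, previous_crawl[url])
        if score >= best_score:
            best_url, best_score = url, score
    return best_url
```

The pairwise comparison in `filter_near_duplicates` is quadratic in the number of documents kept, which is acceptable for a short result list but not for a large collection. The reduced processing and storage requirements mentioned in the conclusions matter precisely at that larger scale.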


Last edited April 8, 2007 3:01 am