United States Patent [19]
Burrows
US005864863A [11] Patent Number: 5,864,863 [45] Date of Patent: Jan. 26, 1999
[54] METHOD FOR PARSING, INDEXING AND SEARCHING WORLD-WIDE-WEB PAGES
[75] Inventor: Michael Burrows, Palo Alto, Calif.
[73] Assignee: Digital Equipment Corporation,
Maynard, Mass.
[21] Appl. No.: 696,406
[22] Filed: Aug. 9, 1996
[51] Int. Cl.6 G06F 17/30
[52] U.S. Cl. 707/103; 707/3; 707/10;
707/104
[58] Field of Search 395/613, 610,
395/603, 615, 614; 707/103, 3, 10, 104
[56] References Cited
U.S. PATENT DOCUMENTS
5,280,610 1/1994 Travis, Jr. et al 395/614
5,440,744 8/1995 Jacobson et al 395/200.33
5,551,027 8/1996 Choy et al 395/617
5,581,758 12/1996 Burnett et al 395/614
5,640,558 6/1997 Li 395/612
5,649,186 7/1997 Ferguson 707/10
5,652,880 7/1997 Seagraves 395/614
5,652,882 7/1997 Doktor 395/612
5,668,988 9/1997 Chen et al 707/101
5,678,041 10/1997 Baker et al 707/9
OTHER PUBLICATIONS
Business Wire, Open Text's Web Search Server for OEM's; Offers Unique Intelligent Search Capabilities, p. 9181355 Jan. 1, 1995.
Information Intelligence Inc., World wide Web Search Engines: AltaVista & Yahoo, Dr Link, Accession No. 3168688 May 1, 1996.
Yuwono et al, Wise: A World Wide Web Resource Database
System, IEEE Transactions on Knowledge and Data Engineering,
vol. 8, No. 4, Aug. 1996 Apr. 29, 1996.
Steinberg, Seek and Ye Shall Find (maybe), Wired, May 1,
1996, p. 108 et al.
Primary Examiner—Wayne Amsbury
Attorney, Agent, or Firm—Dirk Brinkman
[57] ABSTRACT
A system indexes Web pages of the Internet. The pages are stored in computers distributively connected to each other by a communications network. Each page has a unique URL (universal record locator). Some of the pages can include URL links to other pages. A communication interface connected to the Internet is used for fetching a batch of Web pages from the computers in accordance with the URLs and URL links. The URLs are determined by an automated Web browser connected to the communications interface. A parser sequentially partitions the batch of specified pages into indexable words where each word represents an indexable portion of information of a specific page, or the word represents an attribute of one or more portions of the specific page. The parser sequentially assigns locations to the words as they are parsed. The locations indicate the unique occurrences of the word in the Web. The output of the parser is stored in a memory as an index. The index includes one index entry for each unique word. Each index entry also includes one or more location entries indicating where the unique word occurs in the Web. A query module parses a query into terms and operators. The operators relate the terms. A search engine uses object-oriented stream readers to sequentially read locations of specified index entries; the specified index entries correspond to the terms of a query. A display module presents qualified pages located by the search engine to users of the Web.
1 Claim, 26 Drawing Sheets
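
The parse-and-index flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_index`, `page_of`, `search_and`), the word-splitting rule, and the example URLs are all assumptions made for illustration. It shows only the idea of assigning sequential locations to words across a batch of pages and keeping one index entry per unique word. The stream readers and attribute words mentioned in the abstract are not modeled, and a single AND operator is evaluated with plain set intersection.

```python
import re
from bisect import bisect_right
from collections import defaultdict


def build_index(pages):
    """Build a location index from a batch of (url, text) pairs.

    Hypothetical helper: every word in the batch gets the next
    sequential location, so a location identifies one occurrence
    of a word across all pages.
    """
    index = defaultdict(list)  # word -> ascending list of locations
    page_starts = []           # first location assigned to each page
    urls = []
    location = 0
    for url, text in pages:
        page_starts.append(location)
        urls.append(url)
        # Assumed tokenization: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].append(location)
            location += 1
    return index, page_starts, urls


def page_of(location, page_starts):
    """Map a location back to the position of the page containing it."""
    return bisect_right(page_starts, location) - 1


def search_and(terms, index, page_starts, urls):
    """Return URLs of pages that contain every query term."""
    page_sets = []
    for term in terms:
        locations = index.get(term.lower(), [])
        page_sets.append({page_of(loc, page_starts) for loc in locations})
    if not page_sets:
        return []
    hits = set.intersection(*page_sets)
    return [urls[p] for p in sorted(hits)]


pages = [
    ("http://a.example/", "Indexing web pages"),
    ("http://b.example/", "Searching web pages quickly"),
]
index, page_starts, urls = build_index(pages)
print(search_and(["web", "searching"], index, page_starts, urls))
```

Because locations are assigned in parse order, each word's location list is already sorted, which is what makes sequential reading of an index entry possible; the sketch takes a shortcut by converting locations to page sets instead of scanning the lists in step.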