Method for automatically selecting collections to search in full text searches
Abstract
A method of selecting a subset of a plurality of document collections for
searching in response to a predetermined query is based on accessing a
meta-information data file that describes the query-significant search
terms present in a particular document collection, correlated to the
normalized document usage frequencies of those terms within the documents
of each document collection. By accessing the meta-information data file,
a relevance score is determined for each of the document collections. The
method then returns an identification of the subset of the document
collections having the highest relevance scores for use in evaluating the
predetermined query. The meta-information data file may be
constructed to include document-normalized term frequencies and other
contextual information that can be evaluated when a query is applied
against a particular document collection. This contextual information may
include term proximity, capitalization, and phraseology, as well as
document-specific information such as, but not limited to, collection
name, document type, document title, authors, date of publication,
publisher, keywords, summary description of contents, price, language,
country of publication, and publication name. Statistical data for the
collection may include, but is not limited to, the number of documents in
the collection, the total size of the collection, the average document
size, and the average number of words in the base document collection.
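The abstract does not give a formula for the relevance score, so the Python sketch that follows assumes one. Each collection's meta-information record maps a term to the fraction of that collection's documents containing the term (a document-normalized frequency), a collection's score is the sum of those fractions over the query-significant terms, and the names of the k highest-scoring collections are returned. `CollectionMeta`, `build_meta`, `select_collections`, the stopword list, and the sample collections are hypothetical illustrations, not the patented implementation.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stopword list used to keep only "query-significant" terms.
STOPWORDS = {"a", "an", "and", "in", "of", "or", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class CollectionMeta:
    """Meta-information for one document collection (hypothetical layout)."""
    name: str
    num_documents: int
    total_words: int
    # term -> fraction of the collection's documents that contain the term
    norm_doc_freq: dict[str, float] = field(default_factory=dict)

    @property
    def avg_words_per_doc(self) -> float:
        return self.total_words / self.num_documents if self.num_documents else 0.0


def build_meta(name: str, documents: list[str]) -> CollectionMeta:
    """Build the meta-information record from the raw documents."""
    doc_freq = Counter()
    total_words = 0
    for doc in documents:
        terms = tokenize(doc)
        total_words += len(terms)
        doc_freq.update(set(terms))  # count each term at most once per document
    n = len(documents)
    norm = {term: count / n for term, count in doc_freq.items()} if n else {}
    return CollectionMeta(name, n, total_words, norm)


def relevance_score(meta: CollectionMeta, query_terms: list[str]) -> float:
    """Assumed scoring rule: sum of normalized document frequencies."""
    return sum(meta.norm_doc_freq.get(term, 0.0) for term in query_terms)


def select_collections(metas: list[CollectionMeta], query: str, k: int) -> list[str]:
    """Return the names of the k highest-scoring collections."""
    # Drop stopwords and duplicate terms, keeping the query's term order.
    terms = list(dict.fromkeys(t for t in tokenize(query) if t not in STOPWORDS))
    scored = sorted(
        ((relevance_score(m, terms), m.name) for m in metas),
        key=lambda pair: (-pair[0], pair[1]),  # highest score first, ties by name
    )
    # Assumption: collections with a zero score are never selected.
    return [name for score, name in scored[:k] if score > 0]


collections = {
    "medicine": ["heart disease treatment", "cancer treatment trials", "heart surgery"],
    "sports": ["football scores", "heart of the team", "tennis results"],
    "finance": ["stock market report", "bond yields"],
}
metas = [build_meta(name, docs) for name, docs in collections.items()]
print(select_collections(metas, "heart treatment", k=2))  # → ['medicine', 'sports']
```

Normalizing by document count, rather than using raw term occurrences, keeps one long document from inflating a collection's score; the collection statistics stored in the record (document count, average words per document) could feed a more refined weighting, which this sketch does not attempt.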
Inventors: Kirsch; Steven T. (Los Altos, CA), Chang; William I. (Mountain View, CA)
Assignee: Infoseek Corporation (Sunnyvale, CA)
Appl. No.: 08/928,542
Filed: September 12, 1997