
Apache Lucene inverted index

Does the Lucene index use tf-idf as weights? Is it possible to define your own statistics and weights for each document and "plug" them into Lucene?

Yes, the default scoring algorithm incorporates tf-idf, and it is fully documented in the TFIDFSimilarity documentation.

There are a number of ways to customize the scoring of documents.

  • The simplest and most common is to incorporate a boost, either on a field at index time or on a query term at query time.
  • Many query types modify the scoring used for that query. Examples include ConstantScoreQuery and DisjunctionMaxQuery.
  • The Similarity you use defines the scoring algorithm. You could select a different one (e.g. BM25Similarity).
  • You can implement your own Similarity, usually by extending a higher-level implementation such as DefaultSimilarity, TFIDFSimilarity, or SimilarityBase.
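
To make the last option more concrete, here is a minimal plain-Java sketch with no Lucene dependency. It mirrors the classic tf-idf factors described in the TFIDFSimilarity documentation (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), per-term weight tf · idf² · boost) and then overrides one factor in a subclass, which is the same pattern you would follow when extending a Similarity. The class names TfIdfSketch, ClassicWeights and BinaryTfWeights are hypothetical, the normalization factors are left out, and the exact idf expression may differ between Lucene versions.

```java
// Plain-Java sketch, no Lucene dependency. All class names here are hypothetical.
public class TfIdfSketch {

    /** Mirrors the classic tf-idf factors (normalization factors omitted). */
    static class ClassicWeights {
        // Term frequency factor: grows with the square root of the raw count.
        double tf(double freq) {
            return Math.sqrt(freq);
        }

        // Inverse document frequency factor: rarer terms get a larger value.
        double idf(long docFreq, long numDocs) {
            return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
        }

        // Simplified single-term weight: tf * idf^2 * boost.
        double termWeight(double freq, long docFreq, long numDocs, double boost) {
            double idf = idf(docFreq, numDocs);
            return tf(freq) * idf * idf * boost;
        }
    }

    /** A custom variant that ignores how often the term repeats in a document. */
    static class BinaryTfWeights extends ClassicWeights {
        @Override
        double tf(double freq) {
            return freq > 0 ? 1.0 : 0.0;
        }
    }

    public static void main(String[] args) {
        ClassicWeights classic = new ClassicWeights();
        ClassicWeights custom = new BinaryTfWeights();
        // Term occurs 4 times in the document and appears in 9 of 1000 documents.
        System.out.println("classic: " + classic.termWeight(4, 9, 1000, 1.0));
        System.out.println("custom:  " + custom.termWeight(4, 9, 1000, 1.0));
    }
}
```

In Lucene itself, the equivalent step would be to override the corresponding methods on a Similarity subclass and set that Similarity on both the index writer configuration and the searcher, so that index-time and query-time scoring agree.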

Take a look at this example. It may help you see how to customize the indexing process:

http://lucene.apache.org/core/4_3_1/demo/src-html/org/apache/lucene/demo/IndexFiles.html
