
Apache Lucene inverted index

Does a Lucene index use tf-idf as weights? Is it possible to define your own statistics and weights for each document and "plug" them into Lucene?

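Yes, the default scoring algorithm incorporates tf-idf, and it is fully documented in the TFIDFSimilarity documentation. (This describes the classic DefaultSimilarity; since Lucene 6.0 the default has been BM25Similarity, which also combines term frequency and inverse document frequency.)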
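To make the tf-idf part concrete, here is a small stdlib-only Java sketch of the per-term factors used by Lucene's classic TF-IDF scoring: tf = sqrt(freq), idf = 1 + ln((docCount + 1) / (docFreq + 1)), and a length norm of 1/sqrt(numTerms). These formulas follow ClassicSimilarity in recent Lucene versions (older releases used a slightly different idf). The class and method names are hypothetical, and the sketch deliberately ignores boosts, query normalization, and the lossy encoding Lucene applies to stored norms, so it is an illustration of the idea rather than Lucene's exact score.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: classic tf-idf factors computed outside Lucene. */
public class TfIdfSketch {

    /** tf factor: square root of the raw term frequency in the document. */
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** idf factor: rarer terms (small docFreq) get larger weights. */
    public static double idf(long docFreq, long docCount) {
        return 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
    }

    /** Length norm: shorter fields weigh each matching term more. */
    public static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Simplified per-term weight for one document (no boosts, no query normalization). */
    public static double weight(int freq, long docFreq, long docCount, int numTerms) {
        return tf(freq) * idf(docFreq, docCount) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Toy corpus: each document is a list of already-tokenized terms.
        List<List<String>> docs = List.of(
                List.of("lucene", "inverted", "index", "lucene"),
                List.of("inverted", "index"),
                List.of("search", "engine"));

        // Document frequency: in how many documents does each term occur?
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }

        String term = "lucene";
        for (int i = 0; i < docs.size(); i++) {
            List<String> doc = docs.get(i);
            int freq = Collections.frequency(doc, term);
            if (freq == 0) continue; // term absent: contributes nothing
            double w = weight(freq, docFreq.get(term), docs.size(), doc.size());
            System.out.printf("doc %d: weight(%s) = %.4f%n", i, term, w);
        }
    }
}
```

The statistics it needs (term frequency per document, document frequency, document count, field length) are exactly what an inverted index stores or can derive cheaply, which is why these factors are a natural fit for Lucene's scoring.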

There are a number of ways to customize the scoring of documents.

  • The simplest and most common is to apply a boost, either on a field at index time or on a query term at query time. (Index-time field boosts were removed in Lucene 7.0, so on current versions use query-time boosts.)
  • Many query types modify the scoring used for that query. Examples include ConstantScoreQuery and DisjunctionMaxQuery.
  • The Similarity you use defines the scoring algorithm. You could select a different one (e.g., BM25Similarity).
  • You can implement your own Similarity, usually by extending a higher-level implementation such as DefaultSimilarity (renamed ClassicSimilarity in Lucene 6.0), TFIDFSimilarity, or SimilarityBase.
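The idea behind the last two points, that the scoring model is a swappable component while the index statistics stay the same, can be sketched without Lucene itself. In Lucene the real extension point is the Similarity class, registered on both sides via IndexWriterConfig.setSimilarity and IndexSearcher.setSimilarity. The TermScorer interface and the classes below are hypothetical stand-ins, not Lucene's API. The BM25 variant uses the textbook Okapi formula with k1 = 1.2 and b = 0.75 (the defaults of BM25Similarity); Lucene's own implementation may differ in constant factors between versions.

```java
import java.util.Map;

/** Hypothetical stand-in for a pluggable Similarity (NOT Lucene's API). */
interface TermScorer {
    double score(int freq, long docFreq, long docCount, int docLen, double avgDocLen);
}

/** Classic tf-idf style: sqrt(tf) * idf / sqrt(docLen). */
class TfIdfScorer implements TermScorer {
    @Override
    public double score(int freq, long docFreq, long docCount, int docLen, double avgDocLen) {
        double idf = 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
        return Math.sqrt(freq) * idf / Math.sqrt(docLen);
    }
}

/** Textbook Okapi BM25: term frequency saturates, length normalized against the average. */
class Bm25Scorer implements TermScorer {
    private final double k1;
    private final double b;

    Bm25Scorer(double k1, double b) { this.k1 = k1; this.b = b; }
    Bm25Scorer() { this(1.2, 0.75); }

    @Override
    public double score(int freq, long docFreq, long docCount, int docLen, double avgDocLen) {
        double idf = Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
        double lengthFactor = 1.0 - b + b * docLen / avgDocLen;
        // Each extra occurrence adds less than the previous one.
        return idf * freq * (k1 + 1.0) / (freq + k1 * lengthFactor);
    }
}

public class PluggableScoring {
    /** A query-time boost simply multiplies the term's contribution. */
    public static double boosted(TermScorer scorer, double boost, int freq, long docFreq,
                                 long docCount, int docLen, double avgDocLen) {
        return boost * scorer.score(freq, docFreq, docCount, docLen, avgDocLen);
    }

    public static void main(String[] args) {
        Map<String, TermScorer> scorers = Map.of(
                "tf-idf", new TfIdfScorer(),
                "bm25", new Bm25Scorer());
        // Same statistics, different scoring model: only the scorer changes.
        for (Map.Entry<String, TermScorer> e : scorers.entrySet()) {
            double s = boosted(e.getValue(), 2.0, 3, 5, 100, 10, 12.0);
            System.out.printf("%s: %.4f%n", e.getKey(), s);
        }
    }
}
```

Keeping the statistics and the formula separate is what makes the swap cheap: a custom model only has to decide how to combine frequencies, document counts, and lengths, not how to collect them.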

Just go through this example. It may help you understand how to make custom changes to the indexing process.


