
Solr proximity search scoring

I am new to Solr and am studying its basic scoring model. I understand that the basic scoring model uses the Boolean model to generate the set of matching documents and then uses the vector space model (VSM) to score them for ranking by relevance. What I want to know is: when using proximity searches, are the results also ranked according to the vector space model after the document set is generated, or are they scored based on the edit distance alone?

First of all, the VSM score is implemented in org.apache.lucene.search.similarities.TFIDFSimilarity (keep in mind that it is no longer the default Similarity in recent versions of Lucene). The default since Lucene 6, org.apache.lucene.search.similarities.BM25Similarity, does something similar in spirit, but it is a probabilistic bag-of-words ranking function rather than a vector space model.

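In the case of proximity searches, the base class org.apache.lucene.search.similarities.Similarity has a nested class Similarity.SimScorer, which is responsible for scoring "sloppy" queries such as SpanQuery and PhraseQuery. Usually there is a method that computes sloppyFreq, a function of the edit distance between the query and the matched positions, and that value enters the scoring formula as an additional coefficient. So proximity matches are still ranked by the configured Similarity; the edit distance only weights how much each match contributes.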

One of the default implementations of sloppyFreq is 1.0f / (distance + 1), but it can of course be customized to suit your needs.
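To make the effect of that formula concrete, here is a minimal standalone sketch in Python. It is not Lucene's API: the function names `sloppy_freq` and `phrase_freq` are hypothetical, and summing the per-match weights into one frequency-like value is an assumption about how a sloppy match would typically be fed into the similarity.

```python
def sloppy_freq(distance: int) -> float:
    """Weight of a single sloppy match, using the 1 / (distance + 1) formula."""
    return 1.0 / (distance + 1)


def phrase_freq(match_distances) -> float:
    """Hypothetical helper: sum the weights of all matches in one document.

    The result plays the role of a term frequency when it is handed to the
    similarity's scoring formula (an assumption made for this sketch).
    """
    return sum(sloppy_freq(d) for d in match_distances)


# An exact phrase match (distance 0) counts as a full occurrence,
# while a match that needed 3 position moves counts as a quarter of one.
exact = phrase_freq([0])
loose = phrase_freq([3])
print(exact, loose)
```

Under these assumptions a document with a tight match receives a larger frequency than one with a loose match, and the rest of the score (inverse document frequency, length normalization, and so on) is still computed by whichever Similarity is configured.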


 