
Configuring ElasticSearch relevance score to prefer a match on all words over a match with some words?

For example, with a search for "stack overflow" I want a document containing both "stack" and "overflow" to have a higher score than a document containing only one of those words.

Right now, I am seeing cases where a document that contains "stack" 0 times and "overflow" 50 times gets ranked above a document that contains "stack" 1 time and "overflow" 1 time.

A secondary concern is ranking documents that contain the exact word higher than documents that contain only a variant of it. For example, a document containing "stack" should be ranked higher than a document containing "stacking".

A third concern is ranking documents higher when the words are adjacent. For example, a document "How to use stack overflow" should be ranked higher than a document "The stack of papers caused the inbox to overflow."

If you put those three concerns together, here is an example of the desired rank of results for "stack overflow":

[Image: example search results]

Is it possible to configure an index or a query to calculate score this way?

Here you are trying to achieve multiple things in a single query. First, you should try to understand how ES is scoring the results it returns.

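  1. A document containing "overflow" 50 times is ranked above a document containing "stack" once and "overflow" once because ES scores with TF/IDF, in which the more often a term appears in a field, the higher that field scores for it. In this case "overflow" appears 50 times, far more often than both terms combined in the other document. (This describes the classic TF/IDF similarity; since Elasticsearch 5.0 the default similarity is BM25, which still rewards repeated terms but with diminishing returns.)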
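To make the effect concrete, here is a toy version of the classic Lucene TF/IDF score: the sum of sqrt(term frequency) over matched terms, multiplied by a coord factor (the fraction of query terms the document contains). It is a simplified sketch, not the real formula: it assumes every term has the same idf and every document the same field length, so idf, field norms and queryNorm are left out. The `use_tf` flag (a hypothetical parameter) models ignoring term frequency, where a present term counts as if it occurred once.

```python
import math


def toy_classic_score(term_freqs: dict, query_terms: list, use_tf: bool = True) -> float:
    """Simplified classic TF/IDF score: coord * sum of per-term tf.

    Assumption for illustration: equal idf for all terms and equal field
    lengths for all documents, so idf, norms and queryNorm drop out.
    """
    matched = [t for t in query_terms if term_freqs.get(t, 0) > 0]
    if not matched:
        return 0.0
    # coord: fraction of the query terms that the document contains.
    coord = len(matched) / len(query_terms)
    # Classic tf is sqrt(raw frequency); with frequencies ignored,
    # every present term contributes as if it appeared once.
    tf_sum = sum(math.sqrt(term_freqs[t]) if use_tf else 1.0 for t in matched)
    return coord * tf_sum


query = ["stack", "overflow"]
doc_a = {"overflow": 50}             # "overflow" 50 times, no "stack"
doc_b = {"stack": 1, "overflow": 1}  # each word once

print(round(toy_classic_score(doc_a, query), 2))                # 3.54
print(round(toy_classic_score(doc_b, query), 2))                # 2.0
print(round(toy_classic_score(doc_a, query, use_tf=False), 2))  # 0.5
print(round(toy_classic_score(doc_b, query, use_tf=False), 2))  # 2.0
```

With term frequency counted, sqrt(50) outweighs the coord penalty and `doc_a` wins; once only presence counts, the document matching both words comes out ahead in this toy model.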

Note: You can disable the term-frequency part of this calculation, as described in the linked documentation.

If you don't care about how often a term appears in a field and all you care about is that the term is present, then you can disable term frequencies in the field mapping:
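A minimal sketch of such a mapping, written as a Python dict for the index-creation request body. It assumes Elasticsearch 7 or later (typeless mappings and the `text` field type; older versions nest properties under a type name); `content` is a hypothetical field name. Setting `index_options` to `"docs"` indexes only which documents contain each term.

```python
import json


def presence_only_mapping(field: str) -> dict:
    """Index-creation body whose `field` ignores term frequency when scoring.

    Assumes Elasticsearch 7+ mappings; `field` is a hypothetical name.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # "docs" stores only document IDs per term:
                    # no term frequencies and no positions.
                    "index_options": "docs",
                }
            }
        }
    }


# Send this as the body of PUT /<your-index> with whichever client you use.
print(json.dumps(presence_only_mapping("content"), indent=2))
```

Because positions are not stored either, phrase queries cannot run on a field mapped this way; if adjacency also matters, keep a second (sub-)field with the default `index_options` for phrase matching.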

  2. You are getting results containing the term "stacking" because of stemming. If you don't want documents containing "stacking" in the search results, don't index documents in stemmed form, or post-process the results you get back from ES to lower their scores; I'm not sure whether ES provides this out of the box.
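One way to prefer exact word forms without post-processing, assuming you can reindex, is to index the same text twice through multi-fields: once with a stemming analyzer and once without, then query both with a `multi_match` of type `most_fields`, which adds up the scores of the matching fields. The sketch below builds both request bodies as Python dicts; `content` and `exact` are hypothetical names, and it assumes the built-in `english` (stemming) and `standard` (non-stemming) analyzers on Elasticsearch 7+.

```python
import json


def stemmed_and_exact_mapping(field: str) -> dict:
    """Mapping that indexes `field` stemmed plus an unstemmed `.exact` copy."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "english",  # stemmed: "stacking" -> "stack"
                    "fields": {
                        # Hypothetical sub-field name; keeps words as written.
                        "exact": {"type": "text", "analyzer": "standard"}
                    },
                }
            }
        }
    }


def prefer_exact_query(field: str, text: str) -> dict:
    """Search body that sums the scores of the stemmed and exact fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": [field, f"{field}.exact"],
            }
        }
    }


print(json.dumps(prefer_exact_query("content", "stack overflow"), indent=2))
```

A document containing "stack" matches in both fields, while one containing only "stacking" matches just the stemmed field, so the exact form typically gets the higher combined score.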

  3. The third thing you want is a phrase search.
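A `match_phrase` query on its own would drop documents where the words are not adjacent. A common pattern that only boosts adjacency is to require a plain `match` and add `match_phrase` as an optional `should` clause, whose score is added when it matches. A sketch as a Python dict; `content` is a hypothetical field name and `slop` (how many positions apart the words may be and still count as a phrase) is left at 0 by default.

```python
import json


def phrase_boost_query(field: str, text: str, slop: int = 0) -> dict:
    """Match any of the words; score adjacent occurrences higher."""
    return {
        "query": {
            "bool": {
                # Any document matching the words is a candidate.
                "must": [{"match": {field: {"query": text}}}],
                # Optional: adds score when the words occur as a phrase
                # (within `slop` positions of each other).
                "should": [{"match_phrase": {field: {"query": text, "slop": slop}}}],
            }
        }
    }


print(json.dumps(phrase_boost_query("content", "stack overflow"), indent=2))
```

Phrase matching needs term positions, so run it against a field that keeps the default `index_options`, not one mapped with `"docs"`.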

Also use the Explain API to see how ES calculates the score of each document for your query; it will help you construct the right query for your requirements.
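Setting `"explain": true` in a search body makes each hit carry an `_explanation` tree, where every node has a `value`, a `description` and optional `details` children. The sketch below adds that flag to an existing body and flattens the tree into indented lines for reading; the helper names are hypothetical, and sending the request is left to whichever client you use.

```python
def explained_search(query_body: dict) -> dict:
    """Return a copy of the search body with score explanations enabled."""
    body = dict(query_body)
    body["explain"] = True  # each hit then includes "_explanation"
    return body


def explanation_lines(node: dict, depth: int = 0) -> list:
    """Flatten an explanation tree into indented 'value  description' lines."""
    lines = [f"{'  ' * depth}{node['value']:.4f}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(explanation_lines(child, depth + 1))
    return lines


# Usage, given a parsed search response `response`:
# for hit in response["hits"]["hits"]:
#     print("\n".join(explanation_lines(hit["_explanation"])))
```

Comparing these trees for two documents shows which factor (term frequency, idf, field length) drives the difference between their scores.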
