
Elasticsearch: boosting relevance based on the count of a field value

I'm trying to boost relevance based on how often a field value occurs: the fewer documents share a value, the more relevant each of them should be.

For example, I have 1001 documents. 1000 of them are written by John, and only one is written by Joe.

// 1000 documents by John
{"title": "abc 1", "author": "John"}
{"title": "abc 2", "author": "John"}
// ...
{"title": "abc 1000", "author": "John"}

// 1 document by Joe
{"title": "abc 1", "author": "Joe"}

When I search for "abc" against the title field, I get all 1001 documents. Their relevance scores should be very similar, if not exactly the same. The field value "John" occurs 1000 times and the field value "Joe" occurs once. I'd like to boost the relevance of the document {"title": "abc 1", "author": "Joe"}; otherwise it would be really hard to spot the document by Joe among the results.

Thank you!

In case someone runs into the same use case, here is my workaround using the Function Score Query. It requires at least two calls to the Elasticsearch server.

  1. Get the document count for each author (you can use the aggregation feature). In our example, we get 1000 for John and 1 for Joe.
  2. Generate a weight from each count. The higher the count, the lower the relevance weight: something like 1 + sqrt(1/1000) for John and 1 + sqrt(1/1) for Joe.
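  3. Use the weights in a script to calculate the score according to the author value (the script could be much better). Setting boost_mode to replace makes the script's return value the final score; with the default (multiply) it would be multiplied by _score a second time. The comparison assumes the author field is indexed as an exact, non-analyzed value:

     {
       "query": {
         "function_score": {
           "query": { "match": { "title": "abc" } },
           "script_score": {
             "script": {
               "inline": "if (doc['author'].value == 'John') { return (1 + sqrt(1.0 / 1000)) * _score; } return (1 + sqrt(1.0 / 1)) * _score;"
             }
           },
           "boost_mode": "replace"
         }
       }
     }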
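The three steps can also be sketched in Python as plain request bodies, without hard-coding author names. This is only an illustration under some assumptions: the aggregation name `per_author` and the helper names are made up here, the `author` field is assumed to be indexed as an exact (keyword-like) value, and the inline script is swapped for `filter` + `weight` functions, which is a different technique from the script above. Sending the bodies to the server is left out.

```python
import json
import math


def counts_request(field="author", size=100):
    """Body for the first call: a terms aggregation counting documents per author."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {"per_author": {"terms": {"field": field, "size": size}}},
    }


def author_weights(buckets):
    """Step 2: map each author to 1 + sqrt(1 / count)."""
    return {b["key"]: 1 + math.sqrt(1 / b["doc_count"]) for b in buckets}


def boosted_query(text, weights):
    """Body for the second call: one weight function per author, selected by a term filter."""
    functions = [
        {"filter": {"term": {"author": author}}, "weight": weight}
        for author, weight in sorted(weights.items())
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "functions": functions,
                "score_mode": "first",     # use the first function whose filter matches
                "boost_mode": "multiply",  # final score = _score * weight
            }
        }
    }


if __name__ == "__main__":
    # Buckets in the shape a terms aggregation returns for the example data.
    buckets = [{"key": "John", "doc_count": 1000}, {"key": "Joe", "doc_count": 1}]
    print(json.dumps(boosted_query("abc", author_weights(buckets)), indent=2))
```

With many distinct authors the `functions` list grows by one entry per author, so this form is probably only practical when the number of authors is small.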
