
Elasticsearch should clauses produce different scores

I am retrieving documents by filtering and using a bool query to apply a score. For example:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "color": "Yellow"
          }
        },
        {
          "term": {
            "color": "Red"
          }
        },

        {
          "term": {
            "color": "Blue"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

If a document contains only "Yellow", it gets a score of 1.5, but if it contains only "Red", it gets a score of 1.4. I want the scores to be the same. Each document has exactly one match, so why are the scores different? Is there a way to ignore the order of the terms in the should query? When there is only one match, the "Yellow" document always gets the higher score.

UPDATE: The issue is not the order of the terms in the should array but the "number of documents containing the term".

If scoring is not important to you, you can wrap the bool/should clause in a filter clause.

The filter context skips scoring and behaves as a plain yes/no match, so the score will always be 0.0 for the matched documents.

{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "term": {
                "color.keyword": "Yellow"
              }
            },
            {
              "term": {
                "color.keyword": "Black"
              }
            },
            {
              "term": {
                "color.keyword": "Purple"
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
} 

The score of a matched document depends on several factors, such as the length of the field, the term frequency, the total number of documents, etc.

You can learn more about how the score is calculated by using the explain API:

GET /_search?explain=true

@ESCoder Using the example above, I get:

"Yellow"

{
                      "value" : 1.5995531,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 30,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 150,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },

"Red"

{
                      "value" : 1.0375981,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 53,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 150,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },

Each term (Red and Yellow) appears only once in each document. I want the same score whether a document has Red or Yellow. I don't care how many documents contain each term. If one document has only Yellow and another has only Red, I would like both to get the same score. Is that possible?
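For what it's worth, the two idf values in the explain output above can be reproduced by hand, which shows where the difference comes from. This is only a small sketch in Python: the function name `bm25_idf` is made up for illustration, and the formula is copied from the "description" field of the explain output rather than from the Elasticsearch source.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """idf as described in the explain output:
    log(1 + (N - n + 0.5) / (n + 0.5))

    n: number of documents containing the term
    N: total number of documents with the field
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


# Values taken from the explain output quoted above.
idf_yellow = bm25_idf(n=30, N=150)
idf_red = bm25_idf(n=53, N=150)

print(f"Yellow idf: {idf_yellow:.7f}")
print(f"Red idf:    {idf_red:.7f}")
```

The results should come out very close to the 1.5995531 and 1.0375981 reported by explain. "Yellow" occurs in fewer documents (30 versus 53), so it gets the larger idf and therefore the higher score, even though each document contains its term exactly once.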

As others mentioned, the score depends on numerous factors. However, if you want to ignore all of them, you can use constant_score to assign a consistent score whenever a document matches a specific term, e.g.:

{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Yellow"
              }
            },
            "boost": 1
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Red"
              }
            },
            "boost": 1
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Blue"
              }
            },
            "boost": 1
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

I believe this should fulfill your requirement.
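If the list of colors changes often, the same request body could be generated rather than written by hand. The following is a rough sketch in plain Python; the helper name `constant_score_should` is hypothetical and not part of any Elasticsearch client, and it only builds the JSON body shown above without sending it anywhere.

```python
import json


def constant_score_should(field, values, boost=1.0):
    """Build a bool/should query with one constant_score clause per value.

    Hypothetical helper: it only assembles a dict that mirrors the
    hand-written query above.
    """
    clauses = [
        {"constant_score": {"filter": {"term": {field: value}}, "boost": boost}}
        for value in values
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


body = constant_score_should("color", ["Yellow", "Red", "Blue"])
print(json.dumps(body, indent=2))
```

Note that a bool query adds up the scores of its matching should clauses, so a document matching two of the colors would still score higher than a document matching one. Documents that match exactly one color all get the same score, which is the case asked about here.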
