
Elasticsearch more_like_this query score issue in 5.x

We recently upgraded Elasticsearch from 2.4 to 5.4.

After the upgrade, we found an issue with the more_like_this query in 5.x.

The following query is used to find similar documents by text:

Input query

POST /test/_search
{
  "size": 10000,
  "stored_fields": [
    "docid"
  ],
  "_source": false,
  "query": {
    "more_like_this": {
      "fields": [
        "textcontent"
      ],
      "like": [
        {
          "_index": "test",
          "_type": "object",
          "_id": "AV0c9jvZXF-b5U5aNAWB"
        }
      ],
      "max_query_terms": 5000,
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

Output of Elasticsearch 2.4

{
"took": 16,
"timed_out": false,
"_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
},
"hits": {
    "total": 3,
    "max_score": 1.5381224,
    "hits": [
        {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal6Z9",
            "_score": 1.5381224,
            "fields": {
                "docid": [
                    "2"
                ]
            }
        },  {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal63Z",
            "_score": 0.5381224,
            "fields": {
                "docid": [
                    "3"
                ]
            }
        },  {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal6Z",
            "_score": 0.381224,
            "fields": {
                "docid": [
                    "4"
                ]
            }
        }
    ]
}
}

Output of Elasticsearch 5.4

{
"took": 16,
"timed_out": false,
"_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
},
"hits": {
    "total": 3,
    "max_score": 168.5381224,
    "hits": [
        {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal6Z9",
            "_score": 168.5381224,
            "fields": {
                "docid": [
                    "2"
                ]
            }
        },  {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal63Z",
            "_score": 164.5381224,
            "fields": {
                "docid": [
                    "3"
                ]
            }
        },  {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal6Z",
            "_score": 132.381224,
            "fields": {
                "docid": [
                    "4"
                ]
            }
        }
    ]
}
}

The output is the same in both versions except for the document scores: version 5.4 returns much higher scores than 2.4. Our work depends on the scores, so this change is a problem for us. How can we get the 2.4 scoring back?

I found the solution. In version 5.0, the default similarity algorithm was changed from classic (TF/IDF) to BM25, which is why the scores differ. To get the old scoring, set the similarity type to classic when creating the index. If the index already exists, update the setting for all indices by executing the following request:

PUT /_all/_settings?preserve_existing=true          
{
  "index.similarity.default.type": "classic"
} 
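
For a new index, the same setting can be supplied at creation time instead. The request below is a minimal sketch of that step, assuming the index is named test as in the question; adjust the name and add your own mappings as needed.

```
PUT /test
{
  "settings": {
    "index": {
      "similarity": {
        "default": {
          "type": "classic"
        }
      }
    }
  }
}
```

On 5.x, similarity settings are likely not updatable on an open index, so if the update request above is rejected, closing the index first (POST /test/_close) and reopening it afterwards (POST /test/_open) may be needed. This is an assumption to check against the documentation for your version.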
