Elasticsearch - inconsistent term facets results

I couldn't find an answer in previous posts, so I hope this question is relevant. I am having trouble with Elasticsearch term facets.

When I query the document count for every term facet, I get, say, 8 for some field value. But when I query the count of documents with that specific value for the field, I get, say, 19.

To be more precise, I am using Kibana, and here are the queries and responses (FYI, I was told to rename the field values):

The query for the counts of all term facets:

{
    "facets" : {
        "terms" : {
            "terms" : {
                "fields" : ["field.name"],
                "size" : 6,
                "order" : "count",
                "exclude" : []
            },
            "facet_filter" : {
                "fquery" : {
                    "query" : {
                        "filtered" : {
                            "query" : {
                                "bool" : {
                                    "should" : [{
                                            "query_string" : {
                                                "query" : "*"
                                            }
                                        }
                                    ]
                                }
                            },
                            "filter" : {
                                "bool" : {
                                    "must" : [{
                                            "match_all" : {}

                                        }
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "size" : 0
}

The response:

{
    "took" : 1,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 20374,
        "max_score" : 0.0,
        "hits" : []
    },
    "facets" : {
        "terms" : {
            "_type" : "terms",
            "missing" : 10567,
            "total" : 9918,
            "other" : 9781,
            "terms" : [{
                    "term" : "fieldValue1"
                    "count" : 43
                }, {
                    "term" : "fieldValue2",
                    "count" : 27
                }, {
                    "term" : "fieldValue3",
                    "count" : 23
                }, {
                    "term" : "fieldValue4",
                    "count" : 23
                }, {
                    "term" : "fieldValue5",
                    "count" : 13
                }, {
                    "term" : "fieldValue6",
                    "count" : 8
                }
            ]
        }
    }
}

The query on "fieldValue6":

{
    "facets" : {
        "terms" : {
            "terms" : {
                "fields" : ["field.name"],
                "size" : 6,
                "order" : "count",
                "exclude" : []
            },
            "facet_filter" : {
                "fquery" : {
                    "query" : {
                        "filtered" : {
                            "query" : {
                                "bool" : {
                                    "should" : [{
                                            "query_string" : {
                                                "query" : "*"
                                            }
                                        }
                                    ]
                                }
                            },
                            "filter" : {
                                "bool" : {
                                    "must" : [{
                                            "terms" : {
                                                "field.name" : ["fieldValue6"]
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "size" 

The response:

{
    "took" : 2,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 20374,
        "max_score" : 0.0,
        "hits" : []
    },
    "facets" : {
        "terms" : {
            "_type" : "terms",
            "missing" : 0,
            "total" : 19,
            "other" : 0,
            "terms" : [{
                    "term" : "fieldValue6",
                    "count" : 19
                }
            ]
        }
    }
}

The field I apply the facet filter to (or whatever it is actually supposed to be called) is set as "not analyzed":

properties: {
    type_ref2Strack: {
        properties: {
            position: {
                type: long
            }
            name: {
                index: not_analyzed
                norms: {
                    enabled: false
                }
                index_options: docs
                type: string
            }
        }
    }
}

This is a long-standing, known limitation of Elasticsearch facets (now replaced by aggregations).

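The key problem is that Elasticsearch runs the facet against each shard with the given size and then combines the results. Each shard returns only its own top terms, so a term that falls outside that list on some shards loses those shards' counts in the merged result. In the filtered query, "fieldValue6" is the only term left on every shard, so nothing is cut off and the count of 19 is the exact one.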
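The truncation described above can be reproduced with a small simulation. This is a simplified sketch in Python, not Elasticsearch's actual code: the three shards, their term distributions, and the helper names `shard_top_terms` and `merged_terms` are all made up for illustration. Each shard reports only its local top `shard_size` terms, and the coordinating step sums whatever it receives.

```python
from collections import Counter


def shard_top_terms(shard_docs, shard_size):
    """Return one shard's local top `shard_size` terms with their local counts."""
    return Counter(shard_docs).most_common(shard_size)


def merged_terms(shards, size, shard_size=None):
    """Sum the per-shard top lists and keep the global top `size` terms.

    When `shard_size` is omitted, each shard returns only `size` terms,
    which is the situation described in the answer.
    """
    if shard_size is None:
        shard_size = size
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, shard_size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical data: one field value per document, three shards.
# "f6" occurs 3 + 4 + 6 = 13 times overall but ranks fourth on the first shard.
shards = [
    ["a"] * 6 + ["b"] * 5 + ["c"] * 4 + ["f6"] * 3,
    ["a"] * 6 + ["b"] * 5 + ["f6"] * 4 + ["c"] * 1,
    ["f6"] * 6 + ["a"] * 3 + ["b"] * 2 + ["c"] * 1,
]

# Each shard returns its top 3 only: the first shard drops "f6".
print(merged_terms(shards, size=3))
# [('a', 15), ('b', 12), ('f6', 10)]

# Each shard returns 4 candidates: every term is reported, so counts are exact.
print(merged_terms(shards, size=3, shard_size=4))
# [('a', 15), ('f6', 13), ('b', 12)]
```

Here `f6` has 13 documents in total, but the shard where it ranks fourth never reports it, so the merged count drops to 10. Asking each shard for one more candidate recovers the exact figure in this toy case.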

There are two non-ideal ways to handle this:

  • Set "shard_size" much larger than the "size" you really need. This will mostly work, but counts are still not guaranteed to be exact.
  • Use an index that has just a single shard. This way, it will always collect the exact results. This limits how well the index scales to a very large number of documents, but YMMV.
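As a rough illustration of the first option, the sketch below builds a request body like the one in the question with a "shard_size" added next to "size". The helper name `terms_facet_body` and the value 600 are hypothetical, and the placement of "shard_size" inside the "terms" block is an assumption about the facet API of that era, so check it against the documentation for your Elasticsearch version.

```python
import json


def terms_facet_body(field, size, shard_size):
    """Build a terms-facet request body (hypothetical helper).

    `shard_size` is the number of candidate terms requested from each shard;
    `size` is the number of terms kept after merging.
    """
    if shard_size < size:
        raise ValueError("shard_size should be at least as large as size")
    return {
        "facets": {
            "terms": {
                "terms": {
                    "fields": [field],
                    "size": size,
                    # Assumed placement: a sibling of "size".
                    "shard_size": shard_size,
                    "order": "count",
                }
            }
        },
        "size": 0,
    }


# 600 is an arbitrary illustrative value, not a recommendation.
body = terms_facet_body("field.name", size=6, shard_size=600)
print(json.dumps(body, indent=4))
```

A larger "shard_size" makes each shard do more work and send more data to the coordinating node, which is why this is a trade-off rather than a guaranteed fix.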

For more info, see here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_document_counts_are_approximate
