
Elasticsearch - not getting expected aggregated count

In my scenario, I have to search for phone numbers that start with '40'. I need to get the matching phone numbers that start with '40', plus a count of how many phone numbers match.

Actually, I want to search across multiple fields; for this example I am searching only on phone numbers.

For that, I used the query below.

GET emp_details_1_1/_msearch 
{
   "index":"emp_details_1_1"
}{
   "_source":[
      
   ],
   "size":0,
   "min_score":1,
   "query":{
      "multi_match":{
         "query":"40",
         "fields":[
            "phone"
         ],
         "type":"phrase_prefix"
      }
   },
   "aggs":{
      "phone":{
         "terms":{
            "field":"phone.keyword",
            "include":"40.*"
         }
      },
      "phone_count":{
         "value_count":{
            "field":"phone.keyword"
         }
      }
   }
}

I am using the Value Count aggregation to get the field-wise total count.

In the output, I can see the phone number data starting with '40', i.e. one single record (example: '40x-xxx-xxxx'). But when I look at the count, the matching count is 4, because the query matches not only a phone number that starts with '40' but also phone numbers that have '40' in the middle, right after a dash '-'. Examples: 'xxx-40x-xxxx', 'xxx-xxx-40x', 'xxx-xxx-40x'. When getting the aggregate count, I want to omit the phone numbers that have '40' in the middle.
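The matching behaviour described here can be pictured with a rough Python model. This is only a sketch: it assumes the `phone` text field is tokenized on the dashes (as a standard-style analyzer would do), the sample numbers are made up, and the helper names `analyze`, `phrase_prefix_matches` and `keyword_prefix_matches` are hypothetical stand-ins, not Elasticsearch APIs.

```python
import re

# Hypothetical sample data: only the first number starts with "40".
PHONES = ["403-448-7929", "512-401-3344", "617-555-4021", "718-222-4099"]


def analyze(text):
    """Simplified stand-in for a text analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def phrase_prefix_matches(text, query):
    """Single-term phrase_prefix on an analyzed field: any token may start with the query."""
    return any(tok.startswith(query) for tok in analyze(text))


def keyword_prefix_matches(value, query):
    """Prefix query on a keyword field: the whole stored value must start with the query."""
    return value.startswith(query)


text_hits = [p for p in PHONES if phrase_prefix_matches(p, "40")]
keyword_hits = [p for p in PHONES if keyword_prefix_matches(p, "40")]

print(len(text_hits), text_hits)        # every number has some token starting with "40"
print(len(keyword_hits), keyword_hits)  # only the number whose full value starts with "40"
```

Under these assumptions the token-based match finds all four numbers, while the whole-value prefix match finds only the first one.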

Below is the output I am getting.

{
  "took" : 70,
  "responses" : [
    {
      "took" : 70,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "phone_count" : {
          "value" : 4
        },
        "phone" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "4034487929",
              "doc_count" : 1
            }
          ]
        }
      },
      "status" : 200
    }
  ]
}

I tried various options, but I am not getting the expected results.

Instead of the phrase_prefix match (which works on analyzed text fields, so it can match any token of the phone number that starts with '40'), you need to use the prefix query (which works on keyword fields and compares against the whole value). It is good that you already have the keyword sub-field for your phone field, so changing your query to the one below will give you the correct results.

{
    "size": 0,
    "min_score": 1,
    "query": {
        "prefix": {
            "phone.keyword": "40"
        }
    },
    "aggs": {
        "phone": {
            "terms": {
                "field": "phone.keyword",
                 "include":"40.*"
            }
        },
        "phone_count": {
            "value_count": {
                "field": "phone.keyword"
            }
        }
    }
}

And the result:

 "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "phone_count": {
            "value": 1
        },
        "phone": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "4034487929",
                    "doc_count": 1
                }
            ]
        }
    }
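Why both counts now agree can be sketched in plain Python. This is a simplified model under stated assumptions, not Elasticsearch itself: `value_count` is taken to count every value of the field in the documents the query matched, while `include` only restricts which `terms` buckets are returned. The sample numbers and the `aggregate` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample data: only the first number starts with "40".
PHONES = ["403-448-7929", "512-401-3344", "617-555-4021", "718-222-4099"]


def aggregate(matched):
    """Model the two aggregations over the documents the query matched."""
    # value_count: counts all values in the matched documents (one value per doc here).
    phone_count = len(matched)
    # terms + include "40.*": the pattern must match the whole term, hence fullmatch.
    buckets = Counter(p for p in matched if re.fullmatch(r"40.*", p))
    return phone_count, dict(buckets)


# Token-based match: any dash-separated part may start with "40".
broad = aggregate([p for p in PHONES if any(t.startswith("40") for t in p.split("-"))])
# Whole-value prefix match, as with the prefix query on the keyword field.
narrow = aggregate([p for p in PHONES if p.startswith("40")])

print(broad)   # (4, {'403-448-7929': 1})
print(narrow)  # (1, {'403-448-7929': 1})
```

In this model the `include` pattern never changes `phone_count`; only narrowing the query does, which is consistent with the count dropping from 4 to 1 once the prefix query is used.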
