Range filter on doc_count in a terms aggregation

Question

{
    "size": 0,
    "aggs": {
        "categories_agg": {
            "terms": {
                "field": "categories",
                "order": {
                    "_count": "desc"
                }
            }
        }
    }
}

To get an aggregation on a specific field, I used the query given above. It works fine and gives results like this:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 77445,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "categories_agg": {
      "doc_count_error_upper_bound": 794,
      "sum_other_doc_count": 148316,
      "buckets": [
        {
          "key": "Restaurants",
          "doc_count": 25071
        },
        {
          "key": "Shopping",
          "doc_count": 11233
        },
        {
          "key": "Food",
          "doc_count": 9250
        },
        {
          "key": "Beauty & Spas",
          "doc_count": 6583
        },
        {
          "key": "Health & Medical",
          "doc_count": 5121
        },
        {
          "key": "Nightlife",
          "doc_count": 5088
        },
        {
          "key": "Home Services",
          "doc_count": 4785
        },
        {
          "key": "Bars",
          "doc_count": 4328
        },
        {
          "key": "Automotive",
          "doc_count": 4208
        },
        {
          "key": "Local Services",
          "doc_count": 3468
        }
      ]
    }
  }
}

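Is there a way to filter the aggregation so that I get back only the buckets whose doc_count falls within a specific range?

For example, a range filter on doc_count with a maximum of 25000 and a minimum of 5000 should give me: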

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 77445,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "categories_agg": {
      "doc_count_error_upper_bound": 794,
      "sum_other_doc_count": 148316,
      "buckets": [
        {
          "key": "Shopping",
          "doc_count": 11233
        },
        {
          "key": "Food",
          "doc_count": 9250
        },
        {
          "key": "Beauty & Spas",
          "doc_count": 6583
        },
        {
          "key": "Health & Medical",
          "doc_count": 5121
        },
        {
          "key": "Nightlife",
          "doc_count": 5088
        }
      ]
    }
  }
}

Answer 1

I solved this with a bucket_selector aggregation. It lets us filter on the bucket count in a script.

```
"aggs": {
    "categories_agg": {
      "terms": {
        "field": "cel_num",
        "size": 5000,
        "min_doc_count":1
      },
      "aggs": {
        "count_bucket_selector": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": {
              "lang":"expression",
              "inline": "count>5000 && count <10000"
            }
          }
        }
      }
    }
  }
```
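The snippet above uses the answerer's own field (`cel_num`) and bounds. As a rough sketch of how the same structure might be adapted to the question's `categories` field and its 5000 to 25000 range, the Python below builds the request body as a plain dict. The helper name `doc_count_range_query` and its parameters are hypothetical, and the bounds are assumed to be inclusive. The `script_key` parameter is there because the answer uses `inline`, while later Elasticsearch releases renamed that key to `source`; check which one your version expects.

```python
import json


def doc_count_range_query(field, min_count, max_count, size=5000, script_key="inline"):
    """Build a terms aggregation that keeps only buckets with
    min_count <= doc_count <= max_count (hypothetical helper)."""
    script = {
        # Same script language as the answer; variables from buckets_path
        # are referenced by name in an expression script.
        "lang": "expression",
        script_key: f"count >= {int(min_count)} && count <= {int(max_count)}",
    }
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "categories_agg": {
                "terms": {"field": field, "size": size, "min_doc_count": 1},
                "aggs": {
                    "count_bucket_selector": {
                        "bucket_selector": {
                            # "_count" is the doc_count of each terms bucket
                            "buckets_path": {"count": "_count"},
                            "script": script,
                        }
                    }
                },
            }
        },
    }


body = doc_count_range_query("categories", 5000, 25000)
print(json.dumps(body, indent=2))
```

The selector only sees the buckets that the terms aggregation actually returns, which is presumably why the answer raises `size` to 5000.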

Answer 2

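If you only need to filter on the minimum doc_count, there is a simple solution, taken from "Elasticsearch filter aggregations for minimal doc count". To save you the lookup time: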

 aggs: {
    field1: {
        terms: {
            field: 'field1',
            min_doc_count: 1000
        }
    }
 }
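Since `min_doc_count` only covers the lower bound, an upper bound has to come from somewhere else. One option, offered here as a minimal sketch and not as part of either answer, is to filter the returned buckets on the client. The function name `buckets_in_range` is hypothetical, the bounds are assumed to be inclusive, and the sample data is the bucket list from the question.

```python
def buckets_in_range(buckets, min_count, max_count):
    """Keep buckets whose doc_count lies in [min_count, max_count]."""
    return [b for b in buckets if min_count <= b["doc_count"] <= max_count]


# Buckets copied from the response shown in the question.
buckets = [
    {"key": "Restaurants", "doc_count": 25071},
    {"key": "Shopping", "doc_count": 11233},
    {"key": "Food", "doc_count": 9250},
    {"key": "Beauty & Spas", "doc_count": 6583},
    {"key": "Health & Medical", "doc_count": 5121},
    {"key": "Nightlife", "doc_count": 5088},
    {"key": "Home Services", "doc_count": 4785},
    {"key": "Bars", "doc_count": 4328},
    {"key": "Automotive", "doc_count": 4208},
    {"key": "Local Services", "doc_count": 3468},
]

kept = buckets_in_range(buckets, 5000, 25000)
print([b["key"] for b in kept])
# ['Shopping', 'Food', 'Beauty & Spas', 'Health & Medical', 'Nightlife']
```

This only filters buckets that were actually returned, so the terms aggregation's `size` still has to be large enough to include every bucket of interest.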

Answer 1 (score 3, accepted, 2017-03-24 03:40:56)

Answer 2 (score 1, 2020-05-25 20:04:47)
