How to set a size on a date_histogram aggregation

Question

I'm executing a query in Elasticsearch. I need the number of hits for my attribute "end_date_ut" (type Date, format dateOptionalTime) for each month represented in the index. For that, I'm using a date_histogram aggregation.

My query is just below:

GET inc/_search
{
  "size": 0,
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month"
      }
    }
  }
}

And here is part of the result:

"hits": {
    "total": 517478,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "appli": {
      "buckets": [
        {
          "key_as_string": "2009-08-01T00:00:00.000Z",
          "key": 1249084800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2009-09-01T00:00:00.000Z",
          "key": 1251763200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2009-10-01T00:00:00.000Z",
          "key": 1254355200000,
          "doc_count": 2362
        },
        {
          "key_as_string": "2009-11-01T00:00:00.000Z",
          "key": 1257033600000,
          "doc_count": 5336
        },
        {
          "key_as_string": "2009-12-01T00:00:00.000Z",
          "key": 1259625600000,
          "doc_count": 7536
        },
        {
          "key_as_string": "2010-01-01T00:00:00.000Z",
          "key": 1262304000000,
          "doc_count": 8864
        }

The problem is that I get too many buckets (results). With a terms aggregation this is not a problem because I can set a size, but with a date_histogram aggregation I can't find a way to limit the number of buckets returned.

Answer 1

{
    "size": 0,
    "aggs": {
        "by_minute": {
            "date_histogram": {
                "field": "createTime",
                "interval": "1m",
                "order": {
                    "_count": "desc"
                }
            },
            "aggs": {
                "top2": {
                    "bucket_sort": {
                        "sort": [],
                        "size": 2
                    }
                }
            }
        }
    }
}

{
    "took": 28,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 999999,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "by_minute": {
            "buckets": [
                {
                    "key_as_string": "2019-12-21T16:13:00.000Z",
                    "key": 1576944780000,
                    "doc_count": 6374
                },
                {
                    "key_as_string": "2019-12-21T16:10:00.000Z",
                    "key": 1576944600000,
                    "doc_count": 6327
                }
            ]
        }
    }
}
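
For readers who build the request from code, the request above can be sketched in Python as a plain dict. This is only a minimal sketch, not the answerer's code: the function name top_buckets_body and the aggregation names by_interval and top_n are made up for illustration, and the field and interval in the usage line come from the question. It relies on the same idea as the request above: the date_histogram orders its buckets by _count, and a nested bucket_sort pipeline aggregation with an empty sort keeps that order and uses size to truncate the bucket list.

```python
import json


def top_buckets_body(field, interval, size):
    """Build a search body that keeps only the `size` largest histogram buckets.

    Hypothetical helper for illustration; it only builds the request body
    and does not talk to Elasticsearch.
    """
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "by_interval": {
                "date_histogram": {
                    "field": field,
                    "interval": interval,
                    "order": {"_count": "desc"},  # largest buckets first
                },
                "aggs": {
                    "top_n": {
                        # Empty sort: keep the parent order, just truncate.
                        "bucket_sort": {"sort": [], "size": size}
                    }
                },
            }
        },
    }


body = top_buckets_body("end_date_ut", "month", 2)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET inc/_search. On Elasticsearch 7.2 and later, interval is deprecated in favour of calendar_interval or fixed_interval, so the key may need to be swapped depending on the cluster version.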

Answer 2

I suggest using min_doc_count to include only buckets that contain data, i.e. buckets with 0 documents will not come back in the response.

GET inc/_search
{
  "size": 0,
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month",
        "min_doc_count": 1
      }
    }
  }
}
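
To make the effect of min_doc_count concrete, here is a small client-side illustration in Python. It is not how Elasticsearch implements the option; it only mimics the outcome on the first few buckets from the question's result. The helper name drop_sparse_buckets is hypothetical.

```python
def drop_sparse_buckets(buckets, min_doc_count=1):
    """Keep only buckets with at least `min_doc_count` documents.

    Client-side imitation of what min_doc_count does on the server.
    """
    return [b for b in buckets if b["doc_count"] >= min_doc_count]


# First three buckets from the result shown in the question.
sample = [
    {"key_as_string": "2009-08-01T00:00:00.000Z", "doc_count": 0},
    {"key_as_string": "2009-09-01T00:00:00.000Z", "doc_count": 1},
    {"key_as_string": "2009-10-01T00:00:00.000Z", "doc_count": 2362},
]

kept = drop_sparse_buckets(sample)
print([b["key_as_string"] for b in kept])
# → ['2009-09-01T00:00:00.000Z', '2009-10-01T00:00:00.000Z']
```

The empty 2009-08 bucket disappears, while buckets with at least one document are kept. Note that this removes empty buckets only; it does not cap the total number of buckets.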

If you can, also add a range query to restrict the time interval over which the aggregation is run.
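
A rough sketch of that combination, again as a Python dict: a range query narrows the documents to a time window, and the date_histogram then only produces buckets inside it. The function name ranged_histogram_body and the start and end dates in the usage line are assumptions chosen for illustration; the index, field, and aggregation name come from the question.

```python
import json


def ranged_histogram_body(field, interval, start, end):
    """Build a search body that aggregates only documents in [start, end).

    Hypothetical helper for illustration; it only builds the request body.
    """
    return {
        "size": 0,  # return no hits, only aggregation results
        "query": {
            # Restrict the documents the aggregation sees.
            "range": {field: {"gte": start, "lt": end}}
        },
        "aggs": {
            "appli": {
                "date_histogram": {
                    "field": field,
                    "interval": interval,
                    "min_doc_count": 1,  # skip empty buckets
                }
            }
        },
    }


# Example window (assumed dates): October 2009 through December 2009.
body = ranged_histogram_body("end_date_ut", "month", "2009-10-01", "2010-01-01")
print(json.dumps(body, indent=2))
```

With a monthly interval, a three-month window like this yields at most three buckets, so the window length effectively bounds the size of the response.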

Answer 1: score 1, 2020-04-02 11:40:14
Answer 2: score 0, accepted, 2017-09-26 12:18:17
