
How to put a size on a date_histogram aggregation

I'm executing a query in Elasticsearch. I need the number of hits on my attribute "end_date_ut" (type Date, format dateOptionalTime) for each month represented in the index. For that, I'm using a date_histogram aggregation.

My query is below:

GET inc/_search
{
  "size": 0,
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month"
      }
    }
  }
}

And here is a part of the result:

"hits": {
    "total": 517478,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "appli": {
      "buckets": [
        {
          "key_as_string": "2009-08-01T00:00:00.000Z",
          "key": 1249084800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2009-09-01T00:00:00.000Z",
          "key": 1251763200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2009-10-01T00:00:00.000Z",
          "key": 1254355200000,
          "doc_count": 2362
        },
        {
          "key_as_string": "2009-11-01T00:00:00.000Z",
          "key": 1257033600000,
          "doc_count": 5336
        },
        {
          "key_as_string": "2009-12-01T00:00:00.000Z",
          "key": 1259625600000,
          "doc_count": 7536
        },
        {
          "key_as_string": "2010-01-01T00:00:00.000Z",
          "key": 1262304000000,
          "doc_count": 8864
        }

The problem is that I get too many buckets (results). With a terms aggregation I have no problem because I can set a size, but with a date_histogram aggregation I can't find a way to limit the number of buckets returned.

You can nest a bucket_sort pipeline aggregation under the date_histogram to truncate the list of buckets. For example, on an index with a "createTime" field, this keeps 2 one-minute buckets, with the histogram ordered by document count:

{
    "size": 0,
    "aggs": {
        "by_minute": {
            "date_histogram": {
                "field": "createTime",
                "interval": "1m",
                "order": {
                    "_count": "desc"
                }
            },
            "aggs": {
                "top2": {
                    "bucket_sort": {
                        "sort": [],
                        "size": 2
                    }
                }
            }
        }
    }
}

The response then contains only two buckets:

{
    "took": 28,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 999999,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "by_minute": {
            "buckets": [
                {
                    "key_as_string": "2019-12-21T16:13:00.000Z",
                    "key": 1576944780000,
                    "doc_count": 6374
                },
                {
                    "key_as_string": "2019-12-21T16:10:00.000Z",
                    "key": 1576944600000,
                    "doc_count": 6327
                }
            ]
        }
    }
}
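
The same idea can be adapted to the index from the question. The following is only a sketch, meant as the body of GET inc/_search: the aggregation name "latest_months" and the size of 12 are assumptions chosen for illustration, and it presumes an Elasticsearch version that supports bucket_sort (6.1 or later). It sorts the monthly buckets by key, newest first, and keeps the first 12.

```json
{
  "size": 0,
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month"
      },
      "aggs": {
        "latest_months": {
          "bucket_sort": {
            "sort": [
              { "_key": { "order": "desc" } }
            ],
            "size": 12
          }
        }
      }
    }
  }
}
```

Since bucket_sort is a pipeline aggregation, it runs after all the monthly buckets have been computed, so it shortens the response rather than reducing the work done by the cluster. On Elasticsearch 7.2 and later, "interval" is deprecated in favor of "calendar_interval", so the parameter name may need adjusting.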

I suggest using min_doc_count to include only buckets that contain data, i.e. buckets with 0 documents are not returned in the response.

GET inc/_search
{
  "size": 0,
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month",
        "min_doc_count": 1          <--- add this
      }
    }
  }
}

If you can, also add a range query to restrict the time interval over which the aggregation runs.
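
As a rough sketch of that combination, again as the body of GET inc/_search: the range bounds below are assumed values picked to cover the year 2009, not taken from the question, and should be replaced with the period you actually care about.

```json
{
  "size": 0,
  "query": {
    "range": {
      "end_date_ut": {
        "gte": "2009-01-01",
        "lt": "2010-01-01"
      }
    }
  },
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month",
        "min_doc_count": 1
      }
    }
  }
}
```

With the range query in place, only documents inside the window reach the aggregation, so the histogram can return at most one bucket per month of that window (12 here), and min_doc_count drops the empty ones.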
