Elasticsearch aggregation with date_histogram gives wrong results for buckets

I have data with a timestamp field, and I want to run a date_histogram aggregation on it.

When I run the query, it returns a total of 13, which is correct. However, the aggregation shows one record in the 2014-10-10 bucket, and I can't find that record in my data.

curl -X POST http://localhost:9200/test/test/_search -d '{
  "fields": ["creation_time"],
  "query": {
    "filtered": {
      "query": {
        "match": {"type": "test.type"}
      }
    }
  },
  "aggs": {
    "group_by_created_by": {
      "date_histogram": {"field": "creation_time", "interval": "1d"}
    }
  }
}' | python -m json.tool
{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "aggregations": {
        "group_by_created_at": {
            "buckets": [
                {
                    "doc_count": 12,
                    "key": 1412812800000,
                    "key_as_string": "2014-10-09T00:00:00.000Z"
                },
                {
                    "doc_count": 1,
                    "key": 1412899200000,
                    "key_as_string": "2014-10-10T00:00:00.000Z"
                }
            ]
        }
    },
    "hits": {
        "hits": [
            {
                "_id": "qk5EGDqUSoW-ckZU9bnSsA",
                "_index": "test",
                "_score": 3.730029,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T16:35:39.535389"
                    ]
                }
            },
            {
                "_id": "GnglI_3xRYii_oE5q91FUg",
                "_index": "test",
                "_score": 3.6149597,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T17:16:55.677919"
                    ]
                }
            },
            {
                "_id": "ELP1f_-IS8SJiT4i4Vh6_g",
                "_index": "test",
                "_score": 2.974081,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T01:21:21.691270"
                    ]
                }
            },
            {
                "_id": "ySlIV4vWRvm_q0-9p87dEQ",
                "_index": "test",
                "_score": 2.974081,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T01:33:51.291644"
                    ]
                }
            },
            {
                "_id": "swXVnMmJSsmNW30zeJvCoQ",
                "_index": "test",
                "_score": 2.974081,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T17:08:45.738821"
                    ]
                }
            },
            {
                "_id": "h0j6L-VGTnyChSIevtt2og",
                "_index": "test",
                "_score": 2.974081,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T22:35:16.908080"
                    ]
                }
            },
            {
                "_id": "ANoTEXIgRgml6gLD4YKtIg",
                "_index": "test",
                "_score": 2.9459102,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T01:25:18.869175"
                    ]
                }
            },
            {
                "_id": "FSCPBsogT5OXghBUmKXidQ",
                "_index": "test",
                "_score": 2.9459102,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T01:42:49.000599"
                    ]
                }
            },
            {
                "_id": "VEw6XbIySvW7h7GF7h4ynA",
                "_index": "test",
                "_score": 2.9459102,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T16:45:51.563595"
                    ]
                }
            },
            {
                "_id": "J9NfffAvRPmFxtOBZ6IsCA",
                "_index": "test",
                "_score": 2.9169223,
                "_type": "test",
                "fields": {
                    "creation_time": [
                        "2014-10-09T01:23:30.546353"
                    ]
                }
            }
        ],
        "max_score": 3.730029,
        "total": 13
    },
    "timed_out": false,
    "took": 4
}

If you look at the output above, there is no record on 2014-10-10, yet the aggregation shows one document in that bucket.

If you count your hits, you will see there are only 10 objects. This is because, by default, Elasticsearch returns only the top ten hits.
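A quick way to spot this truncation in code is to compare the number of returned hits with the reported total. A minimal Python sketch, assuming the response has already been parsed into a dict (for example with `json.loads`); `returned_vs_total` is a hypothetical helper, and the sample dict is a trimmed stand-in for the response above:

```python
from typing import Tuple


def returned_vs_total(resp: dict) -> Tuple[int, int]:
    """Return (hits actually returned, total number of matching documents)."""
    total = resp["hits"]["total"]
    # Elasticsearch 7.0+ reports the total as {"value": n, "relation": "eq"}
    if isinstance(total, dict):
        total = total["value"]
    return len(resp["hits"]["hits"]), total


# Trimmed-down version of the response above: 10 hits returned, 13 matching.
resp = {"hits": {"total": 13, "hits": [{"_id": str(i)} for i in range(10)]}}
returned, total = returned_vs_total(resp)
if returned < total:
    print(f"only {returned} of {total} matching documents are in hits")
```

Whenever the first number is smaller than the second, some matching documents (and possibly whole buckets) are not visible in `hits`.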

However, all documents matching the query are taken into account when computing aggregations, even those that are not present in the returned hits.
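To check that the visible hits alone cannot explain the second bucket, here is a minimal Python sketch of how a 1d date_histogram groups timestamps: each value is floored to the start of its day. It assumes the stored values are UTC (they carry no offset) and that no `time_zone` or `offset` is configured; `day_bucket_key` and `histogram` are hypothetical helpers, not Elasticsearch code.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAY = timedelta(days=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_bucket_key(ts: str) -> int:
    """Epoch-millis key of the UTC day containing ts (ts assumed to be UTC)."""
    dt = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
    # Floor to midnight: whole days since the epoch, times one day.
    day_start = EPOCH + ((dt - EPOCH) // DAY) * DAY
    return (day_start - EPOCH) // timedelta(milliseconds=1)


def histogram(timestamps):
    """Count timestamps per day, shaped like the buckets in the response."""
    counts = Counter(day_bucket_key(ts) for ts in timestamps)
    return [
        {
            "key": key,
            "key_as_string": (EPOCH + timedelta(milliseconds=key)).strftime(
                "%Y-%m-%dT%H:%M:%S.000Z"
            ),
            "doc_count": n,
        }
        for key, n in sorted(counts.items())
    ]


# The ten creation_time values visible in "hits" above.
visible = [
    "2014-10-09T16:35:39.535389", "2014-10-09T17:16:55.677919",
    "2014-10-09T01:21:21.691270", "2014-10-09T01:33:51.291644",
    "2014-10-09T17:08:45.738821", "2014-10-09T22:35:16.908080",
    "2014-10-09T01:25:18.869175", "2014-10-09T01:42:49.000599",
    "2014-10-09T16:45:51.563595", "2014-10-09T01:23:30.546353",
]
print(histogram(visible))
```

On the ten visible hits this yields a single bucket with key 1412812800000 (2014-10-09) and doc_count 10, consistent with the first bucket in the response. Since 13 documents match in total, the 2014-10-10 document has to be one of the three hits that were not returned.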

Try updating your query to:

{
  "size": 13,
  "fields": ["creation_time"],
  "query": {
    "filtered": {
      "query": {
        "match": {"type": "test.type"}
      }
    }
  },
  "aggs": {
    "group_by_created_by": {
      "date_histogram": {"field": "creation_time", "interval": "1d"}
    }
  }
}
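If the request is built from Python instead of typed into curl, the same body can be generated with `size` taken from the `hits.total` of a first run. A minimal sketch: `build_search_body` is a hypothetical helper that only serializes the JSON and keeps the question's query syntax unchanged; sending it (for example with the curl command from the question) is left out.

```python
import json


def build_search_body(size: int) -> str:
    """Hypothetical helper: the query above with an explicit hit count."""
    body = {
        # Number of hits to return; it does not change which documents
        # the aggregation sees.
        "size": size,
        "fields": ["creation_time"],
        "query": {"filtered": {"query": {"match": {"type": "test.type"}}}},
        "aggs": {
            "group_by_created_by": {
                "date_histogram": {"field": "creation_time", "interval": "1d"}
            }
        },
    }
    return json.dumps(body)


# Use the hits.total from a first response (13 here) as the size.
print(build_search_body(13))
```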

You will then see the document that was created on 2014-10-10.

Aggregations are computed over all matching documents.

You did not set size, which means you get the default of 10 documents under hits. Change size to 13 (or more) and your 2014-10-10 document should show up.

When you have more results, checking them all by hand becomes impractical. In that case you can use top_hits as a sub-aggregation to get a peek at what is in each bucket (it has its own size option as well).
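A minimal Python sketch of that approach: the question's query with a `top_hits` sub-aggregation nested under the date_histogram, plus a helper that pulls each bucket's sample document IDs out of a parsed response. The sub-aggregation name `sample_docs` and both helper functions are hypothetical; only `top_hits` and its `size` option come from the answer above.

```python
import json


def histogram_with_samples(sample_size: int = 3) -> dict:
    """Hypothetical helper: date_histogram with a top_hits peek into each bucket."""
    return {
        "query": {"filtered": {"query": {"match": {"type": "test.type"}}}},
        "aggs": {
            "group_by_created_by": {
                "date_histogram": {"field": "creation_time", "interval": "1d"},
                "aggs": {
                    # "sample_docs" is an arbitrary name for the sub-aggregation
                    "sample_docs": {"top_hits": {"size": sample_size}}
                },
            }
        },
    }


def bucket_samples(resp: dict):
    """Yield (bucket date, sampled _ids) from a parsed search response."""
    for bucket in resp["aggregations"]["group_by_created_by"]["buckets"]:
        hits = bucket["sample_docs"]["hits"]["hits"]
        yield bucket["key_as_string"], [hit["_id"] for hit in hits]


print(json.dumps(histogram_with_samples(), indent=2))
```

With a small `size` on `top_hits`, each bucket carries just a few example documents, so a surprising bucket such as 2014-10-10 can be inspected without raising the top-level `size` to cover every match.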
