
Aggregation, date range query in Elassandra/Elasticsearch

I'm getting unexpected results when running a date range aggregation on my index: the returned documents have dates outside the requested range.

I created the index as follows:

curl -XPUT -H 'Content-Type: application/json' 'http://x.x.x.x:9200/date_index' -d '{
  "settings" : { "keyspace" : "keyspace1"},
  "mappings" : {
    "table1" : {
      "discover":"sent_date",
      "properties" : {
        "sent_date" : { "type": "date", "format": "yyyy-MM-dd HH:mm:ssZZ" }
        }
    }
  }
}'

When I search with the request below, the results include documents whose dates fall outside the range:

curl -XGET -H 'Content-Type: application/json' 'http://x.x.x.x:9200/date_index/_search?pretty=true' -d '
{
  "aggs" : {
    "sentdate_range_search" : {
      "date_range" : {
        "field" : "sent_date",
        "time_zone": "UTC",
        "format" : "yyyy-MM-dd HH:mm:ssZZ",
        "ranges" : [
          { "from" : "2010-05-07 11:22:34+0000", "to" : "2011-05-07 11:22:34+0000" }
        ]
      }
    }
  }
}'

Sample output, showing hits dated 2039, 2024, etc.:

{
  "took" : 26,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 417427,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "date_index",
        "_type" : "table1",
        "_id" : "P89200822_4210021505784",
        "_score" : 1.0,
        "_source" : {
          "sent_date" : "2039-05-22T14:45:39.000Z"
        }
      },
      {
        "_index" : "date_index",
        "_type" : "table1",
        "_id" : "P89200605_4210020537428",
        "_score" : 1.0,
        "_source" : {
          "sent_date" : "2024-06-05T07:20:57.000Z"
        }
      },
      .........
  "aggregations" : {
    "sentdate_range_search" : {
      "buckets" : [
        {
          "key" : "2010-05-07 11:22:34+00:00-2011-05-07 11:22:34+00:00",
          "from" : 1.273231354E12,
          "from_as_string" : "2010-05-07 11:22:34+00:00",
          "to" : 1.304767354E12,
          "to_as_string" : "2011-05-07 11:22:34+00:00",
          "doc_count" : 0
        }
      ]
    }
  }
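The "from" and "to" numbers in the bucket are epoch milliseconds. The quick Python check below suggests the range boundaries themselves were parsed as intended (in UTC). It is a sketch only. It assumes that Python's "%z" directive handles the "+0000" offsets the same way the Joda-style "ZZ" in the mapping format does for these particular inputs. The helper name to_epoch_millis is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "%z" parses "+0000" the way the mapping's "ZZ" does for these inputs.
PY_FORMAT = "%Y-%m-%d %H:%M:%S%z"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Hypothetical helper: parse a range boundary and return epoch milliseconds."""
    parsed = datetime.strptime(value, PY_FORMAT)
    # Integer division of two timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


frm = to_epoch_millis("2010-05-07 11:22:34+0000")
to = to_epoch_millis("2011-05-07 11:22:34+0000")
print(frm, to)  # 1273231354000 1304767354000
```

Both values match the bucket's 1.273231354E12 and 1.304767354E12. That points away from a date-parsing problem as the cause of the out-of-range hits.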

FYI: the data comes from a Cassandra database, where the "sent_date" field is stored in UTC.

Please advise, thanks.

== Reworked answer based on conversation in the comments ==

Aggregations are different from search queries. Aggregations combine records (i.e. aggregate!) along specified dimensions. The request in the question has no "query" section, so it matches every document. That is why "total" is 417427 and the hits include dates such as 2039 and 2024. The date_range aggregation does not filter those hits. It only groups the records that fall between the two specified dates into a single bucket and reports how many there are in "doc_count". More information on aggregations can be found in the Elasticsearch documentation.
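To make the distinction concrete, here is a toy Python model. It is not Elasticsearch code, and the function names are hypothetical. It uses the two hits from the sample output to show what a request with only a date_range aggregation does. The documents come back untouched, and the aggregation merely counts how many fall inside the range. Per the Elasticsearch docs, a date_range bucket includes the "from" value and excludes the "to" value; the model follows that rule.

```python
from datetime import datetime, timezone


def parse_iso(value):
    """Parse _source dates such as "2039-05-22T14:45:39.000Z" (Python 3.7+ accepts "Z" for %z)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def search_with_date_range_agg(docs, frm, to):
    """Hypothetical model of a request with a date_range aggregation and no query.

    With no query, every document is a hit (match_all). The aggregation
    does not remove hits; it only counts those with frm <= sent_date < to.
    """
    hits = list(docs)  # hits are returned unfiltered
    doc_count = sum(1 for d in hits if frm <= parse_iso(d["sent_date"]) < to)
    return hits, doc_count


# The two hits shown in the sample output.
docs = [
    {"sent_date": "2039-05-22T14:45:39.000Z"},
    {"sent_date": "2024-06-05T07:20:57.000Z"},
]
frm = datetime(2010, 5, 7, 11, 22, 34, tzinfo=timezone.utc)
to = datetime(2011, 5, 7, 11, 22, 34, tzinfo=timezone.utc)

hits, doc_count = search_with_date_range_agg(docs, frm, to)
print(len(hits), doc_count)  # 2 0
```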

Since the requirement is to filter records that fall between two dates, a date range filter is the appropriate approach:

GET date_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "sent_date": {
            "gte": "2010-05-07 11:22:34+0000",
            "lte": "2011-05-07 11:22:34+0000"
          }
        }
      }
    }
  }
}
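The request above uses Kibana console syntax, while the question uses curl. Below is a minimal Python sketch that builds the same body and sends it to the index's _search endpoint. It uses the standard library only, and the helper names range_filter_body and search are hypothetical. It uses POST because Elasticsearch accepts both GET and POST for _search, and POST is the more portable way to send a request body. Note that "gte"/"lte" make both bounds inclusive, unlike the aggregation's exclusive "to".

```python
import json
import urllib.request


def range_filter_body(gte, lte):
    """Hypothetical helper: the bool/filter/range body from the answer."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "range": {"sent_date": {"gte": gte, "lte": lte}}
                }
            }
        }
    }


def search(base_url, index, body):
    """Hypothetical helper: POST the body to <base_url>/<index>/_search, return parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = range_filter_body("2010-05-07 11:22:34+0000", "2011-05-07 11:22:34+0000")
print(json.dumps(body, indent=2))

# Against a live cluster (x.x.x.x is the placeholder host from the question):
# result = search("http://x.x.x.x:9200", "date_index", body)
```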

Why a filter instead of a regular query? Filters are generally faster than scoring queries because they don't contribute to document scoring, and their results can be cached. You can combine filters and queries, for example to get all records within the given time range that match the phrase "all work and no play makes jack a dull boy."
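As a sketch of that combination, the body below puts a scored match_phrase clause in "must" and the unscored date range in "filter". The text field name is an assumption: the mapping in the question only defines "sent_date", so "message" in the usage line is purely hypothetical. The same applies to the helper name phrase_in_range_body.

```python
import json


def phrase_in_range_body(field, phrase, gte, lte):
    """Hypothetical helper: combine a scored phrase query with an unscored date filter."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score.
                "must": {"match_phrase": {field: phrase}},
                # Filter context: only includes/excludes documents, no scoring.
                "filter": {
                    "range": {"sent_date": {"gte": gte, "lte": lte}}
                },
            }
        }
    }


# "message" is a hypothetical text field; it is not in the question's mapping.
body = phrase_in_range_body(
    "message",
    "all work and no play makes jack a dull boy",
    "2010-05-07 11:22:34+0000",
    "2011-05-07 11:22:34+0000",
)
print(json.dumps(body, indent=2))
```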
