
Elasticsearch composite query and aggregation return different doc_count values

I am trying to query my data set with a composite query. Here are my queries:

Query 1:

curl -X POST "localhost:9200/index1-202103/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "query":{
    "bool":{
      "filter":[
        {
          "range":{
            "date":{
              "gte":"20210330",
              "lte":"20210330"
            }
          }
        },
        {
          "term":{
            "userid":"16114"
          }
        },
        {
          "exists":{
            "field":"opens"
          }
        },
        {
          "exists":{
            "field":"tags"
          }
        }
      ]
    }
  },
  "aggs":{
    "my_buckets":{
      "composite":{
        "sources":[
          {
            "from_domain_wise":{
              "terms":{
                "field":"domain"
              }
            }
          },
          {
            "msp_wise":{
              "terms":{
                "field":"msp"
              }
            }
          },
          {
            "fromaddress_wise":{
              "terms":{
                "field":"fromaddress"
              }
            }
          },
          {
            "tag_wise":{
              "terms":{
                "field":"tags"
              }
            }
          },
          {
            "rate_over_time":{
              "date_histogram":{
                "field":"opens.time",
                "interval":"1h"
              }
            }
          }
        ]
      }
    }
  }
}'

Query 2:

curl -X POST "localhost:9200/index1-202103/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "query":{
    "bool":{
      "filter":[
        {
          "range":{
            "date":{
              "gte":"20210330",
              "lte":"20210330"
            }
          }
        },
        {
          "term":{
            "userid":"16114"
          }
        },
        {
          "exists":{
            "field":"opens"
          }
        },
        {
          "exists":{
            "field":"tags"
          }
        }
      ]
    }
  },
  "aggs":{
    "my_buckets":{
      "composite":{
        "sources":[
          {
            "from_domain_wise":{
              "terms":{
                "field":"domain"
              }
            }
          },
          {
            "msp_wise":{
              "terms":{
                "field":"msp"
              }
            }
          },
          {
            "fromaddress_wise":{
              "terms":{
                "field":"fromaddress"
              }
            }
          },
          {
            "tag_wise":{
              "terms":{
                "field":"tags"
              }
            }
          }
        ]
      },
      "aggs":{
        "rate_over_time":{
          "date_histogram":{
            "field":"opens.time",
            "interval":"1h"
          }
        }
      }
    }
  }
}'

The two queries return different counts for the date histogram. When I checked, I found that Query 1 also counts duplicate values of the opens.time field (format: 2021-03-30 15:15:45), whereas Query 2 counts opens.time only once per document when several values fall in the same hour.

For example, if a doc contains opens: [{ "time": "2021-03-30 15:20:25" }, { "time": "2021-03-30 15:45:30" }], then Query 1 returns a doc_count of 2, whereas Query 2 returns a doc_count of 1.

Can anyone please explain why the queries behave differently even though both have the same goal? I want to get the result that Query 2 gives, but using Query 1.

PS: The Elasticsearch version is 7.10.

Both queries do "have the same goal", but notice where you apply the date_histogram.

In the first query it's used as a composite value source; in the second, as a sub-aggregation of the composite. That is consistent with what you observed: as a value source, each opens.time value in a document is counted, whereas the sub-aggregation counts a document at most once per hourly bucket.
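To make the difference concrete, here is a minimal Python sketch of the two counting behaviours described in the question. It is only a model of the reported results, not Elasticsearch's implementation; the helper names (hour_bucket, count_per_value, count_per_document) are hypothetical, and the sample document is the one from the question.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: str) -> str:
    """Round a 'YYYY-MM-DD HH:MM:SS' timestamp down to its hour."""
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:00")


def count_per_value(docs):
    """Count every opens.time value (the behaviour reported for Query 1)."""
    counts = Counter()
    for doc in docs:
        for entry in doc["opens"]:
            counts[hour_bucket(entry["time"])] += 1
    return counts


def count_per_document(docs):
    """Count each document once per hourly bucket (the behaviour reported for Query 2)."""
    counts = Counter()
    for doc in docs:
        # A set collapses values from the same document that share an hour.
        for bucket in {hour_bucket(entry["time"]) for entry in doc["opens"]}:
            counts[bucket] += 1
    return counts


# Sample document taken from the question.
docs = [
    {"opens": [{"time": "2021-03-30 15:20:25"}, {"time": "2021-03-30 15:45:30"}]}
]

per_value = count_per_value(docs)
per_document = count_per_document(docs)
```

With the sample document, per_value holds 2 for the 15:00 bucket and per_document holds 1, mirroring the doc_count values reported for Query 1 and Query 2.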
