
Elasticsearch - getting aggregated data based on unique values from field

In my Elasticsearch (7.13) index, I have the following dataset:

maid      site_id    date         hour
m1        1300       2021-06-03   1
m1        1300       2021-06-03   2
m1        1300       2021-06-03   1
m2        1300       2021-06-03   1

I am trying to get a unique count of records for each date and site_id from the above table. The desired result is:

maid      site_id   date        count        
m1        1300      2021-06-03  1
m2        1300      2021-06-03  1
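
To make the target concrete, here is a minimal plain-Python sketch of the deduplication described above, run on the four sample rows. It assumes "unique" means one row per distinct (maid, site_id, date) combination, ignoring hour; the function name `unique_maid_rows` is made up for illustration and is not part of Elasticsearch.

```python
# Sample rows from the table above: (maid, site_id, date, hour).
sample_rows = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def unique_maid_rows(rows):
    """Return one (maid, site_id, date, count) row per distinct combination.

    Assumption: hour is ignored and each distinct combination counts once.
    """
    # dict.fromkeys keeps first-seen order while dropping duplicate keys.
    distinct = dict.fromkeys((maid, site_id, date) for maid, site_id, date, _hour in rows)
    return [(maid, site_id, date, 1) for maid, site_id, date in distinct]


desired = unique_maid_rows(sample_rows)
for row in desired:
    print(row)
```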

I have millions of maid values for each site_id, and the dates span two years. I am using the following query with a cardinality aggregation on maid, assuming that it will return the unique maid values.

GET /r_2332/_search
{
  "size": 0,
  "aggs": {
    "site_id": {
      "terms": {
        "field": "site_id",
        "size": 100,
        "include": [1171, 1048]
      },
      "aggs": {
        "bydate": {
          "range": {
            "field": "date",
            "ranges": [
              {
                "from": "2021-04-08",
                "to": "2021-04-22"
              }
            ]
          },
          "aggs": {
            "rdate": {
              "terms": {
                "field": "date"
              },
              "aggs": {
                "maids": {
                  "cardinality": {
                    "field": "maid"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This still returns the data with all the duplicate values. How do I include the maid field in my query so that the data is filtered on unique maid values?
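
For reference, a cardinality sub-aggregation produces a single number per bucket (the count of distinct values), not the list of distinct values. The sketch below imitates that in plain Python on the sample rows, grouping by (site_id, date). It counts exactly, whereas Elasticsearch's cardinality aggregation is approximate on large sets; the function name is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (maid, site_id, date, hour).
sample_rows = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def distinct_maids_per_site_and_date(rows):
    """Imitate terms(site_id) > terms(date) > cardinality(maid) with exact counts."""
    maids_by_bucket = defaultdict(set)
    for maid, site_id, date, _hour in rows:
        maids_by_bucket[(site_id, date)].add(maid)
    # One number per bucket: how many distinct maids, not which ones.
    return {bucket: len(maids) for bucket, maids in maids_by_bucket.items()}


counts = distinct_maids_per_site_and_date(sample_rows)
print(counts)
```

So for the sample data the bucket for site 1300 on 2021-06-03 carries the number 2, and the individual maid values never appear in the output.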

You can use a multi terms aggregation along with a cardinality aggregation if you want to get unique documents based on site_id and maid:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "site_id": [
              "1300",
              "1301"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-06-02",
              "lte": "2021-06-03"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by": {
      "multi_terms": {
        "terms": [
          {
            "field": "site_id"
          },
          {
            "field": "maid.keyword"
          }
        ]
      },
      "aggs": {
        "type_count": {
          "cardinality": {
            "field": "site_id"
          }
        }
      }
    }
  }
}
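
If the request is built from application code, the same body could be assembled roughly as follows. This is only a sketch: the helper name `build_unique_maid_query` and its parameters are made up for illustration, and the `size` setting on `multi_terms` is an addition not present in the query above (the aggregation returns 10 buckets by default, which may matter with millions of maid values). The resulting JSON would be sent to the index's `_search` endpoint, for example `/r_2332/_search` from the question.

```python
import json


def build_unique_maid_query(site_ids, date_from, date_to, size=10):
    """Build the search body: filter by site and date, bucket by (site_id, maid)."""
    return {
        "size": 0,  # no hits, aggregations only
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"site_id": site_ids}},
                    {"range": {"date": {"gte": date_from, "lte": date_to}}},
                ]
            }
        },
        "aggs": {
            "group_by": {
                "multi_terms": {
                    "terms": [{"field": "site_id"}, {"field": "maid.keyword"}],
                    "size": size,  # assumption: raise this to see more than 10 buckets
                },
                "aggs": {
                    "type_count": {"cardinality": {"field": "site_id"}}
                },
            }
        },
    }


body = build_unique_maid_query(["1300", "1301"], "2021-06-02", "2021-06-03")
print(json.dumps(body, indent=2))
```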

The search result will be:

"aggregations": {
    "group_by": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": [
            1300,
            "m1"
          ],
          "key_as_string": "1300|m1",
          "doc_count": 3,
          "type_count": {
            "value": 1           // note this
          }
        },
        {
          "key": [
            1300,
            "m2"
          ],
          "key_as_string": "1300|m2",
          "doc_count": 1,
          "type_count": {
            "value": 1            // note this
          }
        }
      ]
    }
}
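
To turn those buckets into rows shaped like the desired table, something along these lines could work. It is a sketch that operates on the response above copied into a Python dict (with the `// note this` annotations dropped, since they are not valid JSON); `buckets_to_rows` is a hypothetical helper name.

```python
# The aggregation part of the response shown above, as a Python dict.
response = {
    "aggregations": {
        "group_by": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": [1300, "m1"],
                    "key_as_string": "1300|m1",
                    "doc_count": 3,
                    "type_count": {"value": 1},
                },
                {
                    "key": [1300, "m2"],
                    "key_as_string": "1300|m2",
                    "doc_count": 1,
                    "type_count": {"value": 1},
                },
            ],
        }
    }
}


def buckets_to_rows(resp):
    """Flatten multi_terms buckets into one dict per (site_id, maid) pair."""
    rows = []
    for bucket in resp["aggregations"]["group_by"]["buckets"]:
        site_id, maid = bucket["key"]  # key order follows the "terms" list
        rows.append(
            {
                "maid": maid,
                "site_id": site_id,
                "doc_count": bucket["doc_count"],  # raw documents in the bucket
                "count": bucket["type_count"]["value"],  # cardinality value
            }
        )
    return rows


table = buckets_to_rows(response)
for row in table:
    print(row)
```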
