Elasticsearch - getting aggregated data based on unique values from field
In my Elasticsearch (7.13) index, I have the following dataset:
maid  site_id  date        hour
m1    1300     2021-06-03  1
m1    1300     2021-06-03  2
m1    1300     2021-06-03  1
m2    1300     2021-06-03  1
I am trying to get a unique count of records for each date and site_id from the above table. The desired result is:
maid  site_id  date        count
m1    1300     2021-06-03  1
m2    1300     2021-06-03  1
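To make the target concrete, the desired table can be reproduced from the four sample rows in plain Python. This is only a sketch of the intended semantics as I read them (one output row per distinct maid, site_id and date, with the hour ignored); the names `rows` and `unique_rows` are made up for illustration, and nothing here talks to Elasticsearch.

```python
# Sample rows from the question: (maid, site_id, date, hour)
rows = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def unique_rows(rows):
    """Return one (maid, site_id, date, count) row per distinct combination.

    Assumption: "count" is 1 for every surviving row, i.e. each maid is
    counted once per site_id and date, however many hours it appears in.
    """
    # A set keeps each (maid, site_id, date) combination exactly once.
    seen = {(maid, site_id, date) for maid, site_id, date, _hour in rows}
    return [(maid, site_id, date, 1) for maid, site_id, date in sorted(seen)]


for row in unique_rows(rows):
    print(*row)
```
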
I have millions of maid values for each site_id, and the dates span two years. I am using the following query with a cardinality aggregation on maid, assuming that it will return the unique maid values.
GET /r_2332/_search
{
  "size": 0,
  "aggs": {
    "site_id": {
      "terms": {
        "field": "site_id",
        "size": 100,
        "include": [1171, 1048]
      },
      "aggs": {
        "bydate": {
          "range": {
            "field": "date",
            "ranges": [
              { "from": "2021-04-08", "to": "2021-04-22" }
            ]
          },
          "aggs": {
            "rdate": {
              "terms": { "field": "date" },
              "aggs": {
                "maids": {
                  "cardinality": { "field": "maid" }
                }
              }
            }
          }
        }
      }
    }
  }
}
This still returns the data with all the duplicate values. How do I include the maid field in my query so that the results are filtered on unique maid values?
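One likely reason the query does not behave as hoped: a cardinality aggregation reports a single (approximate) distinct count per bucket rather than one entry per distinct value, and doc_count on the enclosing terms bucket still counts every document. The plain-Python sketch below mimics that on the sample rows from the question; `per_bucket_counts` is a hypothetical helper, and the exact set-based counting here merely stands in for Elasticsearch's approximate one.

```python
from collections import defaultdict

# Sample rows from the question: (maid, site_id, date, hour)
rows = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def per_bucket_counts(rows):
    """Group by (site_id, date); report doc_count and distinct maid count."""
    docs = defaultdict(int)
    maids = defaultdict(set)
    for maid, site_id, date, _hour in rows:
        docs[(site_id, date)] += 1         # every document is counted
        maids[(site_id, date)].add(maid)   # distinct maids collapse in the set
    return {
        key: {"doc_count": docs[key], "maids": len(maids[key])}
        for key in docs
    }


print(per_bucket_counts(rows))
```

So the bucket for site 1300 on 2021-06-03 carries one number for the distinct maids, not a row per maid, which is why a grouping on maid itself is needed to get the desired table.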
You can use a multi terms aggregation along with a cardinality aggregation if you want to get unique documents based on site_id and maid:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "site_id": ["1300", "1301"]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-06-02",
              "lte": "2021-06-03"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by": {
      "multi_terms": {
        "terms": [
          { "field": "site_id" },
          { "field": "maid.keyword" }
        ]
      },
      "aggs": {
        "type_count": {
          "cardinality": { "field": "site_id" }
        }
      }
    }
  }
}
The search result will be:
"aggregations": {
  "group_by": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": [1300, "m1"],
        "key_as_string": "1300|m1",
        "doc_count": 3,
        "type_count": {
          "value": 1 // note this
        }
      },
      {
        "key": [1300, "m2"],
        "key_as_string": "1300|m2",
        "doc_count": 1,
        "type_count": {
          "value": 1 // note this
        }
      }
    ]
  }
}
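If it helps, the buckets can be flattened into rows on the client side. The sketch below assumes the response has already been parsed into a Python dict shaped like the excerpt above (without the `// note this` annotations, which are not valid JSON); `buckets_to_rows` is a hypothetical helper name, not part of any Elasticsearch client.

```python
# Response excerpt from the answer, as a parsed Python dict.
response = {
    "aggregations": {
        "group_by": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": [1300, "m1"], "key_as_string": "1300|m1",
                 "doc_count": 3, "type_count": {"value": 1}},
                {"key": [1300, "m2"], "key_as_string": "1300|m2",
                 "doc_count": 1, "type_count": {"value": 1}},
            ],
        }
    }
}


def buckets_to_rows(response, agg_name="group_by"):
    """Turn each multi_terms bucket into a (maid, site_id, doc_count) tuple."""
    rows = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # The key order follows the order of the "terms" list in the query.
        site_id, maid = bucket["key"]
        rows.append((maid, site_id, bucket["doc_count"]))
    return rows


for maid, site_id, doc_count in buckets_to_rows(response):
    print(maid, site_id, doc_count)
```

Each bucket corresponds to one unique (site_id, maid) pair, so the number of rows is the number of unique pairs, while doc_count shows how many raw documents collapsed into each.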