Elasticsearch aggregations filtering by top one result from each bucket
Elasticsearch: get top result in composite bucket aggregation
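I have an index where each document is a single line of a file, along with a lot of metadata such as project and author.
I have written an Elasticsearch query that aggregates the lines into files, but because we keep versions of files I also have to aggregate by time. I can't work out how to add a top_hits aggregator that gives me only the latest version of each file (path name). Any ideas?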
{
  "query": {
    "query_string": {
      "query": "swan heights"
    }
  },
  "aggs": {
    "files_bucket": {
      "composite": {
        "sources": [
          {
            "path": {
              "terms": {
                "field": "path"
              }
            }
          },
          {
            "timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1s",
                "format": "iso8601"
              }
            }
          }
        ]
      }
    }
  },
  "size": 1
}
This currently returns:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 71,
      "relation": "eq"
    },
    "max_score": 0.013937337,
    "hits": [
      {
        "_index": "lines",
        "_type": "_doc",
        "_id": "asdfefwfad",
        "_score": 0.013937337,
        "_source": {
          "projectId": 680,
          "projectName": "swan heights",
          ...
        }
      }
    ]
  },
  "aggregations": {
    "files_bucket": {
      "after_key": {
        "path": "ducks.txt",
        "timestamp": "2021-02-26T12:08:20.000Z"
      },
      "buckets": [
        {
          "key": {
            "path": "swans.txt",
            "timestamp": "2021-02-25T12:10:43.000Z"
          },
          "doc_count": 17
        },
        {
          "key": {
            "path": "ducks.txt",
            "timestamp": "2021-02-25T12:13:43.000Z"
          },
          "doc_count": 27
        },
        {
          "key": {
            "path": "ducks.txt",
            "timestamp": "2021-02-26T12:08:20.000Z"
          },
          "doc_count": 27
        }
      ]
    }
  }
}
I only want the latest version of each file, not both versions of ducks.txt.
A composite aggregation can accept sub-aggregations, so try this:
{
  "query": {
    "query_string": {
      "query": "swan heights"
    }
  },
  "aggs": {
    "files_bucket": {
      "composite": {
        "sources": [
          {
            "path": {
              "terms": {
                "field": "path"
              }
            }
          },
          {
            "timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1s",
                "format": "iso8601"
              }
            }
          }
        ]
      },
      "aggs": {
        "top_hits_agg": {
          "top_hits": {
            "size": 10,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  },
  "size": 1
}
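Note that because timestamp is one of the composite sources, each bucket is still one (path, timestamp) pair, so a file with two versions will probably still show up as two buckets. If that is the case, one option is to reduce the buckets on the client side. Below is a minimal Python sketch of that idea, not a definitive implementation. It assumes the bucket shape shown in the question's response and the timestamp format seen there; the function name latest_bucket_per_path is made up for illustration.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the sample response in the question.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def latest_bucket_per_path(buckets):
    """Keep only the newest composite bucket for each path (hypothetical helper)."""
    latest = {}
    for bucket in buckets:
        path = bucket["key"]["path"]
        ts = datetime.strptime(bucket["key"]["timestamp"], TS_FORMAT)
        # Replace the stored bucket when this one is newer.
        if path not in latest or ts > latest[path][0]:
            latest[path] = (ts, bucket)
    return {path: bucket for path, (_, bucket) in latest.items()}


# Buckets copied from aggregations.files_bucket.buckets in the question.
buckets = [
    {"key": {"path": "swans.txt", "timestamp": "2021-02-25T12:10:43.000Z"}, "doc_count": 17},
    {"key": {"path": "ducks.txt", "timestamp": "2021-02-25T12:13:43.000Z"}, "doc_count": 27},
    {"key": {"path": "ducks.txt", "timestamp": "2021-02-26T12:08:20.000Z"}, "doc_count": 27},
]

result = latest_bucket_per_path(buckets)
for path, bucket in result.items():
    print(path, bucket["key"]["timestamp"], bucket["doc_count"])
```

With the top_hits sub-aggregation from the query above, each surviving bucket should also carry its matching documents under top_hits_agg, so the lines of the latest version can be read from there. If there are more buckets than fit in one page, the after_key paging of the composite aggregation would have to be followed before reducing.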