
ElasticSearch: get the top result in a composite bucket aggregation


I have an index in which each document is one line from a file, with a lot of metadata attached, such as the project and the author.

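I've written an Elasticsearch query that aggregates the lines by file. Because we keep versions of files, I also have to aggregate by time, and I can't work out how to add a top_hits aggregator that gives me the path name of each file's latest version. Any ideas?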

{
    "query": {
        "query_string": {
            "query": "swan heights"
        }
    },
    "aggs": {
        "files_bucket": {
            "composite": {
                "sources": [
                    {
                        "path": {
                            "terms": {
                                "field": "path"
                            }
                        }
                    },
                    {
                        "timestamp": {
                            "date_histogram": {
                                "field": "timestamp",
                                "calendar_interval": "1s",
                                "format": "iso8601"
                            }
                        }
                    }
                ]
            }
        }
    },
    "size": 1
}

It currently returns:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 71,
            "relation": "eq"
        },
        "max_score": 0.013937337,
        "hits": [
            {
                "_index": "lines",
                "_type": "_doc",
                "_id": "asdfefwfad",
                "_score": 0.013937337,
                "_source": {
                    "projectId": 680,
                    "projectName": "swan heights",
                    ...
                    
                }
            }
        ]
    },
    "aggregations": {
        "files_bucket": {
            "after_key": {
                "path": "ducks.txt",
                "timestamp": "2021-02-26T12:08:20.000Z"
            },
            "buckets": [
                {
                    "key": {
                        "path": "swans.txt",
                        "timestamp": "2021-02-25T12:10:43.000Z"
                    },
                    "doc_count": 17
                },
                {
                    "key": {
                        "path": "ducks.txt",
                        "timestamp": "2021-02-25T12:13:43.000Z"
                    },
                    "doc_count": 27
                },
                {
                    "key": {
                        "path": "ducks.txt",
                        "timestamp": "2021-02-26T12:08:20.000Z"
                    },
                    "doc_count": 27
                }
            ]
        }
    }
}

I only want the latest version of each file, not two versions of ducks.txt.
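While the two-source composite query is kept, the duplicates can also be collapsed on the client. This is a minimal sketch, assuming the response has already been parsed into a Python dict; parse_ts and latest_per_path are hypothetical helper names. For every path it keeps only the bucket key with the newest timestamp.

```python
import json
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # The iso8601 keys end in "Z"; fromisoformat before Python 3.11 needs "+00:00".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_per_path(response: dict) -> dict:
    """Collapse (path, timestamp) composite buckets to the newest timestamp per path."""
    latest = {}
    for bucket in response["aggregations"]["files_bucket"]["buckets"]:
        path = bucket["key"]["path"]
        ts = bucket["key"]["timestamp"]
        # Keep this bucket's timestamp if the path is new or this version is newer.
        if path not in latest or parse_ts(ts) > parse_ts(latest[path]):
            latest[path] = ts
    return latest


# Buckets copied from the response shown above.
sample = json.loads("""
{"aggregations": {"files_bucket": {"buckets": [
  {"key": {"path": "swans.txt", "timestamp": "2021-02-25T12:10:43.000Z"}, "doc_count": 17},
  {"key": {"path": "ducks.txt", "timestamp": "2021-02-25T12:13:43.000Z"}, "doc_count": 27},
  {"key": {"path": "ducks.txt", "timestamp": "2021-02-26T12:08:20.000Z"}, "doc_count": 27}
]}}}
""")

print(latest_per_path(sample))
# {'swans.txt': '2021-02-25T12:10:43.000Z', 'ducks.txt': '2021-02-26T12:08:20.000Z'}
```

This only sees the buckets on the current page. Composite results are paginated through after_key, so every page has to be collected first; otherwise two versions of a file that land on different pages are not merged.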

A composite aggregation accepts sub-aggregations, so try this:

{
  "query": {
    "query_string": {
      "query": "swan heights"
    }
  },
  "aggs": {
    "files_bucket": {
      "composite": {
        "sources": [
          {
            "path": {
              "terms": {
                "field": "path"
              }
            }
          },
          {
            "timestamp": {
              "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "1s",
                "format": "iso8601"
              }
            }
          }
        ]
      },
      "aggs": {
        "top_hits_agg": {
          "top_hits": {
            "size": 10,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  },
  "size": 1
}
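The query above still uses timestamp as a composite source. Each bucket is therefore one (path, second) pair, and top_hits only sorts inside that pair, so ducks.txt would still come back twice. Below is a sketch of one alternative. It assumes that path is a keyword field and timestamp a date field, as the question's working query implies. The timestamp source is dropped, and a top_hits of size 1, sorted by timestamp descending, picks the newest line of each file. The request is built as a plain Python dict so it can be passed to whichever client you use. The function names, the page_size parameter, and the "latest" sub-aggregation name are hypothetical.

```python
import json


def build_latest_version_query(text: str, page_size: int = 100) -> dict:
    """Request body: one composite bucket per file path plus the newest line in each."""
    return {
        # Only the aggregation is needed, so skip the top-level hits.
        "size": 0,
        "query": {"query_string": {"query": text}},
        "aggs": {
            "files_bucket": {
                "composite": {
                    "size": page_size,
                    # Bucket on path only: without a timestamp source,
                    # each file gets exactly one bucket.
                    "sources": [{"path": {"terms": {"field": "path"}}}],
                },
                "aggs": {
                    "latest": {
                        "top_hits": {
                            # Newest line document per file, i.e. one from its latest version.
                            "size": 1,
                            "sort": [{"timestamp": {"order": "desc"}}],
                            "_source": ["path", "timestamp"],
                        }
                    }
                },
            }
        },
    }


def latest_versions(response: dict) -> dict:
    """Map each path to the timestamp of the newest line document in its bucket."""
    result = {}
    for bucket in response["aggregations"]["files_bucket"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:
            result[bucket["key"]["path"]] = hits[0]["_source"]["timestamp"]
    return result


if __name__ == "__main__":
    print(json.dumps(build_latest_version_query("swan heights"), indent=2))
```

If there are more files than page_size, repeat the request with the previous response's after_key placed in the composite's "after" parameter.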


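Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.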

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM