Filter results to remove documents with the same field value based on another field's value (without aggregation)

Question

Given the following 4 objects in an Elasticsearch index:

"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "0:1",
    "_source": {
      "id": 0,
      "version": 1,
      "published": false,
      "latest": true
    }
  },
  {
    "_id": "1:0",
    "_source": {
      "id": 1,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]

I would like to find the documents using these rules:

with published:true
no duplicate id
for documents with the same id, the one with the highest version should be returned.

So for the above I'd like to get 0:0 and 1:1:

"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]
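To make the three rules concrete, here is a minimal client-side sketch in Python that applies them to the four sample documents. It only illustrates the desired semantics and is not an Elasticsearch query; the function name `latest_published` is hypothetical.

```python
def latest_published(docs):
    """Keep, for each id, the published document with the highest version."""
    best = {}  # maps id -> best document seen so far
    for doc in docs:
        src = doc["_source"]
        if not src.get("published"):
            continue  # rule 1: only published documents qualify
        current = best.get(src["id"])
        # rules 2 and 3: one document per id, preferring the highest version
        if current is None or src["version"] > current["_source"]["version"]:
            best[src["id"]] = doc
    return sorted(best.values(), key=lambda d: d["_source"]["id"])


# The four sample documents from the question.
docs = [
    {"_id": "0:0", "_source": {"id": 0, "version": 0, "published": True}},
    {"_id": "0:1", "_source": {"id": 0, "version": 1, "published": False, "latest": True}},
    {"_id": "1:0", "_source": {"id": 1, "version": 0, "published": True}},
    {"_id": "1:1", "_source": {"id": 1, "version": 1, "published": True, "latest": True}},
]

result_ids = [d["_id"] for d in latest_published(docs)]
print(result_ids)
```

Note that 0:1 is dropped because it is unpublished, so 0:0 wins for id 0 even though it has the lower version.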

I'm aware that I can use top_hits, but I'd like to know whether this is possible without it, so that the main hits.hits array contains these results.

I'd probably do the collapsing as follows:

{
  "query": {...},
  "aggs": {
    "ids": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "dedup": {
          "top_hits": { "size": 1, "sort": [{ "version": "desc" }] }
        }
      }
    }
  }
}

The reason I'm hoping to avoid using top_hits is that I'd need to update the result parser in our application. Also, the size field will not work correctly if I do so.
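For reference, if the parser did have to change, the adjustment could be fairly small. The sketch below flattens the per-bucket top_hits results back into a single list shaped like hits.hits. It assumes the aggregation names `ids` and `dedup` from the query above; the function name `flatten_top_hits` is hypothetical, and the `response` dictionary is a hand-written illustration of the response shape, not real output.

```python
def flatten_top_hits(response, terms_agg="ids", top_hits_agg="dedup"):
    """Collect the top_hits of every terms bucket into one flat list of hits."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return [
        hit
        for bucket in buckets
        for hit in bucket[top_hits_agg]["hits"]["hits"]
    ]


# Hand-written example of the relevant part of a search response (illustrative only).
response = {
    "hits": {"hits": []},
    "aggregations": {
        "ids": {
            "buckets": [
                {"key": 0, "doc_count": 1, "dedup": {"hits": {"hits": [
                    {"_id": "0:0", "_source": {"id": 0, "version": 0, "published": True}},
                ]}}},
                {"key": 1, "doc_count": 2, "dedup": {"hits": {"hits": [
                    {"_id": "1:1", "_source": {"id": 1, "version": 1, "published": True, "latest": True}},
                ]}}},
            ]
        }
    },
}

flat_hits = flatten_top_hits(response)
print([hit["_id"] for hit in flat_hits])
```

This does not address the size concern: the request-level size applies to the main hits, while the number of deduplicated results is governed by how many buckets the terms aggregation returns.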

Answer 1

To answer my own question based on this answer, it's not possible without using the top_hits aggregation. I think what I was trying to achieve wasn't the best use of aggregations. Instead, I'm going to adjust the index model by adding latestPublished: true to the relevant documents, allowing the query to be { term: { latestPublished: true } }.
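A minimal sketch of how the latestPublished flag might be computed before indexing, assuming all versions of a document are available to the indexing code. The helper name `mark_latest_published` is hypothetical, and the flag is written as an explicit boolean on every document here rather than only on the relevant ones.

```python
def mark_latest_published(sources):
    """Return copies of the sources with a latestPublished boolean added."""
    highest = {}  # maps id -> highest published version
    for src in sources:
        if src.get("published"):
            if src["id"] not in highest or src["version"] > highest[src["id"]]:
                highest[src["id"]] = src["version"]
    return [
        {
            **src,
            "latestPublished": bool(src.get("published"))
            and highest.get(src["id"]) == src["version"],
        }
        for src in sources
    ]


# The _source bodies of the four sample documents from the question.
sources = [
    {"id": 0, "version": 0, "published": True},
    {"id": 0, "version": 1, "published": False, "latest": True},
    {"id": 1, "version": 0, "published": True},
    {"id": 1, "version": 1, "published": True, "latest": True},
]

flags = [src["latestPublished"] for src in mark_latest_published(sources)]
print(flags)

# The search body then reduces to a single term query, as described in the answer.
search_body = {"query": {"term": {"latestPublished": True}}}
```

The trade-off is that the flag has to be maintained at write time: when a newer version is published, the previously flagged document for that id needs to be updated as well.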

Answer 1 was posted on 2016-06-09 09:37:45.