
Filter results to remove documents with the same field value based on another field value (without aggregation)

Given the following 4 objects in an Elasticsearch index:

"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "0:1",
    "_source": {
      "id": 0,
      "version": 1,
      "published": false,
      "latest": true
    }
  },
  {
    "_id": "1:0",
    "_source": {
      "id": 1,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]

I would like to find the documents that match these rules:

  • published is true
  • no duplicate id
  • for documents with the same id, only the one with the highest version should be returned

So for the above I'd like to get 0:0 and 1:1:

"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]
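
To make the three rules concrete, here is a minimal client-side sketch in plain Python. It is not an Elasticsearch query; the function name latest_published is made up for illustration, and the sample data is copied from the hits above.

```python
# The four sample documents from the question.
docs = [
    {"_id": "0:0", "_source": {"id": 0, "version": 0, "published": True}},
    {"_id": "0:1", "_source": {"id": 0, "version": 1, "published": False, "latest": True}},
    {"_id": "1:0", "_source": {"id": 1, "version": 0, "published": True}},
    {"_id": "1:1", "_source": {"id": 1, "version": 1, "published": True, "latest": True}},
]


def latest_published(hits):
    """Hypothetical helper: keep published docs, then the highest version per id."""
    best = {}
    for hit in hits:
        src = hit["_source"]
        if not src.get("published"):
            continue  # rule 1: published must be true
        current = best.get(src["id"])
        # rules 2 and 3: one document per id, the one with the highest version
        if current is None or src["version"] > current["_source"]["version"]:
            best[src["id"]] = hit
    return [best[key] for key in sorted(best)]


result_ids = [hit["_id"] for hit in latest_published(docs)]
print(result_ids)  # ['0:0', '1:1']
```

Note that 0:1 has the highest version for id 0 but is unpublished, so 0:0 is the one that survives.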

I'm aware that I can use top_hits, but I'd like to know whether this is possible without it, so that the main hits.hits array contains these results.

With top_hits, I'd probably do the collapsing as follows:

{
  "query": {...},
  "aggs": {
    "ids": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "dedup": {
          "top_hits": {
            "size": 1,
            "sort": [{ "version": { "order": "desc" } }]
          }
        }
      }
    }
  }
}

The reason I'm hoping to avoid top_hits is that I'd need to update the result parser in our application. Also, the size field would no longer work correctly if I did so.
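
For context on the parser concern, the change would amount to something like the following sketch. It assumes the aggregation names ids and dedup from the query above; the function name flatten_top_hits is hypothetical, and sample_response is a hand-written, trimmed illustration of the response shape rather than real server output.

```python
def flatten_top_hits(response, terms_agg="ids", top_hits_agg="dedup"):
    """Hypothetical helper: collect the top_hits documents of every bucket
    into one flat list shaped like the usual hits.hits array."""
    hits = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        hits.extend(bucket[top_hits_agg]["hits"]["hits"])
    return hits


# Hand-written, trimmed example of a terms + top_hits response (illustrative only).
sample_response = {
    "aggregations": {
        "ids": {
            "buckets": [
                {"key": 0, "doc_count": 1, "dedup": {"hits": {"hits": [
                    {"_id": "0:0", "_source": {"id": 0, "version": 0, "published": True}},
                ]}}},
                {"key": 1, "doc_count": 2, "dedup": {"hits": {"hits": [
                    {"_id": "1:1", "_source": {"id": 1, "version": 1,
                                               "published": True, "latest": True}},
                ]}}},
            ]
        }
    }
}

flat_ids = [hit["_id"] for hit in flatten_top_hits(sample_response)]
print(flat_ids)
```

On the size problem: with this approach the number of results is governed by the size parameter of the terms aggregation (10 buckets by default), not by the top-level size of the search request, which only limits hits.hits.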

To answer my own question: based on this answer, it's not possible without using the top_hits aggregation. I think what I was trying to achieve wasn't the best use of aggregations. Instead, I'm going to adjust the index model by adding latestPublished: true to the relevant models, so that the query becomes { term: { latestPublished: true } }.
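
A rough sketch of how that flag could be computed before indexing, using the sample documents from the question. The helper name mark_latest_published is hypothetical, and writing an explicit false on the other documents (rather than leaving the field out) is an assumption; either choice works with the term query.

```python
def mark_latest_published(sources):
    """Hypothetical helper: return copies of the sources with latestPublished set.

    A document is flagged when it is published and no other published
    document with the same id has a higher version.
    """
    highest = {}
    for src in sources:
        if src.get("published"):
            highest[src["id"]] = max(src["version"], highest.get(src["id"], src["version"]))
    return [
        {
            **src,
            # Assumption: non-matching documents get an explicit False.
            "latestPublished": bool(src.get("published"))
            and highest.get(src["id"]) == src["version"],
        }
        for src in sources
    ]


sources = [
    {"id": 0, "version": 0, "published": True},
    {"id": 0, "version": 1, "published": False, "latest": True},
    {"id": 1, "version": 0, "published": True},
    {"id": 1, "version": 1, "published": True, "latest": True},
]

flags = [doc["latestPublished"] for doc in mark_latest_published(sources)]
print(flags)  # [True, False, False, True]

# Search body using the term query from the answer; results land in hits.hits
# and the top-level size behaves as usual.
search_body = {"query": {"term": {"latestPublished": True}}}
```

The trade-off is at write time: whenever a newer published version of an id is indexed, the document that previously carried the flag has to be updated as well.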
