
Filter results to remove documents with the same field value based on another field value (without aggregation)

Given the following 4 documents in an Elasticsearch index:

"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "0:1",
    "_source": {
      "id": 0,
      "version": 1,
      "published": false,
      "latest": true
    }
  },
  {
    "_id": "1:0",
    "_source": {
      "id": 1,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]

I would like to find the documents using these rules:

  • with published:true
  • no duplicate id
  • for documents with the same id, only the highest published version should be returned.

So for the above I'd like to get 0:0 and 1:1:

"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]
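To make the rules concrete, here is a minimal sketch in plain Python that applies them to the four sample documents on the client side. The function name `latest_published` is hypothetical, and the code only illustrates the intended result. It is not an Elasticsearch query.

```python
def latest_published(hits):
    """Keep, per id, the published hit with the highest version."""
    best = {}
    for hit in hits:
        src = hit["_source"]
        if not src.get("published"):
            continue  # rule 1: published documents only
        current = best.get(src["id"])
        # rules 2 and 3: one hit per id, preferring the higher version
        if current is None or src["version"] > current["_source"]["version"]:
            best[src["id"]] = hit
    return [best[key] for key in sorted(best)]


hits = [
    {"_id": "0:0", "_source": {"id": 0, "version": 0, "published": True}},
    {"_id": "0:1", "_source": {"id": 0, "version": 1, "published": False, "latest": True}},
    {"_id": "1:0", "_source": {"id": 1, "version": 0, "published": True}},
    {"_id": "1:1", "_source": {"id": 1, "version": 1, "published": True, "latest": True}},
]

print([hit["_id"] for hit in latest_published(hits)])  # ['0:0', '1:1']
```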

I'm aware that I can use top_hits, but I'd like to know if this is possible without it, such that the main hits.hits array will contain these results.

I'd probably do the collapsing as follows:

{
  "query": {...},
  "aggs": {
    "ids": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "dedup": {
          "top_hits": {
            "size": 1,
            "sort": [{ "version": { "order": "desc" } }]
          }
        }
      }
    }
  }
}

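The reason I'm hoping to avoid top_hits is that I'd need to update the result parser in our application, because the results would come back under aggregations instead of hits.hits. The top-level size parameter would also stop working as expected, since it limits hits.hits and not the aggregation buckets.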
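If the top_hits route were taken anyway, the parser change could be kept small. The sketch below flattens the aggregation part of a response into a list shaped like hits.hits. It assumes the aggregation names `ids` and `dedup` from the query above, and the helper name `flatten_top_hits` is hypothetical. The sample response is hand-written and trimmed to the relevant keys.

```python
def flatten_top_hits(response, terms_agg="ids", top_hits_agg="dedup"):
    """Collect the top_hits of every terms bucket into one flat list."""
    buckets = response.get("aggregations", {}).get(terms_agg, {}).get("buckets", [])
    flat = []
    for bucket in buckets:
        # Each bucket carries its own nested hits.hits array.
        flat.extend(bucket[top_hits_agg]["hits"]["hits"])
    return flat


# Hand-written sample response, trimmed to the keys the helper reads.
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "ids": {
            "buckets": [
                {"key": 0, "doc_count": 1, "dedup": {"hits": {"hits": [
                    {"_id": "0:0", "_source": {"id": 0, "version": 0, "published": True}},
                ]}}},
                {"key": 1, "doc_count": 2, "dedup": {"hits": {"hits": [
                    {"_id": "1:1", "_source": {"id": 1, "version": 1, "published": True, "latest": True}},
                ]}}},
            ]
        }
    },
}

print([hit["_id"] for hit in flatten_top_hits(sample_response)])  # ['0:0', '1:1']
```

Note that the number of ids returned this way is governed by the size of the terms aggregation (10 by default), not by the top-level size.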

To answer my own question based on this answer: it's not possible without using the top_hits aggregation. I think what I was trying to achieve wasn't the best use of aggregations. Instead I'm going to adjust the index model by adding latestPublished: true to the relevant documents, allowing the query to be { term: { latestPublished: true } }.
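A rough sketch of how the latestPublished flag might be maintained whenever a version is written. It uses an in-memory dict as a stand-in for the index, and `store_version` is a hypothetical helper. In a real application the same flag changes would be sent to Elasticsearch as document updates.

```python
def store_version(index, doc):
    """Store doc under "id:version" and move latestPublished for its id."""
    doc = dict(doc, latestPublished=False)
    index[f"{doc['id']}:{doc['version']}"] = doc
    siblings = [d for d in index.values() if d["id"] == doc["id"]]
    # Clear the flag on every version of this id, then set it on the
    # highest published version, if there is one.
    for sibling in siblings:
        sibling["latestPublished"] = False
    published = [d for d in siblings if d.get("published")]
    if published:
        max(published, key=lambda d: d["version"])["latestPublished"] = True


index = {}
for doc in [
    {"id": 0, "version": 0, "published": True},
    {"id": 0, "version": 1, "published": False, "latest": True},
    {"id": 1, "version": 0, "published": True},
    {"id": 1, "version": 1, "published": True, "latest": True},
]:
    store_version(index, doc)

flagged = sorted(key for key, d in index.items() if d["latestPublished"])
print(flagged)  # ['0:0', '1:1']

# The search body then reduces to a single term filter on the flag.
search_body = {"query": {"term": {"latestPublished": True}}}
```

With the flag stored on the documents, the results land in hits.hits and the top-level size behaves as usual.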

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 