Elasticsearch aggregations filtered result is not working properly

  1. two sample documents

POST /aggstest/test/1

{
    "categories": [
        {
            "type": "book",
            "words": [
                {"word":"storm","count":277},
                {"word":"pooh","count":229}
            ]
        },
        {
            "type": "magazine",
            "words": [
                {"word":"vibe","count":100},
                {"word":"sunny","count":50}
            ]
        }
    ]
}

POST /aggstest/test/2

{
    "categories": [
        {
            "type": "book",
            "words": [
                {"word":"rain","count":160},
                {"word":"jurassic park","count":150}
            ]
        },
        {
            "type": "magazine",
            "words": [
                {"word":"tech","count":200},
                {"word":"homes","count":30}
            ]
        }
    ]
}
  2. aggs query

GET /aggstest/test/_search

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "categories.type": "book"
              }
            },
            {
              "term": {
                "categories.words.word": "storm"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "categories.type": "book"
              }
            }
          ]
        }
      },
      "aggs": {
        "book_category": {
          "terms": {
            "field": "categories.words.word",
            "size": 10
          }
        }
      }
    }
  },
  "post_filter": {
    "term": {
      "categories.type": "book"
    }
  }
}
  3. result

{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 1, "max_score": 0, "hits": [] },
  "aggregations": {
    "filtered": {
      "doc_count": 1,
      "book_category": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "pooh", "doc_count": 1 },
          { "key": "storm", "doc_count": 1 },
          { "key": "sunny", "doc_count": 1 },
          { "key": "vibe", "doc_count": 1 }
        ]
      }
    }
  }
}

========================

The expected aggregation result should not include "sunny" and "vibe", because they belong to the "magazine" type.

I used a filtered query and a post_filter, but I still could not restrict the aggregation result to the "book" type.

Every filter you apply, whether in the query or in the aggregation, selects whole documents, not individual entries of the categories array. Document 1 matches, and because categories is mapped as a plain object, all four of its words are in scope for the aggregation. Hence you always get all four buckets. As far as I understand, a way to manipulate buckets on the server side will be introduced with reducers in version 2.0 of Elasticsearch.
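
To make the scoping problem concrete, here is a rough sketch of how document 1 looks to the index when categories uses the default object mapping. This is a conceptual picture, not the literal storage format: the inner objects are flattened into multi-valued fields, so nothing records which word belonged to which type.

```json
# Conceptual view of document 1 under the default object mapping (illustrative only)
{
  "categories.type":        ["book", "magazine"],
  "categories.words.word":  ["storm", "pooh", "vibe", "sunny"],
  "categories.words.count": [277, 229, 100, 50]
}
```

A term filter on categories.type = "book" matches this document as a whole, and a terms aggregation on categories.words.word then sees all four values.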

What you can do now is change the mapping so that categories is a nested object. You will then be able to query each category independently and aggregate accordingly using a nested aggregation. Changing the type from object to nested requires reindexing.
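
A minimal sketch of that approach follows, using the same 1.x syntax as the question (the filtered query and the string field type). The index name aggstest_nested and the aggregation names categories and only_books are made up for illustration, and not_analyzed is my assumption so that multi-word values such as "jurassic park" stay in one bucket. Lines starting with # are annotations, not part of the request bodies.

```json
# Assumption: a new index (hypothetical name) into which both documents are reindexed
PUT /aggstest_nested
{
  "mappings": {
    "test": {
      "properties": {
        "categories": {
          "type": "nested",
          "properties": {
            "type": { "type": "string", "index": "not_analyzed" },
            "words": {
              "properties": {
                "word":  { "type": "string", "index": "not_analyzed" },
                "count": { "type": "integer" }
              }
            }
          }
        }
      }
    }
  }
}

# Both conditions must now hold inside the same category; the aggregation
# steps into the nested categories and keeps only the "book" ones
GET /aggstest_nested/test/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "nested": {
          "path": "categories",
          "filter": {
            "bool": {
              "must": [
                { "term": { "categories.type": "book" } },
                { "term": { "categories.words.word": "storm" } }
              ]
            }
          }
        }
      }
    }
  },
  "aggs": {
    "categories": {
      "nested": { "path": "categories" },
      "aggs": {
        "only_books": {
          "filter": { "term": { "categories.type": "book" } },
          "aggs": {
            "book_category": {
              "terms": { "field": "categories.words.word", "size": 10 }
            }
          }
        }
      }
    }
  }
}
```

With the two sample documents reindexed this way, the book_category buckets should contain only "storm" and "pooh". Only categories is nested here; words stays a plain object inside each category, which is enough for this aggregation but would not keep each word paired with its own count.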

Also note that post-filters are not applied to aggregations at all. They filter the hits returned by the query without affecting the aggregations, which is useful when you need to aggregate over a wider scope than the returned hits.

One more thing: if the filter is already in the query, there is no need to repeat it in the aggregation, because the aggregation scope is already filtered.
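
Applied to the request from the question, that means the aggregation-level filter and the post_filter can simply be dropped. The sketch below is that reduced request against the original index; on the original object mapping it still returns the same four buckets, since the scoping issue is in the mapping rather than in the filters.

```json
# Same query as in the question, without the redundant aggregation filter and post_filter
GET /aggstest/test/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "categories.type": "book" } },
            { "term": { "categories.words.word": "storm" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "book_category": {
      "terms": { "field": "categories.words.word", "size": 10 }
    }
  }
}
```

The only visible difference in the response is that book_category appears directly under aggregations instead of inside a filtered wrapper.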
