Elasticsearch: Query to find the maximum count of objects based on a field value

Given documents like the example below, I want to find, for each component name, the maximum count of each action across all documents in the index. Could you please help me find a way to do this?

Expected result, assuming only one document is present in the index:

comp1 -> action1 -> max 2 times
comp1 -> action2 -> max 1 time
comp2 -> action2 -> max 1 time
comp2 -> action3 -> max 1 time

Sample Document:

{
  "id": "AC103902:A13A_AC140008:01BB_5FA2E8FA_1C08:0007",
  "tokens": [
    {
      "name": "comp1",
      "items": [
        {
          "action": "action1",
          "attr": "value"
        },
        {
          "action": "action1",
          "attr": "value"
        },
        {
          "action": "action2",
          "attr": "value"
        }
      ]
    },
    {
      "name": "comp2",
      "items": [
        {
          "action": "action2",
          "attr": "value"
        },
        {
          "action": "action3",
          "attr": "value"
        }
      ]
    }
  ]
}

Elasticsearch version: 7.9

I can loop through each document and calculate this on the client side, but I am curious whether there is an existing Elasticsearch query that can produce this kind of summary from the documents in the index.
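For reference, the client-side approach mentioned in the question can be sketched in Python as below. This assumes the documents have already been fetched from the index (for example with the scroll or search_after APIs) and parsed into dicts; the function name max_action_counts is hypothetical. For each (component, action) pair it counts occurrences within one document and keeps the largest count seen in any document.

```python
from collections import Counter, defaultdict


def max_action_counts(docs):
    """Hypothetical helper: for each (component, action) pair, return the
    largest number of times that action appears under that component
    within any single document."""
    result = defaultdict(int)
    for doc in docs:
        # Count (component, action) occurrences inside this one document.
        per_doc = Counter(
            (token["name"], item["action"])
            for token in doc.get("tokens", [])
            for item in token.get("items", [])
        )
        # Keep the per-document maximum seen so far across all documents.
        for key, count in per_doc.items():
            result[key] = max(result[key], count)
    return dict(result)


# The sample document from the question.
sample = {
    "id": "AC103902:A13A_AC140008:01BB_5FA2E8FA_1C08:0007",
    "tokens": [
        {"name": "comp1", "items": [
            {"action": "action1", "attr": "value"},
            {"action": "action1", "attr": "value"},
            {"action": "action2", "attr": "value"},
        ]},
        {"name": "comp2", "items": [
            {"action": "action2", "attr": "value"},
            {"action": "action3", "attr": "value"},
        ]},
    ],
}

for (comp, action), n in sorted(max_action_counts([sample]).items()):
    print(f"{comp} -> {action} -> max {n} time{'s' if n != 1 else ''}")
```

Run against the single sample document, this prints the four lines of the expected result. Because it takes a maximum rather than a sum, passing the same document twice still reports action1 under comp1 as 2.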

You'll need to define both the tokens array and the tokens.items array as nested in order to get the correct stats.
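Without the nested type, Elasticsearch flattens an array of objects into a list of field names and values, roughly as if each leaf path were one multi-valued field. The Python below is only a conceptual model of that flattening (flatten is a hypothetical helper, not how Elasticsearch is implemented internally). It shows that the link between a component and its own actions disappears.

```python
def flatten(obj, prefix=""):
    """Conceptual model: collect every leaf value under its dotted path,
    merging values from all array elements into one list per path."""
    fields = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            for k, v in flatten(value, path).items():
                fields.setdefault(k, []).extend(v)
    elif isinstance(obj, list):
        # Array elements share the same path; their values are merged.
        for element in obj:
            for k, v in flatten(element, prefix).items():
                fields.setdefault(k, []).extend(v)
    else:
        fields[prefix] = [obj]
    return fields


sample = {
    "tokens": [
        {"name": "comp1", "items": [{"action": "action1"},
                                    {"action": "action1"},
                                    {"action": "action2"}]},
        {"name": "comp2", "items": [{"action": "action2"},
                                    {"action": "action3"}]},
    ],
}

flat = flatten(sample)
print(flat["tokens.name"])          # both component names in one list
print(flat["tokens.items.action"])  # all actions in one list, owner unknown
```

Both components end up sharing a single list of actions, so an aggregation can no longer tell which action belonged to comp1 and which to comp2. Mapping tokens and tokens.items as nested keeps each object as its own hidden document, which is what the nested aggregations in the query rely on.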

Then, assuming your mapping looks something along the lines of

{
  "mappings": {
    "properties": {
      "tokens": {
        "type": "nested",
        "properties": {
          "items": {
            "type": "nested"
          }
        }
      }
    }
  }
}

the following query can be executed:

GET index_name/_search
{
  "size": 0,
  "aggs": {
    "by_token_name": {
      "nested": {
        "path": "tokens"
      },
      "aggs": {
        "token_name": {
          "terms": {
            "field": "tokens.name.keyword"
          },
          "aggs": {
            "by_max_actions": {
              "nested": {
                "path": "tokens.items"
              },
              "aggs": {
                "max_actions": {
                  "terms": {
                    "field": "tokens.items.action.keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

yielding these buckets:

[
  {
    "key" : "comp1",              <--
    "doc_count" : 1,
    "by_max_actions" : {
      "doc_count" : 3,
      "max_actions" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "action1",    <--
            "doc_count" : 2
          },
          {
            "key" : "action2",    <--
            "doc_count" : 1
          }
        ]
      }
    }
  },
  {
    "key" : "comp2",              <--
    "doc_count" : 1,
    "by_max_actions" : {
      "doc_count" : 2,
      "max_actions" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "action2",    <--
            "doc_count" : 1
          },
          {
            "key" : "action3",    <--
            "doc_count" : 1
          }
        ]
      }
    }
  }
]

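which can be easily post-processed on the client side. Note that each inner doc_count is the number of matching items summed over all documents in the index, not a per-document maximum. With a single document the two coincide, but with more documents the per-document maximum still has to be computed separately, e.g. on the client.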
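A minimal post-processing sketch in Python, assuming the search response has already been parsed into a dict (for example from the JSON body). summarize is a hypothetical helper; it only walks the aggregation names used in the query (by_token_name, token_name, by_max_actions, max_actions).

```python
def summarize(response):
    """Hypothetical helper: turn the nested aggregation buckets into
    (component, action, doc_count) rows."""
    rows = []
    token_buckets = response["aggregations"]["by_token_name"]["token_name"]["buckets"]
    for comp in token_buckets:
        for action in comp["by_max_actions"]["max_actions"]["buckets"]:
            # doc_count is the number of matching items across all documents.
            rows.append((comp["key"], action["key"], action["doc_count"]))
    return rows


# A response shaped like the buckets shown above (extra fields trimmed).
response = {
    "aggregations": {
        "by_token_name": {
            "doc_count": 2,
            "token_name": {
                "buckets": [
                    {"key": "comp1", "doc_count": 1,
                     "by_max_actions": {"doc_count": 3, "max_actions": {"buckets": [
                         {"key": "action1", "doc_count": 2},
                         {"key": "action2", "doc_count": 1}]}}},
                    {"key": "comp2", "doc_count": 1,
                     "by_max_actions": {"doc_count": 2, "max_actions": {"buckets": [
                         {"key": "action2", "doc_count": 1},
                         {"key": "action3", "doc_count": 1}]}}},
                ]
            },
        }
    }
}

for comp, action, n in summarize(response):
    print(f"{comp} -> {action} -> {n}")
```

The terms aggregations return only the top 10 buckets by default, so with more than 10 component names or actions the query would need a larger "size" on each terms aggregation for this summary to be complete.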
