
How can I aggregate on the whole field value in Elasticsearch?

I am using Elasticsearch 7.15 and need to aggregate on a field and sort the resulting buckets by document count.

The documents stored in Elasticsearch look like this:

{
  "logGroup" : "/aws/lambda/myLambda1",
  ...
},
{
  "logGroup" : "/aws/lambda/myLambda2",
  ...
}

I need to find out which logGroup has the most documents. To do that, I tried a terms aggregation in Elasticsearch:

GET /my-index/_search?size=0
{
  "aggs": {
    "types_count": {
      "terms": {
        "field": "logGroup",
        "size": 10000
      }
    }
  }
}

The output of this query looks like:

"aggregations" : {
    "types_count" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "aws",
          "doc_count" : 26303620
        },
        {
          "key" : "lambda",
          "doc_count" : 25554470
        },
        {
          "key" : "myLambda1",
          "doc_count" : 25279201
        }
...
}

As you can see from the output above, Elasticsearch splits the logGroup value into terms and aggregates on each term rather than on the whole string. Is there a way to aggregate on the whole string instead?

I expect the output to look like:

"buckets" : [
        {
          "key" : "/aws/lambda/myLambda1",
          "doc_count" : 26303620
        },
        {
          "key" : "/aws/lambda/myLambda2",
          "doc_count" : 25554470
        },

The logGroup field in the index mapping is:

"logGroup" : {
          "type" : "text",
          "fielddata" : true
        },

Can I achieve it without updating the index?

To get the output you expect, you need to change your mapping to this:

    "logGroup" : {
      "type" : "keyword"
    },

Without that change, your log groups are analyzed by the standard analyzer, which splits each string into separate tokens, so you won't be able to aggregate on the full log group names.

If you don't want to, or can't, change the field type and reindex everything, you can do the following instead:
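To make the difference concrete, here is a minimal Python sketch that imitates the two behaviours on three made-up documents. The function approx_standard_tokens is a hypothetical, rough stand-in for the standard analyzer (it only splits on non-alphanumeric characters and lowercases, which the real analyzer also does by default); it is not how Elasticsearch is implemented.

```python
import re
from collections import Counter

# Made-up sample documents, shaped like the ones in the question
docs = [
    {"logGroup": "/aws/lambda/myLambda1"},
    {"logGroup": "/aws/lambda/myLambda1"},
    {"logGroup": "/aws/lambda/myLambda2"},
]

def approx_standard_tokens(value):
    # Rough approximation only: split on non-alphanumeric characters, then lowercase
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]

def terms_on_text(documents):
    counts = Counter()
    for doc in documents:
        # Each document is counted once per distinct token it contains
        counts.update(set(approx_standard_tokens(doc["logGroup"])))
    return counts

def terms_on_keyword(documents):
    # A keyword field is indexed as one unmodified value per document
    return Counter(doc["logGroup"] for doc in documents)

text_buckets = terms_on_text(docs)
keyword_buckets = terms_on_keyword(docs)

print(text_buckets.most_common())     # buckets per token, as with a text field
print(keyword_buckets.most_common())  # buckets per full logGroup value
```

With the text-style counting, every document lands in the "aws" and "lambda" buckets, which is why those buckets dominate the output in the question. With the keyword-style counting, each full path gets its own bucket.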

First, add a keyword sub-field to your mapping, like this:

PUT /my-index/_mapping
{
    "properties": {
        "logGroup" : {
            "type" : "text",
            "fields": {
                "keyword": {
                    "type" : "keyword"
                }
            }
        }
    }
}

And then run the following so that all existing documents pick up this new field:

POST my-index/_update_by_query?wait_for_completion=false
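Because wait_for_completion=false makes the request return a task ID right away instead of waiting, the new sub-field is only fully populated once that task has finished. The task can be checked with GET _tasks/<task_id>. The Python sketch below shows one way to read progress out of such a response; summarize_task is a hypothetical helper, and the sample document is hand-written with illustrative numbers, not real cluster output.

```python
def summarize_task(task_doc):
    """Return (completed, updated, total) from a GET _tasks/<task_id> response body."""
    status = task_doc.get("task", {}).get("status", {})
    return (
        bool(task_doc.get("completed", False)),
        status.get("updated", 0),
        status.get("total", 0),
    )

# Hand-written sample shaped like a tasks API response; values are illustrative only
sample = {
    "completed": False,
    "task": {
        "action": "indices:data/write/update/byquery",
        "status": {"total": 1000, "updated": 250},
    },
}

completed, updated, total = summarize_task(sample)
print(f"completed={completed} progress={updated}/{total}")
```

Until completed is true, the aggregation on logGroup.keyword only covers the documents that have already been updated, so the counts can look too low.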

Finally, you'll be able to achieve what you want with the following query:

GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "types_count": {
      "terms": {
        "field": "logGroup.keyword",
        "size": 10000
      }
    }
  }
}
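A terms aggregation orders its buckets by doc_count in descending order by default, so the first bucket is the log group with the most documents; if only that one is needed, a much smaller "size" should be enough. As a small sketch of reading the answer from the parsed search response in Python: top_log_group is a hypothetical helper, and the sample response below is hand-written, reusing the counts from the expected output in the question.

```python
def top_log_group(response, agg_name="types_count"):
    """Return (key, doc_count) of the largest bucket, or None if there are no buckets."""
    buckets = response["aggregations"][agg_name]["buckets"]
    if not buckets:
        return None
    # max() keeps this correct even if a custom bucket order was requested
    top = max(buckets, key=lambda bucket: bucket["doc_count"])
    return top["key"], top["doc_count"]

# Hand-written sample shaped like the expected output in the question
sample_response = {
    "aggregations": {
        "types_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "/aws/lambda/myLambda1", "doc_count": 26303620},
                {"key": "/aws/lambda/myLambda2", "doc_count": 25554470},
            ],
        }
    }
}

print(top_log_group(sample_response))
```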
