
How do I aggregate slightly different data in Elasticsearch?

The following request calculates percentiles of the request duration for the endpoint /api/v1/blabla:

    POST /filebeat-nginx-*/_search
    {
      "aggs": {
        "hosts": {
          "terms": {
            "field": "host.name",
            "size": 1000
          },
          "aggs": {
            "url": {
              "terms": {
                "field": "nginx.access.url",
                "size": 1000
              },
              "aggs": {
                "time_duration_percentiles": {
                  "percentiles": {
                    "field": "nginx.access.time_duration",
                    "percents": [
                      50,
                      90
                    ],
                    "keyed": true
                  }
                }
              }
            }
          }
        }
      },
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            {
              "bool": {
                "should": [
                  {
                    "prefix": {
                      "nginx.access.url": "/api/v1/blabla" 
                    }
                  }
                ]
              }
            },
            {
              "range": {
                "@timestamp": {
                  "gte": "now-10m",
                  "lte": "now" 
                }
              }
            }
          ]
        }
      }
    }

The problem is that query parameters are also passed to this endpoint, for example /api/v1/blabla?Lang=en&type=active or /api/v1/blabla/?Lang=en&type=history. As a result, the response shows percentiles for each such "separate" endpoint:

    {
      "key" : "/api/v1/blabla?lang=ru",
      "doc_count" : 423,
      "time_duration_percentiles" : {
        "values" : {
          "50.0" : 0.21199999749660492,
          "90.0" : 0.29839999079704277
        }
      }
    },
    {
      "key" : "/api/v1/blabla?lang=en&type=active",
      "doc_count" : 31,
      "time_duration_percentiles" : {
        "values" : {
          "50.0" : 0.21699999272823334,
          "90.0" : 0.2510000020265579
        }
      }
    },
    {
      "key" : "/api/v1/blabla?lang=en",
      "doc_count" : 4,
      "time_duration_percentiles" : {
        "values" : {
          "50.0" : 0.22700000554323196,
          "90.0" : 0.24899999797344208
        }
      }
    }

Is it possible to aggregate these similar endpoints into a single /api/v1/blabla bucket and get the overall percentiles?

Like this:

    {
      "key" : "/api/v1/blabla",
      "doc_count" : 4,
      "time_duration_percentiles" : {
        "values" : {
          "50.0" : 0.22700000554323196,
          "90.0" : 0.24899999797344208
        }
      }
    }

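You could try splitting nginx.access.url on the question mark in a script, but keep in mind that it will probably be slow, because the script runs once per matching document. The only change from your request is that the "url" terms aggregation uses a "script" instead of a "field". Note that Painless regex support is governed by the script.painless.regex.enabled setting, so the script may be rejected on clusters where regexes are disabled:

    {
      "aggs": {
        "hosts": {
          "terms": {
            "field": "host.name",
            "size": 1000
          },
          "aggs": {
            "url": {
              "terms": {
                "script": {
                  "source": "/\\?/.split(doc['nginx.access.url'].value)[0]"
                },
                "size": 1000
              },
              "aggs": {
                "time_duration_percentiles": {
                  "percentiles": {
                    "field": "nginx.access.time_duration",
                    "percents": [
                      50,
                      90
                    ],
                    "keyed": true
                  }
                }
              }
            }
          }
        }
      },
      ...
    }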
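
If regexes are disabled on your cluster, or if you also want /api/v1/blabla/ (with a trailing slash, as in one of the URLs in the question) to land in the same bucket as /api/v1/blabla, a regex-free variant of the script might look like the sketch below. It is untested against your data and assumes that every matching document has a value in nginx.access.url (the prefix filter should guarantee that) and that the query string is the only part you want to drop. The "size" and "query" parts are copied from the question:

    POST /filebeat-nginx-*/_search
    {
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            { "prefix": { "nginx.access.url": "/api/v1/blabla" } },
            { "range": { "@timestamp": { "gte": "now-10m", "lte": "now" } } }
          ]
        }
      },
      "aggs": {
        "hosts": {
          "terms": { "field": "host.name", "size": 1000 },
          "aggs": {
            "url": {
              "terms": {
                "script": {
                  "lang": "painless",
                  "source": "String u = doc['nginx.access.url'].value; int q = u.indexOf('?'); String p = q >= 0 ? u.substring(0, q) : u; return p.length() > 1 && p.endsWith('/') ? p.substring(0, p.length() - 1) : p;"
                },
                "size": 1000
              },
              "aggs": {
                "time_duration_percentiles": {
                  "percentiles": {
                    "field": "nginx.access.time_duration",
                    "percents": [50, 90],
                    "keyed": true
                  }
                }
              }
            }
          }
        }
      }
    }

The script cuts the value at the first "?" and then strips a single trailing "/", so the variants from the question should collapse into one "/api/v1/blabla" key per host. It is still evaluated for every matching document, so the performance caveat above applies here too.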

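BTW, it's good practice to extract the URI hostname, path, query string, etc. before you index your docs, so that you can aggregate on the path field directly instead of running a script at search time. You can do so with an ingest pipeline that uses the URI parts (uri_parts) processor, or through other mechanisms.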
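
As a rough sketch of that index-time approach, an ingest pipeline using the uri_parts processor could look like the following. The pipeline name nginx-url-parts and the target field nginx.access.url_parts are made up for illustration, and I have not verified how the processor treats path-only values such as the ones in your logs, so it is worth running the simulate call first:

    PUT _ingest/pipeline/nginx-url-parts
    {
      "description": "Split nginx.access.url into path, query, etc. (hypothetical names)",
      "processors": [
        {
          "uri_parts": {
            "field": "nginx.access.url",
            "target_field": "nginx.access.url_parts",
            "keep_original": false,
            "ignore_failure": true
          }
        }
      ]
    }

    POST _ingest/pipeline/nginx-url-parts/_simulate
    {
      "docs": [
        { "_source": { "nginx": { "access": { "url": "/api/v1/blabla?lang=en&type=active" } } } }
      ]
    }

If the simulated document contains the path you expect, the "url" terms aggregation can then use "field": "nginx.access.url_parts.path" instead of a script, assuming that field ends up mapped as a keyword in your index. Only documents indexed after the pipeline is in place will have the new field.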
