Elasticsearch post_filter aggregation query

Question

I am interested in all the APIs that did not return a 200 response within a specific time interval.

I basically need this:

     select url from api_log
      except/minus 
     select url from api_log where status='200'

Translated to ES, I am trying to do the following:

First, compute the totals.

     select url, status, count(*) from api_log
     group by url, status

Then, from that result, drop every URL that has a child record with status 200.

ES sample data

{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:20:51.108945",
        "out_time": "2019-05-13T17:20:51.145549",
        "duration": 36.6041660308838,
        "status": "200",
        "url": "/api/myFirstAPI"
    }
},
{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "2",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:20:57.915694",
        "out_time": "2019-05-13T17:20:57.941989",
        "duration": 26.2949466705322,
        "status": "403",
        "url": "/api/mySecondAPI"
    }
},
{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "3",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:22:35.274372",
        "out_time": "2019-05-13T17:22:35.288944",
        "duration": 14.5719051361084,
        "status": "400",
        "url": "/api/myFirstAPI"
    }
}

For the data above, I expect the resulting set of URLs to be {'/api/mySecondAPI'}.

Request/response using only aggregations
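To make the intended semantics concrete, here is a minimal Python sketch of the `except/minus` idea applied to the three sample documents. It only models the logic as a set difference and does not talk to Elasticsearch; the function name `urls_without_200` is made up for illustration.

```python
# url/status pairs taken from the three sample documents' _source fields.
docs = [
    {"url": "/api/myFirstAPI", "status": "200"},
    {"url": "/api/mySecondAPI", "status": "403"},
    {"url": "/api/myFirstAPI", "status": "400"},
]


def urls_without_200(rows):
    """Return URLs that never appear with status '200' (SQL EXCEPT as a set difference)."""
    all_urls = {r["url"] for r in rows}
    ok_urls = {r["url"] for r in rows if r["status"] == "200"}
    return all_urls - ok_urls


print(urls_without_200(docs))  # {'/api/mySecondAPI'}
```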

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "url": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword"
          }
        }
      }
    }
  }
}

Response to the above request

{
  "took" : 880,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "url" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 394668,
      "buckets" : [
        {
          "key" : "/api/myFirstRequest",
          "doc_count" : 1352845,
          "status" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "200",
                "doc_count" : 1187611
              },
              {
                "key" : "302",
                "doc_count" : 139932
              },
              {
                "key" : "401",
                "doc_count" : 22615
              },
              {
                "key" : "500",
                "doc_count" : 2250
              },
              {
                "key" : "403",
                "doc_count" : 437
              }
            ]
          }
        },
...
...
...

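From the above, I need to keep only the buckets (URLs) that do not contain a child bucket with status "200".

I got this far. It looks close, and yet so far... I can't figure out what should go in the type field.

Request with filter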

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "page_name": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword"
          }
        }
      }
    }
  },
   "post_filter": {
      "bool": {
        "must_not": [
            {
                "has_child" : {
                    "type" : "?????",
                    "query" : {
                        "term" : {"status" : "200"}
                    }
                }
            }
        ]
      }
    }
}

Sample input (from Apache logs):

t1 /api/FirstAPI 200  <-- Eliminate First API completely
t2 /api/FirstAPI 400
t3 /api/FirstAPI 403
t4 /api/SecondAPI 403
t5 /api/SecondAPI 400
t6 /api/ThirdAPI 500
t7 /api/ThirdAPI 500
t8 /api/SecondAPI 200   <---Eliminate Second API completely
t9 /api/ThirdAPI 500
t10 /api/ThirdAPI 403

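Given the input above, I only want the pages that never gave a 200 response in the time range t1-t10.

Expected result

So the output should be only /api/ThirdAPI

If I filter out the 200s first and then apply the aggregation, I get all three APIs. That is not what I want.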
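The difference between the two orderings can be sketched in plain Python on the t1-t10 sample. This is only a model of the logic, not Elasticsearch code, and the function names `filter_then_group` and `group_then_filter` are hypothetical.

```python
from collections import Counter

# (url, status) pairs from the sample log lines t1..t10, in order.
log = [
    ("/api/FirstAPI", "200"), ("/api/FirstAPI", "400"), ("/api/FirstAPI", "403"),
    ("/api/SecondAPI", "403"), ("/api/SecondAPI", "400"),
    ("/api/ThirdAPI", "500"), ("/api/ThirdAPI", "500"),
    ("/api/SecondAPI", "200"),
    ("/api/ThirdAPI", "500"), ("/api/ThirdAPI", "403"),
]


def filter_then_group(rows):
    """Drop 200 rows first, then group by URL: any URL with a non-200 row survives."""
    return sorted({url for url, status in rows if status != "200"})


def group_then_filter(rows):
    """Group by (url, status) first, then keep only URLs that have no '200' group."""
    counts = Counter(rows)
    urls = {url for url, _ in counts}
    return sorted(u for u in urls if counts[(u, "200")] == 0)


print(filter_then_group(log))  # ['/api/FirstAPI', '/api/SecondAPI', '/api/ThirdAPI']
print(group_then_filter(log))  # ['/api/ThirdAPI']
```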

Answer 1

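If I understand correctly, you just want to exclude 200 from the aggregation. I see no reason to use post_filter here. You can use a terms aggregation.

Exclude (or filter) the status value in the aggregation. The 200 responses are still counted in the parent bucket's doc_count field, but the 200 bucket itself is left out of the aggregation response and will not be shown.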

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "url": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword",
            "exclude": "200"
          }
        }
      }
    }
  }
}
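As a rough illustration of that behaviour, the following Python sketch mimics a nested terms aggregation whose sub-aggregation skips one status value. It is a simplification (plain equality rather than the regular-expression or array matching that `exclude` supports), and `nested_terms` is a hypothetical helper, not an Elasticsearch API.

```python
from collections import Counter, defaultdict

# A reduced version of the question's sample log: (url, status) pairs.
rows = [
    ("/api/FirstAPI", "200"), ("/api/FirstAPI", "400"),
    ("/api/ThirdAPI", "500"), ("/api/ThirdAPI", "403"),
]


def nested_terms(rows, exclude=None):
    """Mimic a url terms aggregation with a status sub-aggregation that skips `exclude`."""
    buckets = defaultdict(lambda: {"doc_count": 0, "status": Counter()})
    for url, status in rows:
        buckets[url]["doc_count"] += 1           # parent count includes every document
        if status != exclude:
            buckets[url]["status"][status] += 1  # only the matching sub-bucket is hidden
    return {
        url: {"doc_count": b["doc_count"], "status": dict(b["status"])}
        for url, b in buckets.items()
    }


result = nested_terms(rows, exclude="200")
print(result)
```

In this model /api/FirstAPI still appears as a URL bucket with doc_count 2; only its "200" sub-bucket is missing. So the URL is not removed from the result.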

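Alternative:

Judging by your input, you seem to want the 200 documents to remain part of the result set (since you are using post_filter). If that is not the case, note that aggregations are computed over the documents matched by the query, so if you use a bool query to exclude status 200 from the result set, no bucket with status 200 will appear.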

POST /api_log/_search
    {
      "size": 0,
      "query": {
        "bool": {
          "must_not": [
            {
              "terms": {
                "status": [
                  "200"
                ]
              }
            }
          ]
        }
      }, 
      "aggs": {
        "url": {
          "terms": {
            "field": "url.keyword"
          },
          "aggregations": {
            "status": {
              "terms": {
                "field": "status.keyword"
              }
            }
          }
        }
      }
    }
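Neither request above removes a URL just because it has some 200 responses. One possible way to get "URLs that never returned 200" is a `bucket_selector` pipeline aggregation that keeps only the URL buckets whose count of 200 documents is zero. The sketch below builds such a request body as a Python dict. The aggregation names `ok` and `never_ok`, the helper name, and the `size` of 1000 are assumptions for illustration, and the field names are taken from the question.

```python
import json


def build_never_200_query(url_field="url.keyword", status_field="status.keyword", size=1000):
    """Build a search body that keeps only URL buckets containing no status-200 documents."""
    return {
        "size": 0,
        "aggs": {
            "url": {
                # `size` is an assumed bucket limit; URLs beyond it are not evaluated.
                "terms": {"field": url_field, "size": size},
                "aggs": {
                    # Count the documents in this URL bucket that have status 200.
                    "ok": {"filter": {"term": {status_field: "200"}}},
                    # Keep the URL bucket only when that count is zero.
                    "never_ok": {
                        "bucket_selector": {
                            "buckets_path": {"ok_count": "ok>_count"},
                            "script": "params.ok_count == 0",
                        }
                    },
                },
            }
        },
    }


print(json.dumps(build_never_200_query(), indent=2))
```

The printed JSON could be sent as the body of POST /api_log/_search. Because the selector runs only on the buckets that the terms aggregation returns, the `size` value would need to be large enough to cover every URL of interest.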
