
How to correctly aggregate when the field is a list in Elasticsearch

Our ES logs are currently indexed so that some fields hold a list instead of a single value.

For example:

_source: {
    "field1": ["item1", "item2", "item3"],
    "field2": "something",
    "field3": "something_else"
}

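Of course, the lists are not always the same length. I am trying to find a way to count the number of logs that contain each item (so some logs will be counted more than once).

I know I have to use aggs, but how do I form the correct query (the part after -d)?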

You can use the following query, which combines a terms aggregation with a top_hits sub-aggregation.

{
  "size": 0,
  "aggs": {
    "group": {
      "terms": {
        "script": "_source.field1.each{}"
      },
      "aggs": {
        "top_hits_log": {
          "top_hits": {}
        }
      }
    }
  }
}

The output will be:

 "buckets": [
        {
           "key": "item1",
           "doc_count": 3,
           "top_hits_log": {
              "hits": {
                 "total": 3,
                 "max_score": 1,
                 "hits": [
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "1",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2",
                             "item3"
                          ],
                          "field2": "something1"
                       }
                    },
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "2",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1"
                          ],
                          "field2": "something2"
                       }
                    },
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "3",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2"
                          ],
                          "field2": "something3"
                       }
                    }
                 ]
              }
           }
        },
        {
           "key": "item2",
           "doc_count": 2,
           "top_hits_log": {
              "hits": {
                 "total": 2,
                 "max_score": 1,
                 "hits": [
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "1",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2",
                             "item3"
                          ],
                          "field2": "something1"
                       }
                    },
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "3",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2"
                          ],
                          "field2": "something3"
                       }
                    }
                 ]
              }
           }
        },
        {
           "key": "item3",
           "doc_count": 1,
           "top_hits_log": {
              "hits": {
                 "total": 1,
                 "max_score": 1,
                 "hits": [
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "1",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2",
                             "item3"
                          ],
                          "field2": "something1"
                       }
                    }
                 ]
              }
           }
        }
     ]
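
The doc_count values above follow from how a terms aggregation treats a multi-valued field: a document is counted once in the bucket of every distinct value it holds. A minimal Python sketch of that counting rule, using the three sample documents from the output above. It only emulates the semantics locally and does not call Elasticsearch; `count_docs_per_item` is a hypothetical helper name.

```python
from collections import Counter

# field1 of the three sample documents shown in the output above, keyed by _id.
docs = {
    "1": ["item1", "item2", "item3"],
    "2": ["item1"],
    "3": ["item1", "item2"],
}


def count_docs_per_item(docs):
    """Count, for each item, how many documents contain it."""
    counts = Counter()
    for items in docs.values():
        # set() ensures a document is counted at most once per item.
        counts.update(set(items))
    return dict(counts)


counts = count_docs_per_item(docs)
for key in sorted(counts):
    print(key, counts[key])
```

Document 1 contributes to all three buckets, which is the "counted more than once" behaviour the question asks for.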

Make sure dynamic scripting is enabled by setting script.disable_dynamic: false

Hope this helps.

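There is no need to use scripting, especially since parsing _source will be slow. You also need to make sure that field1 is not_analyzed, otherwise you will get strange results, because the terms aggregation runs on the unique tokens in the inverted index.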
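
As a rough illustration of the not_analyzed point, the Python sketch below compares the buckets you would get from tokenized and untokenized values. It assumes a simplified analyzer that lowercases and splits on non-alphanumeric characters, which only approximates a real Elasticsearch analyzer; `simple_analyze`, `bucket_counts`, and the sample values are hypothetical.

```python
import re
from collections import Counter


def simple_analyze(value):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


# Hypothetical field1 values for two documents.
sample_docs = [["Item One", "Item Two"], ["Item One"]]


def bucket_counts(docs, analyzed):
    """Count documents per indexed term, with or without tokenization."""
    counts = Counter()
    for values in docs:
        terms = set()
        for value in values:
            # Analyzed: one term per token. Not analyzed: the whole value is one term.
            terms.update(simple_analyze(value) if analyzed else [value])
        counts.update(terms)
    return dict(counts)


print(bucket_counts(sample_docs, analyzed=True))
print(bucket_counts(sample_docs, analyzed=False))
```

With tokenization the buckets are keyed by fragments such as "item" and "one" rather than by the original list entries. In the Elasticsearch versions that still use script.disable_dynamic, the untokenized behaviour would typically come from mapping field1 as a string with "index": "not_analyzed".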

{
  "size": 0,
  "aggs": {
    "unique_items": {
      "terms": {
        "field": "field1",
        "size": 100
      },
      "aggs": {
        "documents": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

Here the size of 100 is set inside the terms aggregation; change it according to how many unique values you expect to have (the default is 10).
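
For the -d part of the question, the request body can also be generated instead of typed by hand, which makes the two size values easy to adjust. A minimal Python sketch using only the standard library; `build_terms_query` and its parameter names are hypothetical, while the aggregation names match the query above.

```python
import json


def build_terms_query(field, bucket_size=100, hits_per_bucket=10):
    """Build a terms aggregation body with a top_hits sub-aggregation."""
    return {
        "size": 0,  # return no top-level search hits, only aggregation results
        "aggs": {
            "unique_items": {
                # bucket_size: how many distinct values to return
                "terms": {"field": field, "size": bucket_size},
                "aggs": {
                    # hits_per_bucket: how many documents to show per value
                    "documents": {"top_hits": {"size": hits_per_bucket}}
                },
            }
        },
    }


body = build_terms_query("field1")
print(json.dumps(body, indent=2))
```

The printed JSON is what would go after -d in a curl call against the index's _search endpoint, for example `curl -XPOST 'localhost:9200/so/_search' -d '<json>'`, where the host and port are assumptions and `so` is the index name from the sample output.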

Hope this helps!
