
Elasticsearch aggregations filtering by top one result from each bucket

Given a dataset like this in a single Elasticsearch index:

entityId | created    | status
---------+------------+-----------
1        | 2000/01/01 | draft
1        | 2001/01/02 | approved
2        | 2000/01/01 | draft
2        | 2000/01/02 | approved
2        | 2001/01/03 | rejected
3        | 2000/01/01 | draft
3        | 2001/01/03 | approved

I want to filter for only those entities whose latest status is approved.
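To pin down the intended semantics outside Elasticsearch, here is a minimal Python sketch. It assumes the rows from the table above are held in memory as tuples; the function name `entities_with_latest_status` is hypothetical and only serves the illustration.

```python
from datetime import datetime

# Sample rows copied from the table above: (entityId, created, status).
events = [
    (1, "2000/01/01", "draft"),
    (1, "2001/01/02", "approved"),
    (2, "2000/01/01", "draft"),
    (2, "2000/01/02", "approved"),
    (2, "2001/01/03", "rejected"),
    (3, "2000/01/01", "draft"),
    (3, "2001/01/03", "approved"),
]


def entities_with_latest_status(rows, wanted):
    """Return sorted entity ids whose most recent row has status `wanted`."""
    latest = {}  # entityId -> (created date, status) of the newest row seen
    for entity_id, created, status in rows:
        when = datetime.strptime(created, "%Y/%m/%d").date()
        if entity_id not in latest or when > latest[entity_id][0]:
            latest[entity_id] = (when, status)
    return sorted(eid for eid, (_, status) in latest.items() if status == wanted)


print(entities_with_latest_status(events, "approved"))  # [1, 3]
```

Entity 2 is excluded because its newest row is `rejected`, even though it was approved earlier.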

So I've been experimenting with aggregations and sub-aggregations, and I managed to get every entity with only its latest status, like this:

{
  "size": 0,
  "aggs": {
    "newest-event-query": {
      "terms": {
        "field": "entityId"
      },
      "aggs": {
        "newest-event": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "created": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

This should give a result like this:

entityId | created    | status
---------+------------+-----------
1        | 2001/01/02 | approved
2        | 2001/01/03 | rejected
3        | 2001/01/03 | approved

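But I'd like to filter that result further so that it includes only the approved records (1, 3), and then ultimately be able to query against that result.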

Adding an extra aggs inside the top_hits aggregation doesn't seem to work:

{
  "size": 0,
  "aggs": {
    "newest-event-query": {
      "terms": {
        "field": "entityId"
      },
      "aggs": {
        "newest-event": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "created": {
                  "order": "desc"
                }
              }
            ],
            "aggs": {
              "approved-only": {
                "filter": {
                  "term": {
                    "status": "approved"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

The result is:

"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[gupa9nwpQWmGa3JqFmF2NA][creations][0]: SearchParseException[[creations][0]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[creations][0]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][events][0]: SearchParseException[[events][0]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[events][0]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][creations][1]: SearchParseException[[creations][1]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[creations][1]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][events][1]: SearchParseException[[events][1]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: 
SearchParseException[[events][1]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][creations][2]: SearchParseException[[creations][2]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[creations][2]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][events][2]: SearchParseException[[events][2]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[events][2]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][creations][3]: SearchParseException[[creations][3]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[creations][3]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][events][3]: SearchParseException[[events][3]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: 
SearchParseException[[events][3]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][creations][4]: SearchParseException[[creations][4]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[creations][4]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }{[gupa9nwpQWmGa3JqFmF2NA][events][4]: SearchParseException[[events][4]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[events][4]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }]",
"status": 400

Any help is appreciated.

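Edit: Filtering on approved up front doesn't work, because an entity can go from approved back to another status. I always need to filter by the latest status. The point of this exercise is to create an immutable data structure: a single entity can go through many stages, but we should always query only the latest one.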
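The difference between the two approaches can be seen on the sample rows. The following is only a sketch in plain Python, assuming the table from the question is held in memory; it is not an Elasticsearch query.

```python
# Rows copied from the question's table: (entityId, created, status).
events = [
    (1, "2000/01/01", "draft"),
    (1, "2001/01/02", "approved"),
    (2, "2000/01/01", "draft"),
    (2, "2000/01/02", "approved"),
    (2, "2001/01/03", "rejected"),
    (3, "2000/01/01", "draft"),
    (3, "2001/01/03", "approved"),
]

# Naive approach: filter on status first, then collect the entity ids.
naive = sorted({eid for eid, _, status in events if status == "approved"})

# Intended approach: find each entity's newest row, then test its status.
newest = {}
for eid, created, status in events:
    # Zero-padded yyyy/MM/dd strings sort chronologically as plain strings.
    if eid not in newest or created > newest[eid][0]:
        newest[eid] = (created, status)
correct = sorted(eid for eid, (_, status) in newest.items() if status == "approved")

print(naive)    # [1, 2, 3]
print(correct)  # [1, 3]
```

The naive filter wrongly returns entity 2, which was approved once but later rejected.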

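Edit 2: In search of a solution I also looked at parent-child structures. They come close, but still have limitations, such as has_parent or has_child needing a fixed "id". Another obvious and efficient solution would be to simply flag the newest item at write time, e.g. with a boolean, but I want atomicity, and resetting that boolean on one document while setting it on the new document is not an atomic operation.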

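I used a terms aggregation together with a bucket selector aggregation. Under each term I computed the date of the most recent entry with a max aggregation on the created field, and the most recent created date among entries whose status is approved. With the bucket selector I kept only the terms where the latest date and the latest approved date are the same.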

Entity: 1                                        --> using terms aggregation
     "Latest created date":"2001-01-02"          --> using max aggregation
     "Latest approved doc":                      --> using filter aggregation
            "Latest approved date":"2001-01-02"  --> using max aggregation
     Keep bucket where "Latest created date" == "Latest approved doc">"Latest approved date"
                                                 --> using bucket selector aggregation
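
The outline above can be emulated in plain Python to check the logic. This is a sketch only, not how Elasticsearch evaluates it. It assumes the question's rows as dictionaries with ISO dates, and the helper name `select_entities` is hypothetical.

```python
from collections import defaultdict

# The question's rows, written with ISO dates so strings compare chronologically.
docs = [
    {"entityId": 1, "created": "2000-01-01", "status": "draft"},
    {"entityId": 1, "created": "2001-01-02", "status": "approved"},
    {"entityId": 2, "created": "2000-01-01", "status": "draft"},
    {"entityId": 2, "created": "2000-01-02", "status": "approved"},
    {"entityId": 2, "created": "2001-01-03", "status": "rejected"},
    {"entityId": 3, "created": "2000-01-01", "status": "draft"},
    {"entityId": 3, "created": "2001-01-03", "status": "approved"},
]


def select_entities(docs, status="approved"):
    # Step 1 (terms aggregation): one bucket per entityId.
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["entityId"]].append(doc)

    selected = []
    for entity_id, bucket in sorted(buckets.items()):
        # Step 2 (max aggregation): newest created date in the bucket.
        latest_entry = max(d["created"] for d in bucket)
        # Step 3 (filter + max aggregation): newest date with the wanted status.
        matching = [d["created"] for d in bucket if d["status"] == status]
        latest_approved = max(matching, default=None)
        # Step 4 (bucket selector): keep the bucket only if both dates match.
        if latest_approved is not None and latest_entry == latest_approved:
            selected.append(entity_id)
    return selected


print(select_entities(docs))  # [1, 3]
```

One caveat of comparing dates: if two documents of the same entity share the same created value but have different statuses, the comparison cannot tell which one is newer, so a finer-grained timestamp would be needed in that case.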

Mapping:

{
  "index90" : {
    "mappings" : {
      "properties" : {
        "created" : {
          "type" : "date",
          "format" : "[yyyy-MM-dd]"
        },
        "entityId" : {
          "type" : "integer"
        },
        "status" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}

Data:

"hits" : [
      {
        "_index" : "index90",
        "_type" : "_doc",
        "_id" : "xZsmY3EBdTQt60iNXDQB",
        "_score" : 1.0,
        "_source" : {
          "entityId" : 1,
          "created" : "2000-01-01",
          "status" : "draft"
        }
      },
      {
        "_index" : "index90",
        "_type" : "_doc",
        "_id" : "xpsmY3EBdTQt60iNojQc",
        "_score" : 1.0,
        "_source" : {
          "entityId" : 1,
          "created" : "2001-01-02",
          "status" : "approved"
        }
      },
      {
        "_index" : "index90",
        "_type" : "_doc",
        "_id" : "x5smY3EBdTQt60iN7DQc",
        "_score" : 1.0,
        "_source" : {
          "entityId" : 2,
          "created" : "2000-01-01",
          "status" : "draft"
        }
      },
      {
        "_index" : "index90",
        "_type" : "_doc",
        "_id" : "yJsnY3EBdTQt60iNAzT7",
        "_score" : 1.0,
        "_source" : {
          "entityId" : 2,
          "created" : "2000-01-02",
          "status" : "approved"
        }
      },
      {
        "_index" : "index90",
        "_type" : "_doc",
        "_id" : "yZsnY3EBdTQt60iNIjQY",
        "_score" : 1.0,
        "_source" : {
          "entityId" : 2,
          "created" : "2000-01-03",
          "status" : "rejected"
        }
      }
    ]

Query:

{
 "aggs": {
   "entitites": {
     "terms": {
       "field": "entityId",
       "size": 10
     },
     "aggs": {
       "latest_entry": {
         "max": {
           "field": "created"
         }
       },
       "latest_approved_entry":{
         "filter": {
           "term": {
             "status.keyword": "approved"
           }
         },
         "aggs": {
           "approved_date": {
             "max": {
               "field": "created"
             }
           }
         }
       },
       "select_bucket_with":{
         "bucket_selector": {
           "buckets_path": {
             "latest_entry":"latest_entry",
             "latest_approved_entry":"latest_approved_entry>approved_date"
           },
           "script": "params.latest_entry == params.latest_approved_entry"
         }
       }
     }
   }
 }
}

Result:

"aggregations" : {
    "entitites" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 1,
          "doc_count" : 2,
          "latest_entry" : {
            "value" : 9.783936E11,
            "value_as_string" : "2001-01-02"
          },
          "latest_approved_entry" : {
            "doc_count" : 1,
            "approved_date" : {
              "value" : 9.783936E11,
              "value_as_string" : "2001-01-02"
            }
          }
        }
      ]
    }
  }
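
To query further against the selected entities, one option is to read the bucket keys from the response and feed them into a follow-up `terms` query. The sketch below assumes the response has already been parsed into a Python dict (trimmed here to the fields used). The two-step approach is an assumption on my part, not something the aggregation does by itself.

```python
# Response fragment copied from the result above, trimmed to the fields used.
response = {
    "aggregations": {
        "entitites": {
            "buckets": [
                {
                    "key": 1,
                    "doc_count": 2,
                    "latest_entry": {"value_as_string": "2001-01-02"},
                }
            ]
        }
    }
}

# Each surviving bucket key is an entityId whose latest status is approved.
buckets = response["aggregations"]["entitites"]["buckets"]
entity_ids = [bucket["key"] for bucket in buckets]

# Body for a second search restricted to those entities.
follow_up = {"query": {"terms": {"entityId": entity_ids}}}

print(entity_ids)  # [1]
print(follow_up)   # {'query': {'terms': {'entityId': [1]}}}
```

Note that this follow-up query matches every document of the selected entities, not only the newest one, so it may need an additional filter on the created date.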
