How to perform a complex query on aggregated fields in Elasticsearch

I am trying to figure out how to perform a complex query in Elasticsearch. Let's say I have the following table of data:

[image: table of data]

I got it from the following query:

{
  "aggs": {
    "3": {
      "terms": {
        "field": "ColumnA",
        "order": {
          "_key": "desc"
        },
        "size": 50
      },
      "aggs": {
        "4": {
          "terms": {
            "field": "ColumnB",
            "order": {
              "_key": "desc"
            },
            "size": 50
          },
          "aggs": {
            "5": {
              "terms": {
                "field": "ColumnC",
                "order": {
                  "_key": "desc"
                },
                "size": 50
              },
              "aggs": {
                "sum_of_views": {
                  "sum": {
                    "field": "views"
                  }
                },
                "sum_of_costs": {
                  "sum": {
                    "field": "cost"
                  }
                },
                "sum_of_clicks": {
                  "sum": {
                    "field": "clicks"
                  }
                },
                "sum_of_earned": {
                  "sum": {
                    "field": "earned"
                  }
                },
                "sum_of_adv_earned": {
                  "sum": {
                    "field": "adv_earned"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "hour",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "hour": {
              "format": "strict_date_optional_time",
              "gte": "2019-08-08T06:29:34.723Z",
              "lte": "2020-08-08T06:29:34.724Z"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Now, for example, suppose I want to get the records that satisfy the following condition:

(sum_of_clicks / sum_of_views) * (sum_of_earned2 / sum_of_earned1) < 0.5

What should I query?

I think the query below should help. My understanding is that you want to first group by ColumnA, ColumnB and ColumnC, calculate the sums of the clicks, views, earned1 and earned2 fields, and then apply the custom aggregation logic you are looking for.

I came up with the query below, which makes use of the Bucket Selector Aggregation.

POST <your_index_name>/_search
{
  "size": 0, 
  "aggs": {
    "3": {
      "terms": {
        "field": "ColumnA",
        "order": {
          "_key": "desc"
        },
        "size": 50
      },
      "aggs": {
        "4": {
          "terms": {
            "field": "ColumnB",
            "order": {
              "_key": "desc"
            },
            "size": 50
          },
          "aggs": {
            "5": {
              "terms": {
                "field": "ColumnC",
                "order": {
                  "_key": "desc"
                },
                "size": 50
              },
              "aggs": {
                "sum_views": {
                  "sum": {
                    "field": "views"
                  }
                },
                "sum_clicks": {
                  "sum": {
                    "field": "clicks"
                  }
                },
                "sum_earned1": {
                  "sum": {
                    "field": "earned1"
                  }
                },
                "sum_earned2": {
                  "sum": {
                    "field": "earned2"
                  }
                },
                "custom_sum_bucket_filter": {
                  "bucket_selector": {
                    "buckets_path": {
                      "sum_of_views": "sum_views",
                      "sum_of_clicks": "sum_clicks",
                      "sum_of_earned1": "sum_earned1",
                      "sum_of_earned2": "sum_earned2"
                    },
                    "script": "(params.sum_of_clicks/params.sum_of_views) * (params.sum_of_earned2/params.sum_of_earned1) < 0.5"
                  }
                }
              }
            },
            "min_bucket_selector": {
              "bucket_selector": {
                "buckets_path": {
                  "valid_docs_count": "5._bucket_count"
                },
                "script": {
                  "source": "params.valid_docs_count >= 1"
                }
              }
            }
          }
        },
        "min_bucket_selector": {
          "bucket_selector": {
            "buckets_path": {
              "valid_docs_count": "4._bucket_count"
            },
            "script": {
              "source": "params.valid_docs_count >= 1"
            }
          }
        }
      }
    }
  }
}

Note that to get the exact result you are looking for, I had to add bucket filter conditions for aggregations 4 and 5.

The aggregations I've used are:

  • A Bucket Selector to evaluate the condition you mentioned
  • Another Bucket Selector so that buckets in which aggregation 5 is empty are not displayed
  • Another Bucket Selector so that buckets in which aggregation 4 is empty are not displayed
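
To make the effect of these three selectors easier to see, here is a minimal Python sketch that mimics them on in-memory data rather than calling Elasticsearch. The sample sums, the function names `passes` and `select_buckets`, and the choice to treat a zero denominator as "no match" are all assumptions made for illustration. The condition is the one stated in the question.

```python
def passes(sums):
    """Condition from the question: (clicks / views) * (earned2 / earned1) < 0.5."""
    views, earned1 = sums["views"], sums["earned1"]
    if views == 0 or earned1 == 0:
        return False  # assumption: an undefined ratio never matches
    return (sums["clicks"] / views) * (sums["earned2"] / earned1) < 0.5


def select_buckets(tree):
    """Keep matching ColumnC buckets, then drop parents that end up empty."""
    result = {}
    for a_key, b_buckets in tree.items():
        kept_b = {}
        for b_key, c_buckets in b_buckets.items():
            # First selector: the custom condition on the leaf sums.
            kept_c = {c: s for c, s in c_buckets.items() if passes(s)}
            if kept_c:  # second selector: "5._bucket_count >= 1"
                kept_b[b_key] = kept_c
        if kept_b:  # third selector: "4._bucket_count >= 1"
            result[a_key] = kept_b
    return result


# Hypothetical sums, keyed ColumnA -> ColumnB -> ColumnC.
sample = {
    "a1": {
        "b1": {
            "c1": {"views": 100, "clicks": 10, "earned1": 4, "earned2": 2},  # 0.1 * 0.5, kept
            "c2": {"views": 10, "clicks": 10, "earned1": 1, "earned2": 2},   # 1 * 2, dropped
        },
        "b2": {
            "c1": {"views": 10, "clicks": 10, "earned1": 1, "earned2": 1},   # 1 * 1, dropped
        },
    },
    "a2": {
        "b1": {
            "c1": {"views": 5, "clicks": 5, "earned1": 2, "earned2": 4},     # 1 * 2, dropped
        },
    },
}

selected = select_buckets(sample)
print(selected)
```

With this sample only a1 > b1 > c1 survives. Without the two `if kept_...` checks, b2 and a2 would still appear in the output with nothing inside them, which is the situation the extra selectors are there to avoid.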

To see why I added the additional empty-bucket filters, you can remove them and compare the results.

Note that for the sake of simplicity I have left out the query part as well as the cost field. Please feel free to add them and test.
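
As a rough sketch of how the query part and the cost field might be added back, the Python snippet below assembles the request body as a plain dictionary and prints it as JSON. The helper names `terms`, `non_empty` and `build_body` are hypothetical. The range filter on `hour` is copied from the question's query, and the script follows the condition as stated in the question. Sending the body to a cluster is left out.

```python
import json


def terms(field, sub_aggs):
    """Terms aggregation with the same ordering and size as the original query."""
    return {
        "terms": {"field": field, "order": {"_key": "desc"}, "size": 50},
        "aggs": sub_aggs,
    }


def non_empty(child):
    """Bucket selector that keeps a bucket only if `child` still has buckets."""
    return {
        "bucket_selector": {
            "buckets_path": {"valid_docs_count": f"{child}._bucket_count"},
            "script": {"source": "params.valid_docs_count >= 1"},
        }
    }


def build_body(gte, lte):
    """Request body with the time-range filter and the cost sum added back."""
    filtered = ("views", "clicks", "earned1", "earned2")
    leaf = {f"sum_{f}": {"sum": {"field": f}} for f in filtered + ("cost",)}
    leaf["custom_sum_bucket_filter"] = {
        "bucket_selector": {
            "buckets_path": {f"sum_of_{f}": f"sum_{f}" for f in filtered},
            "script": "(params.sum_of_clicks/params.sum_of_views)"
                      " * (params.sum_of_earned2/params.sum_of_earned1) < 0.5",
        }
    }
    level_b = {"5": terms("ColumnC", leaf), "min_bucket_selector": non_empty("5")}
    level_a = {"4": terms("ColumnB", level_b), "min_bucket_selector": non_empty("4")}
    hour_range = {"format": "strict_date_optional_time", "gte": gte, "lte": lte}
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"range": {"hour": hour_range}}]}},
        "aggs": {"3": terms("ColumnA", level_a)},
    }


body = build_body("2019-08-08T06:29:34.723Z", "2020-08-08T06:29:34.724Z")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `POST <your_index_name>/_search`. The `sum_cost` metric is only reported per bucket; it does not take part in the selector script.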
