简体   繁体   中英

How to perform complex query on aggregated fields in ElasticSearch

I am trying to figure out how to perform a complex query in elastic search, let say I have the following table of data:

在此处输入图像描述

Which I got from the following query

{
  "aggs": {
    "3": {
      "terms": {
        "field": "ColumnA",
        "order": {
          "_key": "desc"
        },
        "size": 50
      },
      "aggs": {
        "4": {
          "terms": {
            "field": "ColumnB",
            "order": {
              "_key": "desc"
            },
            "size": 50
          },
          "aggs": {
            "5": {
              "terms": {
                "field": "ColumnC",
                "order": {
                  "_key": "desc"
                },
                "size": 50
              },
              "aggs": {
                "sum_of_views": {
                  "sum": {
                    "field": "views"
                  }
                },
                "sum_of_costs": {
                  "sum": {
                    "field": "cost"
                  }
                },
                "sum_of_clicks": {
                  "sum": {
                    "field": "clicks"
                  }
                },
                "sum_of_earned": {
                  "sum": {
                    "field": "earned"
                  }
                },
                "sum_of_adv_earned": {
                  "sum": {
                    "field": "adv_earned"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "hour",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "hour": {
              "format": "strict_date_optional_time",
              "gte": "2019-08-08T06:29:34.723Z",
              "lte": "2020-08-08T06:29:34.724Z"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Now for example, if I want to get the records that have the following condition

(sum_of_clicks / sum_of_views) * (sum_of_earned2 / sum_of_earned1) < 0.5

What should I query?

Think the below should help. My understanding is that you would want to first group based on ColumnA, ColumnB, ColumnC , calculate the sum for clicks, views, earned1 and earned2 fields and then apply the custom aggregation logic you are looking for.

I've been able to come up with the below query where I've made use of Bucket Selector Aggregation .

POST <your_index_name>/_search
{
  "size": 0, 
  "aggs": {
    "3": {
      "terms": {
        "field": "ColumnA",
        "order": {
          "_key": "desc"
        },
        "size": 50
      },
      "aggs": {
        "4": {
          "terms": {
            "field": "ColumnB",
            "order": {
              "_key": "desc"
            },
            "size": 50
          },
          "aggs": {
            "5": {
              "terms": {
                "field": "ColumnC",
                "order": {
                  "_key": "desc"
                },
                "size": 50
              },
              "aggs": {
                "sum_views": {
                  "sum": {
                    "field": "views"
                  }
                },
                "sum_clicks": {
                  "sum": {
                    "field": "clicks"
                  }
                },
                "sum_earned1": {
                  "sum": {
                    "field": "earned1"
                  }
                },
                "sum_earned2": {
                  "sum": {
                    "field": "earned2"
                  }
                },
                "custom_sum_bucket_filter": {
                  "bucket_selector": {
                    "buckets_path": {
                      "sum_of_views": "sum_views",
                      "sum_of_clicks": "sum_clicks",
                      "sum_of_earned1": "sum_earned1",
                      "sum_of_earned2": "sum_earned2"
                    },
                    "script": "(params.sum_of_views/params.sum_of_clicks) * (params.sum_of_earned1/params.sum_of_earned2) < 0.5"
                  }
                }
              }
            },
            "min_bucket_selector": {
              "bucket_selector": {
                "buckets_path": {
                  "valid_docs_count": "5._bucket_count"
                },
                "script": {
                  "source": "params.valid_docs_count >= 1"
                }
              }
            }
          }
        },
        "min_bucket_selector": {
          "bucket_selector": {
            "buckets_path": {
              "valid_docs_count": "4._bucket_count"
            },
            "script": {
              "source": "params.valid_docs_count >= 1"
            }
          }
        }
      }
    }
  }
}

Note that to get the exact result you are looking for, I've had to add the filter conditions of buckets at 4 and 5 .

The aggregations I've made use are

  • Bucket Selector to calculate the condition you've mentioned
  • Again Bucket Selector so as to not display empty buckets at aggregation 5
  • Again a bucket selector so as to now show empty buckets aggregation at level 4.

In order to test why I've added the additional empty bucket filters, you can just remove them and see what results you observe.

Note that for sake of simplicity I have ignored the query part as well as the cost field. Please feel free to add them and test it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM