I need to select and sort parent documents based on a value aggregated over their child documents. The aggregation (e.g. a sum) itself depends on a query string, i.e. on which child documents are relevant for the aggregation.
Example: given the documents basket A and basket B, for each basket document I want to sum the number field of its fruit children whose name field matches my query, e.g. apples.
PUT /baskets/_doc/0
{
"name": "basket A",
"fruit": [
{
"name": "apples",
"number": 2
},
{
"name": "oranges",
"number": 3
}
]
}
PUT /baskets/_doc/1
{
"name": "basket B",
"fruit": [
{
"name": "apples",
"number": 3
},
{
"name": "apples",
"number": 3
}
]
}
Mappings:
PUT /baskets
{
"mappings": {
"properties": {
"name": { "type": "text" },
"fruit": {
"type": "nested",
"properties": {
"name": { "type": "text" },
"number": { "type": "long" }
}
}
}
}
}
How can one implement this using the Elasticsearch (7.8.0) query DSL?
So far I have tried nested queries and aggregations, without success.
Thanks!
Edit: Added mappings
Edit: Updated the numbers to better reflect the problem
Edit: Added a possible answer to use case 2 (see the comments on the answer from @joe):
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name",
"order": {"nest > fruit_filter > fruit_sum": "desc"}
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"term": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
}
}
}
}
}
GET baskets/_search
{
"query": {
"nested": {
"path": "fruit",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"term": {
"fruit.name": {
"value": "apples"
}
}
},
{
"range": {
"fruit.number": {
"gte": 5
}
}
}
]
}
}
}
}
}
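Use gt for strictly more than 5 and gte for 5 or more. Note that the range is evaluated against each nested fruit entry on its own, not against a per-basket sum.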
Also notice the inner_hits part: it returns the actual nested subdocument that caused a particular basket to match the query. It's not required but good to know.
GET baskets/_search
{
"sort": [
{
"fruit.number": {
"nested_path": "fruit",
"order": "desc"
}
}
]
}
There are probably cleaner ways of doing this but I'd go with the following:
GET baskets/_search
{
"size": 0,
"aggs": {
"multiply_and_add": {
"scripted_metric": {
"params": {
"only_fruit_name": "apples"
},
"init_script": "state.by_basket_name = [:]",
"map_script": """
def basket_name = params._source['name'];
def fruits = params._source['fruit'].findAll(group -> group.name == params.only_fruit_name);
for (def fruit_group : fruits) {
def number = fruit_group.number;
if (state.by_basket_name.containsKey(basket_name)) {
state.by_basket_name[basket_name] += number;
} else {
state.by_basket_name[basket_name] = number;
}
}
""",
"combine_script": "return state.by_basket_name",
"reduce_script": "return states"
}
}
}
}
yielding a hash map along the lines of
{
...
"aggregations":{
"multiply_and_add":{
"value":[
{
"basket A":2,
"basket B":6
}
]
}
}
}
Sorting can be done either in the reduce_script or within your ES response post-processing pipeline. You could of course choose to go with (sorted) lists and lambdas ...
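For the post-processing route, a minimal Python sketch might look like the following. It assumes the response shape shown above, where value holds one map per shard; sort_baskets is a hypothetical helper name, and the response literal is just the sample output, not a live query result.

```python
from collections import Counter


def sort_baskets(shard_states):
    """Merge the per-shard maps and rank baskets by their summed number."""
    totals = Counter()
    for shard_state in shard_states:
        if shard_state:  # skip shards that returned nothing
            totals.update(shard_state)  # adds values for repeated keys
    # Largest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Sample response copied from the output above (assumed shape).
response = {
    "aggregations": {
        "multiply_and_add": {"value": [{"basket A": 2, "basket B": 6}]}
    }
}

ranked = sort_baskets(response["aggregations"]["multiply_and_add"]["value"])
print(ranked)
```

Merging before sorting matters once the index has more than one shard, since each shard contributes its own map.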
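In the sort query, notice the required nested_path. (Elasticsearch 7.x marks nested_path as deprecated in favor of a nested object with a path key, but it still works in 7.8.)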
After a while of searching and testing, here are (in addition to @joe's answer to use case 2) possible queries for both use cases. Note that both use cases require changing the mapping of the name field to type keyword.
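As a rough illustration of that mapping change, the request body could be built like this in Python. This is a sketch under assumptions: it switches both the basket name and fruit.name to keyword (the terms aggregation on name is what strictly needs it), and keyword_mapping is a hypothetical variable name. Since the type of an existing field cannot be changed in place, the index would have to be recreated with this body and the documents reindexed.

```python
import json

# Assumed replacement for the original mapping: "text" becomes "keyword"
# so that the terms aggregation can bucket on the exact basket name.
keyword_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "fruit": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "number": {"type": "long"},
                },
            },
        }
    }
}

# Body to send with PUT /baskets when recreating the index.
print(json.dumps(keyword_mapping, indent=2))
```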
Use case 1: Which basket has (strictly) more than 5 apples? Would expect only basket B.
For more information on filtering results by their aggregation value, see Bucket Selectors.
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name"
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"match": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
},
"basket_sum_filter":{
"bucket_selector":{
"buckets_path":{
"fruitSum":"nest > fruit_filter > fruit_sum"
},
"script":"params.fruitSum > 5"
}
}
}
}
}
}
... yielding
...,
"buckets": [
{
"key": "basket B",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 2,
"fruit_sum": {
"value": 6
}
}
}
}
]
Use case 2: Sort baskets by number of apples. Would expect basket B with a total of 6 apples, then basket A with a total of 2 apples.
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name",
"order": {"nest > fruit_filter > fruit_sum": "desc"}
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"term": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
}
}
}
}
}
... yielding
...,
"buckets": [
{
"key": "basket B",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 2,
"fruit_sum": {
"value": 6
}
}
}
},
{
"key": "basket A",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 1,
"fruit_sum": {
"value": 2
}
}
}
}
]
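To consume the ordered buckets in application code, something along these lines could pull out (basket, total) pairs. It is only a sketch: the buckets literal is copied from the response above, and basket_totals is a hypothetical helper name.

```python
def basket_totals(buckets):
    """Return (basket name, summed number) pairs in bucket order."""
    return [
        (bucket["key"], bucket["nest"]["fruit_filter"]["fruit_sum"]["value"])
        for bucket in buckets
    ]


# Buckets copied from the sample response above.
buckets = [
    {
        "key": "basket B",
        "doc_count": 1,
        "nest": {
            "doc_count": 2,
            "fruit_filter": {"doc_count": 2, "fruit_sum": {"value": 6}},
        },
    },
    {
        "key": "basket A",
        "doc_count": 1,
        "nest": {
            "doc_count": 2,
            "fruit_filter": {"doc_count": 1, "fruit_sum": {"value": 2}},
        },
    },
]

totals = basket_totals(buckets)
print(totals)
```

The order of the pairs is whatever the terms aggregation returned, so the descending sort is done by Elasticsearch rather than by this helper.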