
Elasticsearch: Aggregating only specific nested documents

I want to aggregate only the nested documents that satisfy a given query.

Let me explain with an example. I have indexed two documents.

The first document is:

    {
      "project": [
        {
          "subject": "maths",
          "marks": 47
        },
        {
          "subject": "computers",
          "marks": 22
        }
      ]
    }

The second document is:

    {
      "project": [
        {
          "subject": "maths",
          "marks": 65
        },
        {
          "subject": "networks",
          "marks": 72
        }
      ]
    }

Each document contains subjects along with their marks. From these documents, I need the average marks for the maths subject alone.

The query I tried is:

    {
      "size": 0,
      "aggs": {
        "avg_marks": {
          "avg": {
            "field": "project.marks"
          }
        }
      },
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "project.subject:maths",
                "analyze_wildcard": true,
                "default_field": "*"
              }
            }
          ]
        }
      }
    }

This returns the average of all the marks, which is not what I need:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "avg_marks": {
          "value": 51.5
        }
      }
    }

I just need the average for the maths subject across the given documents; the expected result is 56.00.

Any help with the query, or any ideas, would be appreciated. Thanks in advance.

First, the mapping must declare project as a nested field, like this:

PUT /nested-index
{
    "mappings": {
        "document": {
            "properties": {
                "project": {
                    "type": "nested",
                    "properties": {
                        "subject": {
                            "type": "keyword"
                        },
                        "marks": {
                            "type": "long"
                        }
                    }
                }
            }
        }
    }
}

Then index the first document:

PUT nested-index/document/1
{
    "project": [
        {
            "subject": "maths",
            "marks": 47
        },
        {
            "subject": "computers",
            "marks": 22
        }
    ]
}

Then index the second document:

PUT nested-index/document/2
{
    "project": [
        {
            "subject": "maths",
            "marks": 65
        },
        {
            "subject": "networks",
            "marks": 72
        }
    ]
}

Then run the aggregation, wrapping it in a nested aggregation so that it operates on the individual nested objects:

GET nested-index/_search
{
    "size": 0,
    "aggs": {
        "subjects": {
            "nested": {
                "path": "project"
            },
            "aggs": {
                "subjects": {
                    "terms": {
                        "field": "project.subject",
                        "size": 10
                    },
                    "aggs": {
                        "average": {
                            "avg": {
                                "field": "project.marks"
                            }
                        }
                    }
                }
            }
        }
    }
}
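To make the intent of the request above concrete, here is a minimal Python sketch that emulates it on the two sample documents: every nested object is treated separately, grouped by subject, and the marks are averaged per group. The function name average_per_subject is hypothetical, and this only illustrates the arithmetic; it is not how Elasticsearch computes the result internally.

```python
from collections import defaultdict

# The two sample documents from the question.
docs = [
    {"project": [{"subject": "maths", "marks": 47},
                 {"subject": "computers", "marks": 22}]},
    {"project": [{"subject": "maths", "marks": 65},
                 {"subject": "networks", "marks": 72}]},
]


def average_per_subject(documents):
    """Group nested objects by subject and average the marks in each group."""
    marks_by_subject = defaultdict(list)
    for doc in documents:
        for entry in doc["project"]:
            marks_by_subject[entry["subject"]].append(entry["marks"])
    return {subject: sum(marks) / len(marks)
            for subject, marks in marks_by_subject.items()}


print(average_per_subject(docs))
# {'maths': 56.0, 'computers': 22.0, 'networks': 72.0}
```

The maths value of 56.0 matches the result expected in the question.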

Your original query does not work because, without a nested mapping, the objects in the project array are not kept separate: each document holds all of its subjects and all of its marks as flat multi-valued fields (project.subject and project.marks). The query only selects documents, so every document that contains maths matches, and the avg aggregation then averages every marks value in the matching documents, regardless of subject.

With these two documents, both contain the maths subject, so the average is calculated like this:

(47 + 22 + 65 + 72) / 4 = 51.5

If you asked for the average for networks, only the second document would match, but the average would still cover every value in its array:

(65 + 72) / 2 = 68.5

So you need to use the nested structure in this case.
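The difference between the two behaviours can be reproduced with a small Python sketch on the sample data. The helper names flattened_avg and nested_avg are hypothetical, and the code only mimics the arithmetic under the assumption that at least one document matches; it is not Elasticsearch's implementation.

```python
# The two sample documents from the question.
docs = [
    {"project": [{"subject": "maths", "marks": 47},
                 {"subject": "computers", "marks": 22}]},
    {"project": [{"subject": "maths", "marks": 65},
                 {"subject": "networks", "marks": 72}]},
]


def flattened_avg(documents, subject):
    """Non-nested behaviour: pick whole documents, then average all their marks."""
    marks = []
    for doc in documents:
        subjects = {entry["subject"] for entry in doc["project"]}
        if subject in subjects:  # the query selects documents, not objects
            marks.extend(entry["marks"] for entry in doc["project"])
    return sum(marks) / len(marks)


def nested_avg(documents, subject):
    """Nested behaviour: average only the objects whose subject matches."""
    marks = [entry["marks"]
             for doc in documents
             for entry in doc["project"]
             if entry["subject"] == subject]
    return sum(marks) / len(marks)


print(flattened_avg(docs, "maths"))     # 51.5
print(flattened_avg(docs, "networks"))  # 68.5
print(nested_avg(docs, "maths"))        # 56.0
```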

If you are interested in just one subject, you can then restrict the aggregation to that subject with a filter (here, subject equal to "maths"):

GET nested-index/_search
{
    "size": 0,
    "aggs": {
        "project": {
            "nested": {
                "path": "project"
            },
            "aggs": {
                "subjects": {
                    "filter": {
                        "term": {
                            "project.subject": "maths"
                        }
                    },
                    "aggs": {
                        "average": {
                            "avg": {
                                "field": "project.marks"
                            }
                        }
                    }
                }
            }
        }
    }
}
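The average ends up nested under the aggregation names used in the request (project, then subjects, then average). The Python sketch below shows how it could be read from a parsed response. The sample_response dictionary is a hand-written illustration of the expected shape for the two sample documents, not captured output, and maths_average is a hypothetical helper.

```python
# Hand-written illustration of the response shape (assumed, not real output).
sample_response = {
    "aggregations": {
        "project": {              # nested aggregation: all nested objects
            "doc_count": 4,
            "subjects": {         # filter aggregation: objects with subject maths
                "doc_count": 2,
                "average": {"value": 56.0},
            },
        }
    }
}


def maths_average(response):
    """Follow the aggregation names from the request down to the avg value."""
    return response["aggregations"]["project"]["subjects"]["average"]["value"]


print(maths_average(sample_response))  # 56.0
```

If you rename any of the aggregations in the request, the keys used in this lookup have to change to match.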
