
Average of subdocument fields per document in Elasticsearch

I have an Elasticsearch mapping that represents students, with their marks stored as an array of objects:

properties: {
  name: { type: "text" },
  /* ... */
  marks: {
    properties: {
      value: { type: "float" }
    }
  }
}

Based on this mapping, documents are stored in this form:

"hits" : [{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "John Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 4
      },
      {
        "_id" : "...",
        "value" : 0
      }
    ]
  }
}, 
{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "Jane Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 5
      },
      {
        "_id" : "...",
        "value" : 4
      }
    ]
  }
}, /* ... */]

Each student has many marks. In the search results, I would like to get the average mark value per student (that is, per document indexed in Elasticsearch).

I tried an aggregation:

"aggs": {
  "avg_mark": {
    "avg": { "field": "marks.value" }
  }
}

But I get a single average across all students:

aggregations: { avg_mark: { value: 3.25 } }
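
To see where 3.25 comes from, it helps to redo the arithmetic by hand. The sketch below uses only the two sample documents shown above (the variable names are illustrative): a top-level avg aggregation pools every marks.value across all matching documents, whereas the goal here is one average per document.

```python
# The two sample documents from the question, reduced to name and mark values.
students = [
    {"name": "John Doe", "marks": [4, 0]},
    {"name": "Jane Doe", "marks": [5, 4]},
]

# What a top-level avg aggregation computes: one mean over every mark
# of every matching document.
all_marks = [m for s in students for m in s["marks"]]
overall = sum(all_marks) / len(all_marks)

# What the question asks for: one mean per document.
per_student = {s["name"]: sum(s["marks"]) / len(s["marks"]) for s in students}

print(overall)      # 3.25
print(per_student)  # {'John Doe': 2.0, 'Jane Doe': 4.5}
```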

I then tried with sort:

"sort": [{
  "marks.value": {
    "order": "desc",
    "mode": "avg"
  }
}]

It does compute an average per student, but:

  • It sorts my results, which I don't always need.
  • It stores the average in an array, with no key to retrieve it by. This is not what I need, because the order of the sort properties can change depending on the user's search.

"hits" : [{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "John Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 4
      },
      {
        "_id" : "...",
        "value" : 0
      }
    ]
  },
  "sort" : [ 2.0 ]
}, 
{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "Jane Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 5
      },
      {
        "_id" : "...",
        "value" : 4
      }
    ]
  },
  "sort" : [ 4.5 ]
}, /* ... */]

This sort array could be [ 4.5, value_b, value_c, ... ] or [ value_b, value_c, 4.5 ], depending on the sort clauses in the search request.
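
Each hit's sort values come back in the same order as the sort clauses of the request, so the position of the average can be recovered from the request itself. A minimal sketch, assuming the client keeps the list of sort clauses it sent; the helper name sort_value_for and the extra _score clause are hypothetical:

```python
def sort_value_for(hit, sort_clauses, field):
    """Return the sort value that belongs to `field` in one search hit.

    Values in hit["sort"] are positional: the i-th value corresponds to
    the i-th clause of the request's "sort" list.
    """
    for position, clause in enumerate(sort_clauses):
        # A clause is either a bare field name or a {field: options} object.
        name = clause if isinstance(clause, str) else next(iter(clause))
        if name == field:
            return hit["sort"][position]
    raise KeyError(f"{field!r} is not among the sort clauses")


# Sort clauses as sent in the request; the _score clause is added for illustration.
sort_clauses = [
    "_score",
    {"marks.value": {"order": "desc", "mode": "avg"}},
]

# A hit shaped like the ones above, trimmed to the relevant keys.
hit = {"_source": {"name": "Jane Doe"}, "sort": [1.0, 4.5]}

print(sort_value_for(hit, sort_clauses, "marks.value"))  # 4.5
```

This only removes the positional ambiguity. The results are still sorted by the average, so it does not address the first objection.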

I also tried to work around this with the nested type, without success.

How can I get an average per document (per student) without sorting my results, and in a form that is easy to retrieve?

Thank you in advance.

Your first try was a step in the right direction. You just need to group by student name before calculating the average mark:

GET students/_search
{ 
  "size": 0,
  "aggs": {
    "by_student": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      },
      "aggs": {
        "avg_mark": {
          "avg": {
            "field": "marks.value"
          }
        }
      }
    }
  }
}
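
With this request, each student's average comes back under a named key inside its bucket. The sketch below reads those buckets into a dictionary. The response literal is a hand-written example that follows the usual shape of a terms aggregation with an avg sub-aggregation, filled in with the two sample students; it is not output captured from a cluster, and the function name averages_by_student is hypothetical.

```python
# Hand-written response fragment for the two sample students (assumed shape).
response = {
    "aggregations": {
        "by_student": {
            "buckets": [
                {"key": "Jane Doe", "doc_count": 1, "avg_mark": {"value": 4.5}},
                {"key": "John Doe", "doc_count": 1, "avg_mark": {"value": 2.0}},
            ]
        }
    }
}


def averages_by_student(response):
    """Map each bucket key (the student name) to its average mark."""
    buckets = response["aggregations"]["by_student"]["buckets"]
    return {b["key"]: b["avg_mark"]["value"] for b in buckets}


print(averages_by_student(response))  # {'Jane Doe': 4.5, 'John Doe': 2.0}
```

Two things to keep in mind: buckets are keyed by name, so two students who share a name end up in the same bucket, and "size": 10 means only ten buckets are returned unless that value is raised.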

The .keyword suffix comes from this slightly adjusted mapping, which adds a keyword sub-field to name:

PUT students
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "marks": {
        "properties": {
          "value": {
            "type": "float"
          }
        }
      }
    }
  }
}

BTW -- if you want to narrow the search to only a few students, simply include a top-level query along the lines of:

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "name.keyword": [
              "John Doe",
              "Jane Doe"
            ]
          }
        }
      ]
    }
  },
  "aggs": { ... }
}

The aggregations will then only consider the filtered set of documents.
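
Put together, the filter and the aggregation sit side by side in one request body. Here is a sketch that assembles that body as a Python dictionary and prints the JSON to send to students/_search. The function name build_request is hypothetical, and the aggregation is the by_student one from the first request:

```python
import json


def build_request(names):
    """Build a search body that averages marks only for the given student names."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "query": {
            "bool": {
                "filter": [{"terms": {"name.keyword": list(names)}}]
            }
        },
        "aggs": {
            "by_student": {
                # same bucket limit as in the original request
                "terms": {"field": "name.keyword", "size": 10},
                "aggs": {"avg_mark": {"avg": {"field": "marks.value"}}},
            }
        },
    }


body = build_request(["John Doe", "Jane Doe"])
print(json.dumps(body, indent=2))
```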


 