Average of a sub-document field per document in Elasticsearch

Question

I have an Elasticsearch mapping that represents students, with their marks stored as an array of objects:

properties: {
  name: { type: "text" },
  /* ... */
  marks: {
    properties: {
      value: { type: "float" }
    }
  }
}

Based on this mapping, documents are stored in this form:

"hits" : [{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "John Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 4
      },
      {
        "_id" : "...",
        "value" : 0
      }
    ]
  }
}, 
{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "Jane Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 5
      },
      {
        "_id" : "...",
        "value" : 4
      }
    ]
  }
}, /* ... */]

Each student has many marks. I would like the Elasticsearch results to include the average mark value per student (that is, per indexed document).

I tried an aggregation:

"aggs": {
  "avg_mark": {
    "avg": { "field": "marks.value" }
  }
}

But this gives me a single average across all students:

aggregations: { avg_mark: { value: 3.25 } }
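
To make the 3.25 concrete, here is a minimal Python sketch of the arithmetic, using only the marks from the two sample documents above. It is an illustration of what the numbers mean, not of how Elasticsearch computes them internally; the `students` dictionary is a stand-in for the index.

```python
from statistics import mean

# Marks copied from the two sample documents in the question.
students = {
    "John Doe": [4.0, 0.0],
    "Jane Doe": [5.0, 4.0],
}

# A top-level avg aggregation on marks.value pools every value from every
# matching document and averages once.
pooled = mean(v for marks in students.values() for v in marks)

# What the question asks for instead: one average per document.
per_student = {name: mean(marks) for name, marks in students.items()}

print(pooled)
print(per_student)
```

The pooled figure is (4 + 0 + 5 + 4) / 4, which matches the aggregation result, while the per-student figures are the values the question is after.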

Then I tried a sort:

"sort": [{
  "marks.value": {
    "order": "desc",
    "mode": "avg"
  }
}]

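This does give the correct average mark for each student, but:

It sorts my results, which I do not always need.
It stores the average in an array with no key to retrieve it by. This is not what I need, because the order of the sort properties may change depending on the user's search.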

"hits" : [{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "John Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 4
      },
      {
        "_id" : "...",
        "value" : 0
      }
    ]
  },
  "sort" : [ 2.0 ]
}, 
{
  "_index" : "students",
  "_type" : "_doc",
  "_id" : "...",
  "_score" : 1.0,
  "_source" : {
    "name" : "Jane Doe",
    "marks" : [
      {
        "_id" : "...",
        "value" : 5
      },
      {
        "_id" : "...",
        "value" : 4
      }
    ]
  },
  "sort" : [ 4.5 ]
}, /* ... */]

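This sort array could be [ 4.5, value_b, value_c, ... ] or [ value_b, value_c, 4.5 ], depending on the sort properties of the search request.

I also tried using the nested type, without success.

How can I get the average per document/student without sorting the results, and in a form that is easy to retrieve?

Thanks in advance.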

Answer 1

Your first attempt was a step in the right direction. Just make sure to group by student name before calculating the average mark:

GET students/_search
{ 
  "size": 0,
  "aggs": {
    "by_student": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      },
      "aggs": {
        "avg_mark": {
          "avg": {
            "field": "marks.value"
          }
        }
      }
    }
  }
}
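
As a rough sketch of how the result of this request could be consumed, the Python below turns the `by_student` buckets into a name-to-average dictionary. The `sample_response` is hand-written for illustration, trimmed to the `aggregations` part and filled with the averages from the question; it is not captured output. The helper name `averages_by_student` is hypothetical.

```python
# Hand-written illustration of the aggregation part of a response,
# using the per-student averages shown in the question.
sample_response = {
    "aggregations": {
        "by_student": {
            "buckets": [
                {"key": "Jane Doe", "doc_count": 1, "avg_mark": {"value": 4.5}},
                {"key": "John Doe", "doc_count": 1, "avg_mark": {"value": 2.0}},
            ]
        }
    }
}


def averages_by_student(response):
    """Map each bucket key (student name) to its average mark."""
    buckets = response["aggregations"]["by_student"]["buckets"]
    return {b["key"]: b["avg_mark"]["value"] for b in buckets}


averages = averages_by_student(sample_response)
print(averages["Jane Doe"], averages["John Doe"])
```

Each average is now addressable by key, independent of any sort order. One caveat of grouping by name: documents that share the same name fall into the same bucket and their marks are pooled, which a `doc_count` greater than 1 would reveal.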

The .keyword field suffix comes from this slightly adjusted mapping, which adds a keyword sub-field under name:

PUT students
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "marks": {
        "properties": {
          "value": {
            "type": "float"
          }
        }
      }
    }
  }
}

By the way, if you want to narrow the search to only a few students, just include a top-level query like this:

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "name.keyword": [
              "John Doe",
              "Jane Doe"
            ]
          }
        }
      ]
    }
  },
  "aggs": { ... }
}

The aggregation will then only consider the filtered set of documents.
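
Putting the two pieces of this answer together, the following Python sketch builds one request body with the optional name filter and the per-student average. The function name `build_request` and its parameters are hypothetical, and keeping `"size": 0` in the filtered request is an assumption carried over from the first request; the resulting dictionary can be sent as the search body with whichever client you use.

```python
import json


def build_request(names=None, bucket_size=10):
    """Build the search body: optional name filter plus per-student average."""
    body = {
        "size": 0,  # return no hits, only the aggregation results
        "aggs": {
            "by_student": {
                "terms": {"field": "name.keyword", "size": bucket_size},
                "aggs": {"avg_mark": {"avg": {"field": "marks.value"}}},
            }
        },
    }
    if names:
        # A filter clause restricts the documents without affecting scoring.
        body["query"] = {
            "bool": {"filter": [{"terms": {"name.keyword": list(names)}}]}
        }
    return body


request_body = build_request(["John Doe", "Jane Doe"])
print(json.dumps(request_body, indent=2))
```

Note that the terms aggregation returns at most `bucket_size` buckets, so that value would need to be at least the number of students you expect back.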

Solution 1
1 accepted 2021-02-03 00:03:27