Elasticsearch: aggregating only specific nested documents

I want to aggregate only the nested documents that satisfy a given query.

Let me explain with an example. I have inserted two documents into my index:

The first document is:

    {
      "project": [
        {
          "subject": "maths",
          "marks": 47
        },
        {
          "subject": "computers",
          "marks": 22
        }
      ]
    }

The second document is:

    {
      "project": [
        {
          "subject": "maths",
          "marks": 65
        },
        {
          "subject": "networks",
          "marks": 72
        }
      ]
    }

Each document contains subjects along with their marks. From these documents, I need the average marks for the maths subject alone.

The query I tried is:

    {
      "size": 0,
      "aggs": {
        "avg_marks": {
          "avg": {
            "field": "project.marks"
          }
        }
      },
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "project.subject:maths",
                "analyze_wildcard": true,
                "default_field": "*"
              }
            }
          ]
        }
      }
    }

This returns the average of all the marks, which is not what I want:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "avg_marks": {
          "value": 51.5
        }
      }
    }

I just need the average of the maths marks from the given documents; the expected result is 56.00.

Any help with the query, or any ideas, would be appreciated. Thanks in advance.

First, you need to declare in your mapping that the index has a nested field, as follows:

PUT /nested-index
{
    "mappings": {
        "document": {
            "properties": {
                "project": {
                    "type": "nested",
                    "properties": {
                        "subject": {
                            "type": "keyword"
                        },
                        "marks": {
                            "type": "long"
                        }
                    }
                }
            }
        }
    }
}

Then index your documents:

PUT nested-index/document/1
{
    "project": [
        {
            "subject": "maths",
            "marks": 47
        },
        {
            "subject": "computers",
            "marks": 22
        }
    ]
}

Then index the second document:

PUT nested-index/document/2
{
    "project": [
        {
            "subject": "maths",
            "marks": 65
        },
        {
            "subject": "networks",
            "marks": 72
        }
    ]
}

Then run the aggregation, specifying the nested path, like this:

GET nested-index/_search
{
    "size": 0,
    "aggs": {
        "subjects": {
            "nested": {
                "path": "project"
            },
            "aggs": {
                "subjects": {
                    "terms": {
                        "field": "project.subject",
                        "size": 10
                    },
                    "aggs": {
                        "average": {
                            "avg": {
                                "field": "project.marks"
                            }
                        }
                    }
                }
            }
        }
    }
}
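
To make clear what this nested `terms` plus `avg` aggregation computes, here is a minimal Python sketch of the same grouping done by hand on the four nested entries from the two documents. It is plain Python, not Elasticsearch code, and `average_by_subject` is a hypothetical helper name used only for illustration.

```python
from collections import defaultdict

# The four nested entries from the two documents, as (subject, marks) pairs.
entries = [("maths", 47), ("computers", 22), ("maths", 65), ("networks", 72)]

def average_by_subject(entries):
    """Group marks by subject (like terms), then average each group (like avg)."""
    buckets = defaultdict(list)
    for subject, marks in entries:
        buckets[subject].append(marks)
    return {subject: sum(values) / len(values) for subject, values in buckets.items()}

print(average_by_subject(entries))
# {'maths': 56.0, 'computers': 22.0, 'networks': 72.0}
```

Each nested entry is treated as its own small document, so the maths bucket only ever sees 47 and 65.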

Your query does not work, and gives that result, because when the array of objects is not mapped as nested, the query only selects whole documents: if a document's array contains the subject anywhere, the document matches, and the average is then taken over all the marks in that array, no matter that you wanted to aggregate only one subject.

So with those two documents, because both of them contain the maths subject, the average is calculated like this:

(47 + 22 + 65 + 72) / 4 = 51.5

If you wanted the average for networks, it would return the following (only one document contains networks, but the average is taken over all the values in that document's array):

(65 + 72) / 2 = 68.5

So you need to use a nested structure in this case.
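
The two behaviours can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: `flattened_avg` and `nested_avg` are made-up helper names, and the first one only models the non-nested case under the assumption that a document matches as a whole and all of its marks are then averaged.

```python
# The two documents from the question.
docs = [
    {"project": [{"subject": "maths", "marks": 47},
                 {"subject": "computers", "marks": 22}]},
    {"project": [{"subject": "maths", "marks": 65},
                 {"subject": "networks", "marks": 72}]},
]

def flattened_avg(docs, subject):
    """Model of the non-nested case: a document matches if any entry has the
    subject, and then every marks value in that document is averaged."""
    marks = [
        entry["marks"]
        for doc in docs
        if any(e["subject"] == subject for e in doc["project"])
        for entry in doc["project"]
    ]
    return sum(marks) / len(marks)

def nested_avg(docs, subject):
    """Model of the nested case: each array entry is matched on its own."""
    marks = [
        entry["marks"]
        for doc in docs
        for entry in doc["project"]
        if entry["subject"] == subject
    ]
    return sum(marks) / len(marks)

print(flattened_avg(docs, "maths"))     # 51.5
print(flattened_avg(docs, "networks"))  # 68.5
print(nested_avg(docs, "maths"))        # 56.0
```

The first two numbers match the averages worked out above, and the last one is the 56.00 the question asks for.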

If you are interested in just one subject, you can restrict the aggregation to that subject with a filter, like this (subject equal to "maths"):

GET nested-index/_search
{
    "size": 0,
    "aggs": {
        "project": {
            "nested": {
                "path": "project"
            },
            "aggs": {
                "subjects": {
                    "filter": {
                        "term": {
                            "project.subject": "maths"
                        }
                    },
                    "aggs": {
                        "average": {
                            "avg": {
                                "field": "project.marks"
                            }
                        }
                    }
                }
            }
        }
    }
}
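
In client code, the result sits under the aggregation names chosen in the request (`project`, then `subjects`, then `average`). Here is a minimal Python sketch of reading it, assuming the response has already been parsed into a dict. The `sample_response` below is hand-written from the two example documents rather than captured from a cluster, and `agg_value` is a hypothetical helper.

```python
def agg_value(response, *agg_names):
    """Walk down the named aggregations and return the final metric's value."""
    node = response["aggregations"]
    for name in agg_names:
        node = node[name]
    return node["value"]

# Hand-written stand-in for the parsed search response (an assumption,
# not real cluster output).
sample_response = {
    "aggregations": {
        "project": {
            "doc_count": 4,          # all nested entries under "project"
            "subjects": {
                "doc_count": 2,      # entries whose subject is "maths"
                "average": {"value": 56.0},
            },
        }
    }
}

print(agg_value(sample_response, "project", "subjects", "average"))  # 56.0
```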
