How to aggregate by dynamic or unknown fields in Elasticsearch 6.x

I'm fairly new to Elasticsearch (currently using v6.2) and I seem to have run into a problem while trying to add some aggregations to a query. I'm still trying to wrap my head around the various types of aggregation, as well as the best ways to store the data.

When the query runs, I have some variable attributes that I would like to aggregate and then return as filters to the user. For example, one character may have attributes for "size", "shape" and "colour", while another only has "shape" and "colour".

The full list of attributes is not known in advance, so I don't think I can construct the query by naming each attribute explicitly.

My data is currently structured like this:

{
    id : 1,
    title : 'New Character 1',
    group : 1,
    region : 1,
    attrs : {
        moves : 2,

        # These would be dynamic, would only apply to some rows, not others.
        var_colours : ['Blue', 'Green', 'Red'],
        var_shapes : ['Round', 'Square', 'Etc'],

        effects : [
            { id : 1, value : 20 },
            { id : 2, value : 60 },
            { id : 3, value : 10 }
        ]

    }
}

I currently have an aggregation of groups and regions that looks like this. It seems to be working wonderfully and I would like to add something similar for the attributes.

[
    'aggs' => [
        'group_ids' => [
            'terms' => [
                'field' => 'group',
                'order' => [ '_count' => 'desc' ]
            ]
        ],
        'region_ids' => [
            'terms' => [
                'field' => 'region',
                'order' => [ '_count' => 'desc' ]
            ]
        ]
    ]
]

I'm hoping to get a result that looks like the one below. I'm also not sure whether the data structure is set up in the best way; I can make changes there if necessary.

[aggregations] => [
    [groups] => [
        [doc_count_error_upper_bound] => 0
        [sum_other_doc_count] => 0
        [buckets] => [
            [0] => [
                [key] => 5
                [doc_count] => 27
            ],
            [1] => [
                [key] => 2
                [doc_count] => 7
            ]
        ]
    ],

    [var_colours] => [
        [doc_count_error_upper_bound] => 0
        [sum_other_doc_count] => 0
        [buckets] => [
            [0] => [
                [key] => 'Red'
                [doc_count] => 27
            ],
            [1] => [
                [key] => 'Blue'
                [doc_count] => 7
            ]
        ]
    ],

    [var_shapes] => [
        [doc_count_error_upper_bound] => 0
        [sum_other_doc_count] => 0
        [buckets] => [
            [0] => [
                [key] => 'Round'
                [doc_count] => 27
            ],
            [1] => [
                [key] => 'Polygon'
                [doc_count] => 7
            ]
        ]
    ]

    // ...
]

Any insight that anyone could provide would be greatly appreciated.

You should do this within your PHP script.

I can think of the following:

  1. Use dynamic field mapping for your index.

By default, when a previously unseen field is found in a document, Elasticsearch will add the new field to the type mapping. This behaviour can be disabled, both at the document and at the object level, by setting the dynamic parameter to false (to ignore new fields) or to strict (to throw an exception if an unknown field is encountered).

  2. Get all the existing fields in your index. Use the Get mapping API for this.

  3. Loop over the results of step 2 so you can collect all the existing fields in your index. You can store them in a list (or array), for example.

  4. Create a PHP Elasticsearch terms aggregation for each of the fields in your list (or array). That is: start from an empty or base query with no terms aggregations and add one terms aggregation for each element you got from step 3.

  5. Add the "missing" parameter to each terms aggregation, set to an empty string ("").

  6. That's it. This way you build the query such that, no matter which index you're searching, you get a terms aggregation for each of its existing fields.
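
The steps above might be sketched in PHP roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names collectFields and buildTermsAggs, the index name characters and the type name _doc are hypothetical, and $mapping is a hand-written stand-in for the array the official PHP client returns from $client->indices()->getMapping(). It assumes dynamically mapped strings were indexed as text with a keyword sub-field (the 6.x default), skips nested fields (they would need a nested aggregation wrapper), and only sets "missing" on string fields, since an empty string would likely be rejected as a missing value for numeric fields.

```php
<?php
// Recursively flatten a mapping "properties" tree into
// [dotted path => ['field' => aggregatable field, 'string' => bool]].
function collectFields(array $properties, string $prefix = ''): array
{
    $fields = [];
    foreach ($properties as $name => $def) {
        $path = $prefix . $name;
        $type = $def['type'] ?? 'object';
        if ($type === 'nested') {
            continue; // assumption: nested fields are out of scope for this sketch
        }
        if (isset($def['properties'])) {
            // Object field: descend into its sub-fields.
            $fields += collectFields($def['properties'], $path . '.');
            continue;
        }
        if ($type === 'text') {
            // text fields are not aggregatable by default; use the keyword sub-field if present.
            if (isset($def['fields']['keyword'])) {
                $fields[$path] = ['field' => $path . '.keyword', 'string' => true];
            }
            continue;
        }
        $fields[$path] = ['field' => $path, 'string' => $type === 'keyword'];
    }
    return $fields;
}

// Build one terms aggregation per collected field.
function buildTermsAggs(array $fields): array
{
    $aggs = [];
    foreach ($fields as $path => $info) {
        $terms = ['field' => $info['field'], 'order' => ['_count' => 'desc']];
        if ($info['string']) {
            $terms['missing'] = ''; // bucket for documents that lack this field
        }
        // Dots are replaced so the aggregation name stays a simple identifier.
        $aggs[str_replace('.', '_', $path)] = ['terms' => $terms];
    }
    return $aggs;
}

// Hypothetical sample shaped like a 6.x Get mapping response.
$mapping = [
    'characters' => ['mappings' => ['_doc' => ['properties' => [
        'group' => ['type' => 'long'],
        'attrs' => ['properties' => [
            'moves' => ['type' => 'long'],
            'var_colours' => [
                'type' => 'text',
                'fields' => ['keyword' => ['type' => 'keyword', 'ignore_above' => 256]],
            ],
        ]],
    ]]]],
];

$properties = $mapping['characters']['mappings']['_doc']['properties'];
$body = ['aggs' => buildTermsAggs(collectFields($properties))];
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting $body can be merged into an existing search request, for example alongside the group and region aggregations from the question.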

Advantages:

  • Your terms aggregations will be generated dynamically from all the existing fields.

  • For documents that do not contain a given field, an empty-string bucket will be shown.
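
To hand the buckets back to the user as filters, the response could be post-processed along these lines. This is only a sketch: bucketsToFilters is a hypothetical helper, $response is a hand-written sample shaped like a terms aggregation response, and it assumes the empty-string bucket (documents without the field) should be left out of the filter list.

```php
<?php
// Turn the "aggregations" part of a search response into
// [aggregation name => [value => document count]].
function bucketsToFilters(array $aggregations): array
{
    $filters = [];
    foreach ($aggregations as $name => $agg) {
        foreach ($agg['buckets'] ?? [] as $bucket) {
            if ($bucket['key'] === '') {
                continue; // bucket created by "missing" => '': documents without this field
            }
            $filters[$name][$bucket['key']] = $bucket['doc_count'];
        }
    }
    return $filters;
}

// Hypothetical sample response fragment.
$response = ['aggregations' => [
    'attrs_var_colours' => [
        'doc_count_error_upper_bound' => 0,
        'sum_other_doc_count' => 0,
        'buckets' => [
            ['key' => 'Red', 'doc_count' => 27],
            ['key' => '', 'doc_count' => 12],
            ['key' => 'Blue', 'doc_count' => 7],
        ],
    ],
]];

print_r(bucketsToFilters($response['aggregations']));
```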

Disadvantages:

  • Looping through the Get mapping API's result could be a little frustrating (but I trust you).

  • Performance (time and resources) will be affected by every new field found in your mappings, since each one adds another terms aggregation to the query.

I hope this is helpful :D
