
Elasticsearch: getting summary on last entries for aggregated data

I have an Elasticsearch index with documents like this:

entity_id  operation  timestamp
a1         X          2021-01-01
a1         Y          2021-01-02
a1         Z          2021-01-10
b1         Z          2021-01-03
b1         Z          2021-01-05
b1         Y          2021-01-20
c1         Z          2021-01-03
c1         X          2021-01-05
c1         Y          2021-01-20

There are some entities (entity_id); each of them can be updated multiple times, in various ways (operation), at different times (timestamp).

I need summary information about the last operation applied to each entity. For example, for this data I need the result in the form: X=0, Y=2, Z=1

Y=2 because "Y" is the last operation applied to the "b1" and "c1" entities.

Z=1 because "Z" is the last operation applied to the "a1" entity.
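To make the expected result concrete, here is a minimal reference computation in plain Python over the sample rows from the question. It is only meant to pin down the semantics (latest row per entity, then a count per operation); the function name `last_operation_counts` is made up for this illustration and nothing here talks to Elasticsearch.

```python
from collections import Counter

# Sample rows from the question: (entity_id, operation, timestamp).
rows = [
    ("a1", "X", "2021-01-01"),
    ("a1", "Y", "2021-01-02"),
    ("a1", "Z", "2021-01-10"),
    ("b1", "Z", "2021-01-03"),
    ("b1", "Z", "2021-01-05"),
    ("b1", "Y", "2021-01-20"),
    ("c1", "Z", "2021-01-03"),
    ("c1", "X", "2021-01-05"),
    ("c1", "Y", "2021-01-20"),
]


def last_operation_counts(rows):
    """Count how many entities have each operation as their latest one."""
    latest = {}  # entity_id -> (timestamp, operation)
    for entity_id, operation, timestamp in rows:
        # ISO-8601 dates compare correctly as plain strings.
        if entity_id not in latest or timestamp > latest[entity_id][0]:
            latest[entity_id] = (timestamp, operation)
    return Counter(operation for _, operation in latest.values())


counts = last_operation_counts(rows)
# A Counter returns 0 for operations that are never the latest one.
print({op: counts[op] for op in ("X", "Y", "Z")})  # {'X': 0, 'Y': 2, 'Z': 1}
```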

I wrote a query that returns the last operation for each entity, like below:

{
    "size": 0,
    "aggs": {
        "group_by_id": {
            "terms": {
                "field": "entity_id"
            },
            "aggs": {
                "last_entry": {
                    "top_hits": {
                        "size": 1,
                        "_source": {
                            "includes": [
                                "operation",
                                "timestamp"
                            ]
                        },
                        "sort": [
                            {
                                "timestamp": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

It works, but because of the huge amount of data I won't be able to iterate over the aggregation results and sum the operations by type afterwards in code. I need to count the last operations within the Elasticsearch query itself, if that is feasible.

Can anybody suggest how this can be achieved?

Thanks!

I have been searching for a while for an approach to tasks like this. I found lots of similar questions but no proper suggestions so far. Here is the solution I finally came to; maybe it will be useful for someone with a similar task. The idea is to use a scripted_metric aggregation and calculate the needed summary data by means of scripts:

{
    "size": 0,
    "aggs": {
        "total": {
            "scripted_metric": {
                "init_script": "state.operations=new Hashtable();",
                "map_script": <Add to state.operations every doc using entity_id as key. When another doc for the same entity_id is found check its timestamp and replace the existing doc if the new found doc is newer>,
                "combine_script": "return state.operations",
                "reduce_script": <Here you have "states" variable which contains hashtables returned by the combine script per each shard. You can iterate states, merge all hashtables together and return the resulting hashtable or just calculate needed summary values>
            }
        }
    },
    "sort": [
        {
            "timestamp": {
                "order": "desc"
            }
        }
    ]
}
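The four phases described in the outline can be simulated in plain Python to check the merge logic before writing any scripts. This is only a sketch of the algorithm, not Elasticsearch code: the Python functions mirror the script names, and the split of the sample documents across two shards is hypothetical. It is chosen so that documents of one entity land on different shards, which can happen when no custom routing is used.

```python
def init_script():
    # One fresh state per shard.
    return {"operations": {}}


def map_script(state, doc):
    # Keep only the newest document seen so far for each entity_id.
    ops = state["operations"]
    current = ops.get(doc["entity_id"])
    if current is None or doc["timestamp"] > current["timestamp"]:
        ops[doc["entity_id"]] = {
            "timestamp": doc["timestamp"],
            "operation": doc["operation"],
        }


def combine_script(state):
    # Each shard hands its per-entity table to the reduce phase.
    return state["operations"]


def reduce_script(states):
    # Merge the shard tables, again keeping the newest entry per entity.
    merged = {}
    for shard_ops in states:
        for entity_id, entry in shard_ops.items():
            current = merged.get(entity_id)
            if current is None or entry["timestamp"] > current["timestamp"]:
                merged[entity_id] = entry
    # Count the surviving (latest) operations by type.
    counts = {}
    for entry in merged.values():
        counts[entry["operation"]] = counts.get(entry["operation"], 0) + 1
    return counts


def doc(entity_id, operation, timestamp):
    return {"entity_id": entity_id, "operation": operation, "timestamp": timestamp}


# Hypothetical distribution of the sample documents over two shards.
shard_1 = [doc("a1", "X", "2021-01-01"), doc("a1", "Z", "2021-01-10"),
           doc("b1", "Y", "2021-01-20"), doc("c1", "Z", "2021-01-03"),
           doc("c1", "X", "2021-01-05")]
shard_2 = [doc("a1", "Y", "2021-01-02"), doc("b1", "Z", "2021-01-03"),
           doc("b1", "Z", "2021-01-05"), doc("c1", "Y", "2021-01-20")]

states = []
for shard_docs in (shard_1, shard_2):
    state = init_script()
    for d in shard_docs:
        map_script(state, d)
    states.append(combine_script(state))

result = reduce_script(states)
print(result)  # {'Z': 1, 'Y': 2}
```

Operations that are never the last one for any entity (X in the sample) simply do not appear in the result, so a caller would treat a missing key as 0.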

This is just an algorithm; I wrote plain descriptions in map_script and reduce_script because my real case was much more complex than the simplified example I posted here.
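For the simplified example, the two described scripts could be filled in roughly as follows. This is an untested sketch, not the author's actual scripts: it assumes entity_id and operation are keyword fields, timestamp is a date field, and every document has all three. The request body is built as a Python dict only so that it can be serialized to valid JSON; the Painless sources are the part that matters.

```python
import json

# Painless sources (assumed mappings: keyword, keyword, date).
INIT_SCRIPT = "state.operations = new HashMap();"

MAP_SCRIPT = """
String id = doc['entity_id'].value;
long ts = doc['timestamp'].value.toInstant().toEpochMilli();
def cur = state.operations.get(id);
if (cur == null || ts > cur.ts) {
  state.operations.put(id, ['ts': ts, 'op': doc['operation'].value]);
}
"""

COMBINE_SCRIPT = "return state.operations;"

REDUCE_SCRIPT = """
Map last = new HashMap();
for (s in states) {
  if (s == null) { continue; }
  for (e in s.entrySet()) {
    def cur = last.get(e.getKey());
    if (cur == null || e.getValue().ts > cur.ts) {
      last.put(e.getKey(), e.getValue());
    }
  }
}
Map counts = new HashMap();
for (v in last.values()) {
  counts.put(v.op, counts.getOrDefault(v.op, 0) + 1);
}
return counts;
"""

body = {
    "size": 0,
    "aggs": {
        "total": {
            "scripted_metric": {
                "init_script": INIT_SCRIPT,
                "map_script": MAP_SCRIPT,
                "combine_script": COMBINE_SCRIPT,
                "reduce_script": REDUCE_SCRIPT,
            }
        }
    },
}

# Serialized request body, ready to be posted to the index's _search endpoint.
request_json = json.dumps(body, indent=2)
print(request_json)
```

The summary would then be expected under aggregations.total.value in the response. Note that each shard keeps one map entry per entity in memory and the merged map is built on the coordinating node, so with a very large number of distinct entities this approach may need care.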
