Sum of last_value of each partition in SQL with window functions
I have a table that stores the total disk used by each entity at any point in time. I want to find the peak disk used in a time period. For example, the data looks something like the table below.
Note: the timestamps are actual timestamps with seconds precision; I wrote them as 10am etc. for brevity.
timestamp | entity_id | disk_used
---------------------------------
9am | 1 | 10
10am | 2 | 20
11am | 2 | 15
12am | 1 | 12
In this example, the max disk used is 30 (10 from entity 1 and 20 from entity 2).
I have tried a number of approaches, for example:
select timestamp, entity_id,
disk_used,
sum(last_value(disk_used) over(
partition by entity_id order by timestamp)
) sum_of_last
I was attempting to generate the following, so that I can then take the max of it:
timestamp | entity_id | disk_used | sum_of_last
-----------------------------------------------
9am | 1 | 10 | 10
10am | 2 | 20 | 30
11am | 2 | 15 | 25 // (10 + 15)
12am | 1 | 12 | 27 // (12 + 15)
However, that query doesn't work because we cannot aggregate over a window function in ISO Standard SQL 2003. I am using Amazon Timestream, whose query engine is compatible with ISO Standard SQL 2003.
To rephrase the same question: each row is a data point giving the disk used by one entity at that instant. To find the total disk used at that instant, sum the last value of each entity.
Is there an effective way to compute this?
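To pin down the expected result, here is a minimal reference computation in plain Python. It is only a sketch of the "sum the last value of each entity" rule, not a Timestream query; the integer hours and the helper name `running_totals` are assumptions made for illustration.

```python
# Sample rows from the question as (timestamp, entity_id, disk_used).
# Timestamps are written as increasing integers (9, 10, 11, 12) purely
# for illustration; real data would carry full timestamps.
rows = [(9, 1, 10), (10, 2, 20), (11, 2, 15), (12, 1, 12)]


def running_totals(rows):
    """Return (timestamp, total) pairs, where total is the sum of the
    most recent disk_used value seen so far for every entity."""
    last_seen = {}  # entity_id -> latest disk_used
    totals = []
    for ts, entity_id, disk_used in sorted(rows):
        last_seen[entity_id] = disk_used
        totals.append((ts, sum(last_seen.values())))
    return totals


totals = running_totals(rows)
peak = max(total for _, total in totals)
print(totals)
print(peak)
```

The totals reproduce the sum_of_last column above, and the peak is the maximum of that column.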
If you have only two entities, you can do:
select t.*,
(last_value(case when entity_id = 1 then disk_used end ignore nulls) over (order by time) +
last_value(case when entity_id = 2 then disk_used end ignore nulls) over (order by time)
) as total
from t;
One way to generalize this to all entities is to generate a row for each entity at each time, impute the value, and aggregate:
select ti.time, e.entity_id,
last_value(disk_used ignore nulls) over (partition by e.entity_id order by ti.time) as imputed_disk_used
from (select distinct time from t) ti cross join
(select distinct entity_id from t) e left join
t
on ti.time = t.time and e.entity_id = t.entity_id;
Then you can aggregate:
select time, sum(imputed_disk_used)
from (select ti.time, e.entity_id,
last_value(disk_used ignore nulls) over (partition by e.entity_id order by ti.time) as imputed_disk_used
from (select distinct time from t) ti cross join
(select distinct entity_id from t) e left join
t
on ti.time = t.time and e.entity_id = t.entity_id
) te
group by time;
However, this gives the value per time rather than per time and entity_id.
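The densify, impute, and aggregate idea can be tried locally with Python's built-in sqlite3 module. This is a sketch under stated assumptions, not the Timestream query: SQLite does not accept `ignore nulls`, so the imputation step is swapped for a correlated subquery that picks each entity's latest reading at or before the given time, and the column is named `ts` (integer hours) for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (ts integer, entity_id integer, disk_used integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(9, 1, 10), (10, 2, 20), (11, 2, 15), (12, 1, 12)],  # sample data from the question
)

query = """
select ts, sum(imputed_disk_used) as total
from (
    -- one row per (time, entity) pair, carrying the entity's latest
    -- reading at or before that time (NULL if it has none yet)
    select ti.ts as ts, e.entity_id as entity_id,
           (select src.disk_used
              from t as src
             where src.entity_id = e.entity_id and src.ts <= ti.ts
             order by src.ts desc
             limit 1) as imputed_disk_used
    from (select distinct ts from t) as ti
    cross join (select distinct entity_id from t) as e
) as te
group by ts
order by ts
"""

totals = conn.execute(query).fetchall()
peak = max(total for _, total in totals)
print(totals)
print(peak)
```

Taking the maximum of the per-time totals then gives the peak the question asks for. The cross join produces one row per time and entity, so this may be expensive on large tables.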
I want to find the peak disk used in a time period
You can use two levels of aggregation:
select max(sum_disk_used)
from (
select time, sum(disk_used) as sum_disk_used
from mytable
group by time
) t
The subquery computes the total disk_used at each point in time, then the outer query returns only the peak value.
If your database supports some kind of limit clause, this can be simplified:
select time, sum(disk_used) as sum_disk_used
from mytable
group by time
order by sum_disk_used desc limit 1
To filter on a given period, you would typically add a where clause to the subquery.
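As a rough check of the two-level aggregation with a period filter, the sketch below runs both variants against an in-memory SQLite database from Python. The rows, the `ts` column name, and the integer hours are hypothetical. Note that grouping by time only adds up rows that share the same timestamp, so this example assumes every entity reports at each timestamp, which is not the case in the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (ts integer, entity_id integer, disk_used integer)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    # hypothetical data: both entities report at every timestamp
    [(9, 1, 10), (9, 2, 20), (10, 1, 12), (10, 2, 15), (11, 1, 30), (11, 2, 5)],
)

period = (9, 10)  # inclusive bounds of the period of interest

# Two levels of aggregation, with the period filter inside the subquery.
peak = conn.execute(
    """
    select max(sum_disk_used)
    from (
        select ts, sum(disk_used) as sum_disk_used
        from mytable
        where ts between ? and ?
        group by ts
    ) as s
    """,
    period,
).fetchone()[0]

# Same result with a limit clause; sorting descending puts the peak first.
top = conn.execute(
    """
    select ts, sum(disk_used) as sum_disk_used
    from mytable
    where ts between ? and ?
    group by ts
    order by sum_disk_used desc
    limit 1
    """,
    period,
).fetchone()

print(peak)
print(top)
```

The limit variant also returns the timestamp at which the peak occurred, which the max-only query does not.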