SQL Window Function over sliding time window
I have the following data:
country objectid objectuse
record_date
2022-07-20 chile 0 4
2022-07-01 chile 1 4
2022-07-02 chile 1 4
2022-07-03 chile 1 4
2022-07-04 chile 1 4
... ... ... ...
2022-07-26 peru 3088 4
2022-07-27 peru 3088 4
2022-07-28 peru 3088 4
2022-07-30 peru 3088 4
2022-07-31 peru 3088 4
The data describes the daily usage of an object within a country for a single month (July 2022), and not all objects are used every day. One of the things I am interested in finding is the sum of the monthly maximums for the month:
WITH month_max AS (
SELECT
country,
objectid,
MAX(objectuse) AS maxuse
FROM mytable
GROUP BY
country,
objectid
)
SELECT
country,
SUM(maxuse)
FROM month_max
GROUP BY country;
Which results in this:
country sum
-------------
chile 1224
peru 17008
But what I actually want is the rolling sum of the maxima from the beginning of the month up to each date, so that I get something that looks like:
country sum
record_date
2022-07-01 chile 1
2022-07-01 peru 1
2022-07-02 chile 2
2022-07-02 peru 3
... ... ...
2022-07-31 chile 1224
2022-07-31 peru 17008
I tried using a window function like this, to no avail:
SELECT
*,
SUM(objectuse) OVER (
PARTITION BY country
ORDER BY record_date ROWS 30 PRECEDING
) as cumesum
FROM mytable
order BY cumesum DESC;
Is there a way I can achieve the desired result in SQL?
Thanks in advance.
EDIT: For what it's worth, I asked the same question for Pandas and received an answer; perhaps it helps to figure out how to do it in SQL.
We can use SUM() as a window function, with a partition by year and month.
SELECT record_date, country, objectid,
SUM(objectuse) OVER (PARTITION BY TO_CHAR(record_date, 'YYYY-MM'), country
ORDER BY record_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum
FROM mytable
ORDER BY record_date;
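To see what this running SUM() computes when several objects share a date, here is a small sketch. It runs in SQLite through Python's sqlite3 module rather than PostgreSQL, and the three rows are made-up toy data, so treat it as an illustration only. With a ROWS frame each row gets its own running total (the objectid tie-breaker in the ORDER BY is an addition to keep the result deterministic), while a RANGE frame gives all rows on the same date the same total. Either way, the query accumulates raw objectuse values, not per-object maxima.

```python
import sqlite3

# In-memory toy table; the rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE mytable ("
    "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
)
con.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("2022-07-01", "chile", 0, 1),
        ("2022-07-01", "chile", 1, 2),
        ("2022-07-02", "chile", 0, 3),
    ],
)


def running_totals(window_spec):
    """Return the running totals, ordered by date and objectid."""
    sql = f"""
        SELECT record_date, objectid,
               SUM(objectuse) OVER (PARTITION BY country {window_spec}) AS total
        FROM mytable
        ORDER BY record_date, objectid
    """
    return [row[2] for row in con.execute(sql)]


# ROWS frame: every row adds only itself to the total so far.
rows_totals = running_totals(
    "ORDER BY record_date, objectid "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)
# RANGE frame: rows with the same record_date are peers and share a total.
range_totals = running_totals(
    "ORDER BY record_date "
    "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)

print(rows_totals)   # [1, 3, 6]
print(range_totals)  # [3, 3, 6]
```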
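WITH month_max AS (
SELECT record_date, country, objectid,
-- running maximum of each object up to and including the current date
MAX(objectuse) OVER (PARTITION BY country, objectid ORDER BY record_date) AS maxuse
FROM mytable
)
SELECT
record_date,
country,
SUM(maxuse) AS sum
FROM month_max
GROUP BY record_date, country
ORDER BY record_date, country;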
This does assume one row per object per date.
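If some objects have no row on some dates, one window-based alternative is to track each object's running maximum, take how much that maximum increased on each row, and accumulate the increases per country. The sketch below is not from the answer above; it runs in SQLite through Python's sqlite3 module, uses made-up toy rows, and assumes the table holds a single month (otherwise the month would have to be added to each PARTITION BY).

```python
import sqlite3

# Made-up toy data with gaps: not every object appears on every date.
rows = [
    ("2022-07-01", "chile", 0, 1),
    ("2022-07-02", "chile", 0, 3),
    ("2022-07-02", "chile", 1, 2),
    ("2022-07-03", "chile", 1, 1),
    ("2022-07-01", "peru", 7, 4),
    ("2022-07-03", "peru", 8, 2),
]

DELTA_SQL = """
WITH running AS (
    -- running maximum of each object up to the current date
    SELECT record_date, country, objectid,
           MAX(objectuse) OVER (
               PARTITION BY country, objectid ORDER BY record_date
           ) AS runmax
    FROM mytable
),
deltas AS (
    -- how much the object's running maximum grew on this row
    SELECT record_date, country,
           runmax - LAG(runmax, 1, 0) OVER (
               PARTITION BY country, objectid ORDER BY record_date
           ) AS delta
    FROM running
),
daily AS (
    SELECT record_date, country, SUM(delta) AS day_delta
    FROM deltas
    GROUP BY record_date, country
)
-- cumulative sum of the daily increases = sum of per-object maxima so far
SELECT record_date, country,
       SUM(day_delta) OVER (
           PARTITION BY country ORDER BY record_date
       ) AS usetotal
FROM daily
ORDER BY record_date, country
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE mytable ("
    "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
)
con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
delta_result = con.execute(DELTA_SQL).fetchall()
con.close()

for row in delta_result:
    print(row)
# ('2022-07-01', 'chile', 1)
# ('2022-07-01', 'peru', 4)
# ('2022-07-02', 'chile', 5)
# ('2022-07-03', 'chile', 5)
# ('2022-07-03', 'peru', 6)
```

Because the per-object increases telescope to the object's latest running maximum, an object that is absent on a given date still contributes its earlier maximum to that date's total.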
What ended up working is probably not the most efficient approach to this problem. I essentially created backward-looking blocks from each day in the month back to the beginning of the month. Within each of these buckets I take the maximum of objectuse for each objectid. After taking the max, I sum across all the maxima for that backward-looking period. I do this for every day in the data.
Here is the query that does it:
WITH daily_lookback AS (
SELECT
A.record_date,
A.country,
B.objectid,
MAX(B.objectuse) AS maxuse
FROM mytable AS A
LEFT JOIN mytable AS B
ON A.record_date >= B.record_date
AND A.country = B.country
AND DATE_PART('month', A.record_date) = DATE_PART('month', B.record_date)
AND DATE_PART('year', A.record_date) = DATE_PART('year', B.record_date)
GROUP BY
A.record_date,
A.country,
B.objectid
)
SELECT
record_date,
country,
SUM(maxuse) AS usetotal
FROM daily_lookback
GROUP BY
record_date,
country
ORDER BY
record_date;
Which gives me exactly what I was looking for: the cumulative sum of the objectid maximums for the backward-looking period, like this:
country sum
record_date
2022-07-01 chile 1
2022-07-01 peru 1
2022-07-02 chile 2
2022-07-02 peru 3
... ... ...
2022-07-31 chile 1224
2022-07-31 peru 17008
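For anyone who wants to try the lookback join on a small scale, here is a sketch that runs it in SQLite through Python's sqlite3 module. The rows are made-up toy data, strftime('%Y-%m', ...) stands in for the two DATE_PART comparisons because SQLite has no DATE_PART, and the left side of the join is reduced to distinct (record_date, country) pairs, which should not change the result since MAX ignores duplicates.

```python
import sqlite3

# Made-up toy data: (record_date, country, objectid, objectuse)
rows = [
    ("2022-07-01", "chile", 0, 1),
    ("2022-07-02", "chile", 0, 3),
    ("2022-07-02", "chile", 1, 2),
    ("2022-07-03", "chile", 1, 1),
    ("2022-07-01", "peru", 7, 4),
    ("2022-07-03", "peru", 8, 2),
]

LOOKBACK_SQL = """
WITH daily_lookback AS (
    -- for each (date, country), the max use of every object seen so far
    -- in the same month
    SELECT a.record_date, a.country, b.objectid,
           MAX(b.objectuse) AS maxuse
    FROM (SELECT DISTINCT record_date, country FROM mytable) AS a
    JOIN mytable AS b
      ON b.country = a.country
     AND b.record_date <= a.record_date
     AND strftime('%Y-%m', b.record_date) = strftime('%Y-%m', a.record_date)
    GROUP BY a.record_date, a.country, b.objectid
)
SELECT record_date, country, SUM(maxuse) AS usetotal
FROM daily_lookback
GROUP BY record_date, country
ORDER BY record_date, country
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE mytable ("
    "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
)
con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
lookback_result = con.execute(LOOKBACK_SQL).fetchall()
con.close()

for row in lookback_result:
    print(row)
# ('2022-07-01', 'chile', 1)
# ('2022-07-01', 'peru', 4)
# ('2022-07-02', 'chile', 5)
# ('2022-07-03', 'chile', 5)
# ('2022-07-03', 'peru', 6)
```

Note that a country only gets an output row for dates on which it has at least one record, so peru has no row for 2022-07-02 in this toy data.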