Window Function with Rolling Sum by Date

I'm trying to write a query that returns, for each of the last 44 days, a count of the rentals made in the 7-day window preceding that day.

This is tricky because not all dates in the set are consecutive, and dates without rentals are not rows in the data set.

I am downloading the data from: https://www.postgresqltutorial.com/postgresql-sample-database/

I know this requires a window function and, most likely, an ORDER BY clause, but my results just look like a running sum rather than a rolling sum over the 7 days preceding each date. Here is my code:

WITH t AS (
    SELECT date_trunc('day', rental_date) rental_date, count(rental_id) cnt
    FROM rental
    WHERE rental_date >= CURRENT_DATE - INTERVAL '44 DAYS'
    GROUP BY 1
)
SELECT rental_date, SUM(cnt) OVER w
FROM t
WINDOW w AS (ORDER BY rental_date ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
ORDER BY rental_date DESC;

The expected output would look something like:

Col1                   Col2
date_trunc1            count(rental_id)
2006-02-21 00:00:00                 182
2006-02-20 00:00:00                 182
2006-02-19 00:00:00                 182
2006-02-18 00:00:00                 182
2006-02-17 00:00:00                 182
2006-02-16 00:00:00                 182
2006-02-15 00:00:00                 182
2005-08-30 00:00:00                 598
2005-08-29 00:00:00                1224
2005-08-28 00:00:00                1883
2005-08-27 00:00:00                2507
2005-08-26 00:00:00                3135
2005-08-25 00:00:00                3756
2005-08-24 00:00:00                4349
2005-08-23 00:00:00                3374
2005-08-22 00:00:00                3148
2005-08-21 00:00:00                2489
2005-08-20 00:00:00                1865
2005-08-19 00:00:00                1237
2005-08-18 00:00:00                 616
2005-08-17 00:00:00                  23
2005-08-16 00:00:00                   0
2005-08-08 00:00:00                 671
2005-08-07 00:00:00                1305

It's just weird because dates like '2005-08-08' and '2005-08-07' don't exist in the data set, since no rentals took place on those days, but they would still need to show up in the output because rentals did occur on '2005-08-01' and '2005-07-30', within the 7 days preceding.

I think you want:

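SELECT r.*
FROM (SELECT date_trunc('day', rental_date) AS rental_date,
             COUNT(*) AS day_count,
             -- RANGE measures the frame in elapsed time, not in number of rows,
             -- so gaps between rental days do not widen the window.
             -- Ordering by the truncated day keeps the frame at whole-day granularity.
             SUM(COUNT(*)) OVER (ORDER BY date_trunc('day', rental_date)
                                 RANGE BETWEEN INTERVAL '7 DAY' PRECEDING AND CURRENT ROW
                                ) AS rolling_count
      FROM rental
      GROUP BY date_trunc('day', rental_date)
     ) r
-- Filter after the window function so that earlier days still feed the rolling sum.
WHERE rental_date >= CURRENT_DATE - INTERVAL '44 DAY'
ORDER BY rental_date DESC;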

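That is:

  • The window frame should be RANGE, not ROWS. ROWS counts rows, so when dates are missing it reaches further back than 7 days.
  • The filtering for the overall timeframe should happen after the window function; otherwise the earliest days in the output have no preceding rows to sum.
  • A 7-day total is either 7 days preceding to 1 day preceding, or 6 days preceding to the current row, depending on whether the current row is included. As written, the frame above reaches back a full 7 days and also includes the current row, so adjust the bounds to whichever definition you need.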
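
To make the ROWS/RANGE difference and the missing-date issue concrete, here is a minimal sketch on toy data. The table daily_demo, its values, and the output names rolling_rows, rolling_range and rolling_7d are made up for illustration and are not part of the sample database. RANGE with an interval offset assumes PostgreSQL 11 or later. The second query shows one possible way (it is not part of the query above) to emit days that have no rentals, by left-joining a generate_series calendar.

```sql
-- Hypothetical per-day counts with a two-week gap (illustrative values only).
CREATE TEMP TABLE daily_demo (rental_day date PRIMARY KEY, cnt integer NOT NULL);
INSERT INTO daily_demo VALUES
    ('2005-08-01', 10),
    ('2005-08-02', 20),
    ('2005-08-16', 5),
    ('2005-08-17', 7);

-- 1) ROWS counts physical rows; RANGE counts elapsed time.
--    "6 days preceding through the current row" is one reading of a 7-day total.
SELECT rental_day,
       cnt,
       SUM(cnt) OVER (ORDER BY rental_day
                      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_rows,
       SUM(cnt) OVER (ORDER BY rental_day
                      RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) AS rolling_range
FROM daily_demo
ORDER BY rental_day;

-- 2) Emit every calendar day, including days with no rentals,
--    by left-joining the counts onto a generated calendar.
SELECT c.rental_day,
       COALESCE(d.cnt, 0) AS day_count,
       SUM(COALESCE(d.cnt, 0)) OVER (ORDER BY c.rental_day
                                     RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW) AS rolling_7d
FROM (SELECT g.day::date AS rental_day
      FROM generate_series(TIMESTAMP '2005-08-01',
                           TIMESTAMP '2005-08-17',
                           INTERVAL '1 day') AS g(day)
     ) c
LEFT JOIN daily_demo d ON d.rental_day = c.rental_day
ORDER BY c.rental_day DESC;
```

On this toy data, the 2005-08-16 row gets rolling_rows = 35 in the first query, because ROWS reaches back across the gap to the August 1 and 2 rows, while rolling_range = 5. In the second query, 2005-08-07 and 2005-08-08 appear with day_count = 0 and rolling_7d of 30 and 20 respectively, and 2005-08-09 through 2005-08-15 show 0. Once the calendar has one row per day, a ROWS frame would give the same totals; against the real rental table the calendar would need to start early enough to cover the days feeding the first reported window.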
