Window Function with Rolling Sum by Date

I'm trying to write a query that returns, for each of the last 44 days, a count of the rentals made in the 7-day window preceding that day.

This is tricky because not all dates in the set are consecutive, and dates without rentals are not rows in the data set.

Here is where I am downloading the data from: https://www.postgresqltutorial.com/postgresql-sample-database/

I know this requires a window function, most likely with an ORDER BY clause, but my results look like a running sum rather than a rolling sum over the 7 days preceding each date. Here is my code:

WITH t AS (
    SELECT date_trunc('day', rental_date) rental_date, count(rental_id) cnt
    FROM rental
    WHERE rental_date >= CURRENT_DATE - INTERVAL '44 DAYS'
    GROUP BY 1
)
SELECT rental_date, SUM(cnt) OVER w
FROM t
WINDOW w AS (ORDER BY rental_date ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
ORDER BY rental_date DESC;

The expected output would look something like:

Col1                    Col2
date_trunc1             count(rental_id)
2006-02-21 00:00:00      182
2006-02-20 00:00:00      182
2006-02-19 00:00:00      182
2006-02-18 00:00:00      182
2006-02-17 00:00:00      182
2006-02-16 00:00:00      182
2006-02-15 00:00:00      182
2005-08-30 00:00:00      598
2005-08-29 00:00:00     1224
2005-08-28 00:00:00     1883
2005-08-27 00:00:00     2507
2005-08-26 00:00:00     3135
2005-08-25 00:00:00     3756
2005-08-24 00:00:00     4349
2005-08-23 00:00:00     3374
2005-08-22 00:00:00     3148
2005-08-21 00:00:00     2489
2005-08-20 00:00:00     1865
2005-08-19 00:00:00     1237
2005-08-18 00:00:00      616
2005-08-17 00:00:00       23
2005-08-16 00:00:00        0
2005-08-08 00:00:00      671
2005-08-07 00:00:00     1305

Note: dates like '2005-08-08' and '2005-08-07' do not exist in the data set because no rentals took place on those days, but they would still need to show up in the output because rentals did occur on '2005-08-01' and '2005-07-30', within the 7 days preceding.

I think you want:

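SELECT r.*
FROM (SELECT date_trunc('day', rental_date) AS rental_date,
             COUNT(*) AS day_count,
             -- RANGE with an interval offset requires PostgreSQL 11 or later.
             -- Ordering by the truncated day keeps the frame aligned to whole days.
             SUM(COUNT(*)) OVER (ORDER BY date_trunc('day', rental_date)
                                 RANGE BETWEEN INTERVAL '7 DAY' PRECEDING AND CURRENT ROW) AS rolling_count
      FROM rental
      GROUP BY date_trunc('day', rental_date)
     ) r
WHERE rental_date >= CURRENT_DATE - INTERVAL '44 DAY'
ORDER BY rental_date DESC;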

That is:

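  • The window frame should be RANGE, not ROWS. ROWS counts physical rows, so when dates are missing it reaches back further than 7 days; RANGE compares the ORDER BY values themselves.
  • The filtering for the overall timeframe should come after the window function, so that days just before the cutoff still contribute to the earliest rolling sums.
  • A 7-day total is either 7 days preceding to one day preceding or 6 days preceding to current, depending on whether the current row is included.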
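
To see why these points matter, here is a minimal Python sketch that imitates the two frame types on a handful of made-up daily counts with a gap between them. The helper names (rolling_rows, rolling_range) and the sample numbers are hypothetical and only illustrate the frame semantics; this is not how PostgreSQL implements them.

```python
from datetime import date, timedelta

def rolling_rows(daily, n_preceding):
    """Imitates ROWS BETWEEN n PRECEDING AND CURRENT ROW: counts physical rows."""
    days = sorted(daily)
    out = {}
    for i, d in enumerate(days):
        start = max(0, i - n_preceding)
        out[d] = sum(daily[x] for x in days[start:i + 1])
    return out

def rolling_range(daily, days_preceding):
    """Imitates RANGE BETWEEN INTERVAL 'n DAY' PRECEDING AND CURRENT ROW:
    compares the date values, so gaps in the data do not widen the window."""
    out = {}
    for d in daily:
        lo = d - timedelta(days=days_preceding)
        out[d] = sum(c for x, c in daily.items() if lo <= x <= d)
    return out

# Hypothetical per-day rental counts; nothing between Aug 2 and Aug 16.
daily = {
    date(2005, 8, 1): 10,
    date(2005, 8, 2): 20,
    date(2005, 8, 16): 5,
    date(2005, 8, 17): 7,
}

# ROWS reaches back across the gap and behaves like a running sum.
rows_result = rolling_rows(daily, 7)

# RANGE with 6 days preceding plus the current day is a true 7-day window.
range_result = rolling_range(daily, 6)

# Filtering before windowing drops Aug 16 from the Aug 17 window.
cutoff = date(2005, 8, 17)
filtered_first = rolling_range({d: c for d, c in daily.items() if d >= cutoff}, 6)

print(rows_result[date(2005, 8, 17)])     # 42
print(range_result[date(2005, 8, 17)])    # 12
print(filtered_first[date(2005, 8, 17)])  # 7
```

With the row-based frame, the Aug 17 total absorbs the early-August rentals. The value-based frame only sees Aug 16 and Aug 17. Applying the date filter first loses the Aug 16 rentals that belong in the Aug 17 window.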
