简体   繁体   中英

Rolling sum based on date (when dates are missing)

You may be aware of rolling the results of an aggregate over a specific number of preceding rows. Ie: how many hot dogs did I eat over the last 7 days

SELECT HotDogCount, 
       DateKey, 
       SUM(HotDogCount) OVER (ORDER BY DateKey ROWS 6 PRECEDING) AS HotDogsLast7Days
FROM dbo.HotDogConsumption

Results:

+-------------+------------+------------------+
| HotDogCount |  DateKey   | HotDogsLast7Days |
+-------------+------------+------------------+
|           3 | 09/21/2020 |                3 |
|           2 | 9/22/2020  |                5 |
|           1 | 09/23/2020 |                6 |
|           1 | 09/24/2020 |                7 |
|           1 | 09/25/2020 |                8 |
|           4 | 09/26/2020 |               12 |
|           1 | 09/27/2020 |               13 |
|           3 | 09/28/2020 |               13 |
|           2 | 09/29/2020 |               13 |
|           1 | 09/30/2020 |               13 |
+-------------+------------+------------------+

Now, the problem I am having is when there are gaps in the dates. So, basically, one day my intestines and circulatory system are screaming at me: "What the heck are you doing, you're going to kill us all!!!" So, I decide to give my body a break for a day and now there is no record for that day. When I use the "ROWS 6 PRECEDING" method, I will now be reaching back 8 days, rather than 7, because one day was missed.

So, the question is, do any of you know how I could use the OVER clause to truly use a date value (something like "DATEADD(day,-7,DateKey)") to determine how many previous rows should be summed up for a true 7 day rolling sum, regardless of whether I only ate hot dogs on one day or on all 7 days?

Side note, to have a record of 0 for the days I didn't eat any hotdogs is not an option. I understand that I could use an array of dates and left join to it and do a

CASE WHEN Datekey IS NULL THEN 0 END

type of deal, but I would like to find out if there is a different way where the rows preceding value can somehow be determined dynamically based on the date.

Window functions are the right approach in theory. But to look back at the 7 preceding days (not rows), we need a range frame specification - which, unfornately, SQL Server does not support.

I am going to recommend a subquery, or a lateral join:

select hdc.*, hdc1.*
from dbo.HotDogConsumption hdc
cross apply (
    select coalesce(sum(HotDogCount), 0) HotDogsLast7Days 
    from dbo.HotDogConsumption hdc1
    where hdc1.datekey >= dateadd(day, -7, hdc.datekey)
      and hdc1.datekey < hdc.datekey
) hdc1

You might want to adjust the conditions in the where clause of the subquery to the precise frame that you want. The above code computes over the last 7 days, not including today. Something equivalent to your current attempt would be like:

    where hdc1.datekey >= dateadd(day, -6, hdc.datekey)
      and hdc1.datekey <= hdc.datekey

I'm kind of old school, but this is how I'd go about it:

SELECT
        HDC1.HotDogCount
       ,HDC1.DateKey
       ,( SELECT SUM( HDC2.HotDogCount )
            FROM HotDogConsumption HDC2
           WHERE HDC2.DateKey BETWEEN DATEADD( DD, -7, HDC1.DateKey )
                                  AND HDC1.DateKey ) AS 'HotDogsLast7Days'
  FROM
        HotDogConsumption HDC1
;

Someone younger might use an OUTER APPLY or something.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM