简体   繁体   中英

rolling sum for each product each date

If I have a table

Product date sold
A 2022-01-01 2
A 2022-01-04 3
A 2022-01-06 1
B 2022-01-05 4

How can I find out the rolling sum of sold for past 4 days for each product in bigquery sql? result looking for:

Product date rolling sum
A 2022-01-01 2
A 2022-01-02 2
A 2022-01-03 2
A 2022-01-04 5
A 2022-01-05 3
A 2022-01-06 4
B 2022-01-01 0
B 2022-01-02 0
B 2022-01-03 0
B 2022-01-04 0
B 2022-01-05 4
B 2022-01-06 4

The major issue is there is no record if there is no product sold.

First you have to build full matrix from dates and Products using CROSS JOIN . Then LEFT JOIN your origin data. And then just calculate rolling sums of expected window ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

WITH t AS
(
    SELECT "A" Product, DATE("2022-01-01") date, 2 sold
    UNION ALL 
    SELECT "A" Product, DATE("2022-01-04") date, 3 sold
    UNION ALL 
    SELECT "A" Product, DATE("2022-01-06") date, 1 sold
    UNION ALL 
    SELECT "B" Product, DATE("2022-01-05") date, 4 sold
)
SELECT *,
SUM(IFNULL(sold, 0)) OVER (PARTITION BY Product ORDER BY date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) rolling_sum
FROM UNNEST(
    GENERATE_DATE_ARRAY(
        (SELECT MIN(date)  FROM t), 
        (SELECT MAX(date)  FROM t), 
        INTERVAL 1 day
        )
    ) date
CROSS JOIN (
    SELECT DISTINCT Product FROM t
)
LEFT JOIN t USING (Product, date)

Don't forget about NULL s for days without sales

You'd have to prepare a table with all dates (generate_date_array()) and products (cross join unique products) first, then left join the actual data and fill it with a windowed sum:

WITH t as (

    select * from unnest([
        struct('A' as Product, date '2022-01-01' as date, 2 as sold)
        ,('A',date '2022-01-04', 3)
        ,('A',date '2022-01-06', 1)
        ,('B',date '2022-01-05', 4)
    ])
)

SELECT *  
    ,ifnull( sum(sold) over (partition by Product order by date asc),0) as rolling_sum
FROM 
  UNNEST(generate_date_array(date '2022-01-01', date '2022-01-06')) as date
CROSS JOIN 
  (select distinct Product from t)
LEFT JOIN 
  t using(date, Product)

I hard-coded the dates for better readability. If you have a lot of data I'd add more where clauses to the subqueries and put dates into variables to keep everything performant.

Here is my take, I make use of sum and Generate Array :

with time_frame as (
    select * from unnest(GENERATE_DATE_ARRAY(DATE('2022-01-01'), DATE('2022-01-06'), INTERVAL 1 DAY)) as days
), working_table as (
    select 'A' as id, days, DATE_ADD(days, INTERVAL -4 DAY) as delay from time_frame union all
    select 'B' as id, days, DATE_ADD(days, INTERVAL -4 DAY) as delay from time_frame
), match_data as (
    select 'A' as id, DATE('2022-01-01') as date_val, 2 as value union all
    select 'A',DATE('2022-01-04'),3 union all
    select 'A',DATE('2022-01-06'),1 union all
    select 'B',DATE('2022-01-05'),4 
), clean as (
    select tf.id,tf.days, 
    i.value,tf.delay,
     from working_table tf
    left join match_data i on tf.days = i.date_val and tf.id = i.id 
) 
select c.id,c.days,
(SELECT SUM(trueval) from (select IFNULL(SUM(value),0) trueval from clean where days <= c.days and days > c.delay and id=c.id))
from clean c

Just updated to fit your scope and constraints.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM