
How to get the difference in an aggregated value between two time periods using a window function in BigQuery SQL?

date_key cust_id sales
2022-01-01 1 30
2022-01-02 1 35
2022-01-05 1 38
2022-01-10 1 20
2022-01-11 1 35
2022-01-01 2 20
2022-01-02 2 25
2022-01-04 2 38
2022-01-09 2 20
2022-01-15 1 35
2022-01-11 3 35

I would like to get all customer IDs that appear in the current period (2022-01-06 to 2022-01-11) and left join, for each of them, the difference between sum(sales) in the current period and sum(sales) in the previous period (2022-01-01 to 2022-01-05).

How would you achieve this with a window function? Currently I am using CTEs:

with 
users as(
 select 
  distinct cust_id 
 from 
  tableSales 
  where date_key between date('2022-01-06) and date('2022-01-11)),
currentPeriod as(
 select
  distinct cust_id
  ,sum(sales) sales
 from users
  left join tableSales using (customer_id)
  where date_key between date('2022-01-06) and date('2022-01-11)
),
previousPeriod as(
 select
  distinct cust_id
  ,sum(sales) sales
 from users
 left join tableSales using (customer_id)
 where date_key between date('2022-01-05) and date('2022-01-01)
)
#-----------------------
Select 
 distinct cust_id 
 ,cp.sales - pp.sales deltaSales
 from users
left join currentperiod cp using(customer_id)
left join previousperiod pp using(customer_id)

There must be a shorter way to achieve this using a window function. Please help.

Your query is missing the closing quotation marks (') in the date literals. Also, the fields customer_id and cust_id should be the same column, right?

The dates are switched: between date('2022-01-05) and date('2022-01-01)
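As a quick check of why the bound order matters: in BigQuery, X BETWEEN a AND b is evaluated as a <= X AND X <= b, so a range with the later date first matches nothing. The test date 2022-01-03 below is only an illustrative value, not taken from the question.

```sql
-- The same date is tested against the range written both ways.
select
  -- later date first: 2022-01-05 <= x AND x <= 2022-01-01 can never hold
  date '2022-01-03' between date '2022-01-05' and date '2022-01-01' as reversed_bounds,
  -- earlier date first: the intended range
  date '2022-01-03' between date '2022-01-01' and date '2022-01-05' as ordered_bounds
-- reversed_bounds = false, ordered_bounds = true
```

With the reversed range, the previousPeriod CTE in the question would filter out every row.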

The chosen time intervals also look arbitrary; it is unclear why exactly these ranges are needed.

With window function:

with tableSales as (
  -- random sample data: 10 rows per customer, dated 2022-01-01 to 2022-01-11
  select
    date_sub(date('2022-01-11'), interval cast(rand() * 10 as int64) day) as date_key,
    cust_id,
    cast(rand() * 100 as int64) as sales
  from unnest([1, 2, 3]) cust_id, unnest(generate_array(1, 10, 1)) a
),
tmp as (
  -- per-customer totals for each period, repeated on every row of that customer
  select *,
    sum(if(date_key between date('2022-01-06') and date('2022-01-11'), sales, 0)) over (partition by cust_id) as currentperiod,
    sum(if(date_key between date('2022-01-01') and date('2022-01-05'), sales, 0)) over (partition by cust_id) as previousperiod
  from tableSales
)
-- distinct collapses the repeated rows to one row per customer
select distinct cust_id, currentperiod, previousperiod, currentperiod - previousperiod as deltaSales
from tmp

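Well, doing a `group by` is much better, because it aggregates straight to one row per customer instead of computing the totals on every row and removing duplicates afterwards: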

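with tableSales as (
  -- random sample data: 10 rows per customer, dated 2022-01-01 to 2022-01-11
  select
    date_sub(date('2022-01-11'), interval cast(rand() * 10 as int64) day) as date_key,
    cust_id,
    cast(rand() * 100 as int64) as sales
  from unnest([1, 2, 3]) cust_id, unnest(generate_array(1, 10, 1)) a
),
tmp as (
  -- one row per customer with the total for each period
  select cust_id,
    sum(if(date_key between date('2022-01-06') and date('2022-01-11'), sales, 0)) as currentperiod,
    sum(if(date_key between date('2022-01-01') and date('2022-01-05'), sales, 0)) as previousperiod
  from tableSales
  group by cust_id
)
-- group by already returns one row per customer, so distinct is not needed
select cust_id, currentperiod, previousperiod, currentperiod - previousperiod as deltaSales
from tmp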
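
The question also asks for only those customers that appear in the current period. A minimal sketch of one way to add that to the `group by` approach is below, run against the sample rows from the question. The inline tableSales CTE is a stand-in for the real table, and the HAVING filter with countif is my assumption about what "customers in the current period" should mean.

```sql
-- Sample rows copied from the question; replace this CTE with the real table.
with tableSales as (
  select date '2022-01-01' as date_key, 1 as cust_id, 30 as sales union all
  select date '2022-01-02', 1, 35 union all
  select date '2022-01-05', 1, 38 union all
  select date '2022-01-10', 1, 20 union all
  select date '2022-01-11', 1, 35 union all
  select date '2022-01-01', 2, 20 union all
  select date '2022-01-02', 2, 25 union all
  select date '2022-01-04', 2, 38 union all
  select date '2022-01-09', 2, 20 union all
  select date '2022-01-15', 1, 35 union all
  select date '2022-01-11', 3, 35
)
select
  cust_id,
  sum(if(date_key between date('2022-01-06') and date('2022-01-11'), sales, 0)) as currentperiod,
  sum(if(date_key between date('2022-01-01') and date('2022-01-05'), sales, 0)) as previousperiod,
  sum(if(date_key between date('2022-01-06') and date('2022-01-11'), sales, 0))
    - sum(if(date_key between date('2022-01-01') and date('2022-01-05'), sales, 0)) as deltaSales
from tableSales
-- only rows from the two periods are relevant
where date_key between date('2022-01-01') and date('2022-01-11')
group by cust_id
-- assumption: keep a customer only if it has at least one row in the current period
having countif(date_key between date('2022-01-06') and date('2022-01-11')) > 0
order by cust_id
-- cust_id | currentperiod | previousperiod | deltaSales
--       1 |            55 |            103 |        -48
--       2 |            20 |             83 |        -63
--       3 |            35 |              0 |         35
```

One difference from the CTE version in the question: a customer with no sales in the previous period (cust_id 3 here) gets previousperiod = 0 and a numeric deltaSales, whereas the left join would give NULL. If NULL is preferred, dropping the `, 0` default from the second `if` should do it, since sum ignores NULLs and returns NULL when there is nothing to add.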


 