
How to get the difference in an aggregated value over a certain period of time using a window function in BigQuery SQL?
date_key | cust_id | sales |
---|---|---|
2022-01-01 | 1 | 30 |
2022-01-02 | 1 | 35 |
2022-01-05 | 1 | 38 |
2022-01-10 | 1 | 20 |
2022-01-11 | 1 | 35 |
2022-01-01 | 2 | 20 |
2022-01-02 | 2 | 25 |
2022-01-04 | 2 | 38 |
2022-01-09 | 2 | 20 |
2022-01-15 | 1 | 35 |
2022-01-11 | 3 | 35 |
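I want to get all customer IDs from the current period, and then join in the difference between sum(sales) for the period 2022-01-01 to 2022-01-05 and sum(sales) for the period 2022-01-06 to 2022-01-11.
How would you achieve this with a window function? Currently I am using CTEs: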
with
users as(
select
distinct cust_id
from
tableSales
where date_key between date('2022-01-06) and date('2022-01-11)),
currentPeriod as(
select
distinct cust_id
,sum(sales) sales
from users
left join tableSales using (customer_id)
where date_key between date('2022-01-06) and date('2022-01-11)
),
previousPeriod as(
select
distinct cust_id
,sum(sales) sales
from users
left join tableSales using (customer_id)
where date_key between date('2022-01-05) and date('2022-01-01)
)
#-----------------------
Select
distinct cust_id
,cp.sales - pp.sales deltaSales
from users
left join currentperiod cp using(customer_id)
left join previousperiod pp using(customer_id)
There must be a shorter way to achieve this using a window function? Please help.
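Your query is missing the closing quotes (') on the date literals.
The fields customer_id and cust_id should be the same, right?
The dates are swapped: between date('2022-01-05) and date('2022-01-01)
The given time intervals are odd, because it is not clear why the user needs them.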
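For reference, here is a sketch of how the CTE query from the question might look once those three points are corrected. It goes a bit further than the remarks above, and these extra changes are my own assumptions about the intent: each sum gets a `group by` (the original had none), the period CTEs read `tableSales` directly instead of joining through `users`, and `ifnull` treats a customer with no rows in a period as 0. The inline `tableSales` CTE just reproduces the sample rows from the question so the query can run on its own.

```sql
-- Sketch only: the question's CTE approach with the fixes applied.
-- tableSales is rebuilt inline from the sample rows in the question.
with tableSales as (
  select date '2022-01-01' as date_key, 1 as cust_id, 30 as sales union all
  select date '2022-01-02', 1, 35 union all
  select date '2022-01-05', 1, 38 union all
  select date '2022-01-10', 1, 20 union all
  select date '2022-01-11', 1, 35 union all
  select date '2022-01-01', 2, 20 union all
  select date '2022-01-02', 2, 25 union all
  select date '2022-01-04', 2, 38 union all
  select date '2022-01-09', 2, 20 union all
  select date '2022-01-15', 1, 35 union all
  select date '2022-01-11', 3, 35
),
-- Customers with at least one sale in the current period
users as (
  select distinct cust_id
  from tableSales
  where date_key between date '2022-01-06' and date '2022-01-11'
),
currentPeriod as (
  select cust_id, sum(sales) as sales
  from tableSales
  where date_key between date '2022-01-06' and date '2022-01-11'
  group by cust_id
),
previousPeriod as (
  select cust_id, sum(sales) as sales
  from tableSales
  -- lower bound first, otherwise BETWEEN matches nothing
  where date_key between date '2022-01-01' and date '2022-01-05'
  group by cust_id
)
select
  cust_id,
  -- assumption: a period without sales counts as 0
  ifnull(cp.sales, 0) - ifnull(pp.sales, 0) as deltaSales
from users
left join currentPeriod cp using (cust_id)
left join previousPeriod pp using (cust_id)
order by cust_id
```

With the sample rows this should give a deltaSales of -48 for customer 1 (55 minus 103), -63 for customer 2 (20 minus 83) and 35 for customer 3 (35 minus 0).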
Using a window function:
-- Sample data: 10 random rows per customer, dated between 2022-01-01 and 2022-01-11
with tableSales as (
  select
    date_sub(date("2022-01-11"), interval cast(rand()*10 as int64) day) as date_key,
    cust_id,
    cast(rand()*100 as int64) as sales
  from unnest([1,2,3]) cust_id, unnest(generate_array(1,10,1)) a
),
-- Conditional sums per customer; the result is repeated on every row of that customer
tmp as (
  select *,
    sum(if(date_key between date('2022-01-06') and date('2022-01-11'), sales, 0)) over (partition by cust_id) as currentperiod,
    sum(if(date_key between date('2022-01-01') and date('2022-01-05'), sales, 0)) over (partition by cust_id) as previousperiod
  from tableSales
)
-- distinct collapses the repeated rows to one row per customer
select distinct cust_id, currentperiod, previousperiod from tmp
That said, a plain "group by" is much better:
-- Same random sample data as in the window-function version
with tableSales as (
  select
    date_sub(date("2022-01-11"), interval cast(rand()*10 as int64) day) as date_key,
    cust_id,
    cast(rand()*100 as int64) as sales
  from unnest([1,2,3]) cust_id, unnest(generate_array(1,10,1)) a
),
-- One row per customer with both conditional sums
tmp as (
  select cust_id,
    sum(if(date_key between date('2022-01-06') and date('2022-01-11'), sales, 0)) as currentperiod,
    sum(if(date_key between date('2022-01-01') and date('2022-01-05'), sales, 0)) as previousperiod
  from tableSales
  group by 1
)
-- group by already yields one row per customer, so distinct is not needed here
select cust_id, currentperiod, previousperiod from tmp
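Neither query above returns the difference itself or limits the result to customers seen in the current period, which is what the question asked for. A minimal sketch of how the grouped version could be extended follows. The `having countif(...)` filter and the `deltaSales` expression are my own additions rather than part of the answer, and the inline `tableSales` CTE again just reproduces the question's sample rows.

```sql
-- Sketch only: the grouped query extended with the delta and a current-period filter.
with tableSales as (
  select date '2022-01-01' as date_key, 1 as cust_id, 30 as sales union all
  select date '2022-01-02', 1, 35 union all
  select date '2022-01-05', 1, 38 union all
  select date '2022-01-10', 1, 20 union all
  select date '2022-01-11', 1, 35 union all
  select date '2022-01-01', 2, 20 union all
  select date '2022-01-02', 2, 25 union all
  select date '2022-01-04', 2, 38 union all
  select date '2022-01-09', 2, 20 union all
  select date '2022-01-15', 1, 35 union all
  select date '2022-01-11', 3, 35
)
select
  cust_id,
  sum(if(date_key between date '2022-01-06' and date '2022-01-11', sales, 0)) as currentperiod,
  sum(if(date_key between date '2022-01-01' and date '2022-01-05', sales, 0)) as previousperiod,
  -- current minus previous, as in the question's cp.sales - pp.sales
  sum(if(date_key between date '2022-01-06' and date '2022-01-11', sales, 0))
    - sum(if(date_key between date '2022-01-01' and date '2022-01-05', sales, 0)) as deltaSales
from tableSales
-- only scan the two periods of interest
where date_key between date '2022-01-01' and date '2022-01-11'
group by cust_id
-- keep only customers with at least one sale in the current period
having countif(date_key between date '2022-01-06' and date '2022-01-11') > 0
order by cust_id
```

On the sample rows this should return (55, 103, -48) for customer 1, (20, 83, -63) for customer 2 and (35, 0, 35) for customer 3. The row dated 2022-01-15 falls outside both periods and is ignored.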