Calculate Number of Consecutive Days Where a Condition Applies Across Two Columns
I have a table similar to the one below:
+-------------------------+
¦ ID ¦ Date ¦ Balance ¦
¦----+----------+---------¦
¦ A ¦ 20200620 ¦ 150 ¦
¦ A ¦ 20200621 ¦ -130 ¦
¦ A ¦ 20200621 ¦ -140 ¦
¦ A ¦ 20200621 ¦ -200 ¦
¦ A ¦ 20200622 ¦ 200 ¦
¦ A ¦ 20200622 ¦ 300 ¦
¦ B ¦ 20200621 ¦ 350 ¦
¦ B ¦ 20200621 ¦ 400 ¦
¦ B ¦ 20200621 ¦ -150 ¦
¦ B ¦ 20200622 ¦ -200 ¦
¦ B ¦ 20200622 ¦ -300 ¦
¦ B ¦ 20200623 ¦ -400 ¦
¦ B ¦ 20200623 ¦ -500 ¦
+-------------------------+
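For each ID and each distinct date, I need to calculate the number of consecutive days with "Balance < 0" leading up to that date (including the date itself). Each ID may have several balances on a given date, positive or negative. If even one balance on a given date is negative, the query should count that day. The output should look like the table below: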
+--------------------------------------------+
¦ ID ¦ Date ¦ Number_of_Consecutive_Days ¦
¦----+----------+----------------------------¦
¦ A ¦ 20200620 ¦ Null ¦
¦----+----------+----------------------------¦
¦ A ¦ 20200621 ¦ 1 ¦
¦----+----------+----------------------------¦
¦ A ¦ 20200622 ¦ 1 ¦
¦----+----------+----------------------------¦
¦ B ¦ 20200621 ¦ Null ¦
¦----+----------+----------------------------¦
¦ B ¦ 20200622 ¦ 2 ¦
¦----+----------+----------------------------¦
¦ B ¦ 20200623 ¦ 3 ¦
+--------------------------------------------+
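Could you suggest a way to calculate this? It would be highly appreciated.
This is a gaps-and-islands problem combined with filtering. Here is one approach: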
select t.*,
       row_number() over (partition by id, dateadd(day, - seqnum, date)
                          order by date
                         ) as Number_of_Consecutive_Days
from (select t.id, date, min(balance) as min_balance,
             row_number() over (partition by id order by date) as seqnum
      from t
      group by t.id, date
      having min(balance) < 0
     ) t;
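This works by using only the days that have a negative balance. It then subtracts a sequence number from the date. For adjacent days that difference is constant, which is why the outer row_number() partitions by it.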
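To see why subtracting the sequence number works, here is a minimal Python sketch of the same logic on the sample data. It assumes the dates are real calendar dates (the sample shows them as values like 20200620), and the name negative_streaks is made up for illustration.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (id, day, balance).
rows = [
    ("A", date(2020, 6, 20), 150),
    ("A", date(2020, 6, 21), -130), ("A", date(2020, 6, 21), -140),
    ("A", date(2020, 6, 21), -200),
    ("A", date(2020, 6, 22), 200), ("A", date(2020, 6, 22), 300),
    ("B", date(2020, 6, 21), 350), ("B", date(2020, 6, 21), 400),
    ("B", date(2020, 6, 21), -150),
    ("B", date(2020, 6, 22), -200), ("B", date(2020, 6, 22), -300),
    ("B", date(2020, 6, 23), -400), ("B", date(2020, 6, 23), -500),
]


def negative_streaks(rows):
    """Hypothetical helper mirroring the query: streak length per negative day."""
    # GROUP BY id, date ... HAVING min(balance) < 0
    min_balance = {}
    for id_, day, balance in rows:
        key = (id_, day)
        min_balance[key] = min(balance, min_balance.get(key, balance))
    negative_days = sorted(k for k, v in min_balance.items() if v < 0)

    result = {}
    for id_, group in groupby(negative_days, key=lambda k: k[0]):
        counters = {}
        # seqnum = row_number() over (partition by id order by date)
        for seqnum, (_, day) in enumerate(group, start=1):
            # day - seqnum is identical for every day of one unbroken run.
            anchor = day - timedelta(days=seqnum)
            # Outer row_number() over (partition by id, anchor order by date).
            counters[anchor] = counters.get(anchor, 0) + 1
            result[(id_, day)] = counters[anchor]
    return result


for (id_, day), streak in negative_streaks(rows).items():
    print(id_, day, streak)
```

For B, all three days map to the same anchor (2020-06-20), so they are numbered 1, 2, 3. A has a single negative day and gets 1. Days without any negative balance do not appear at all, because the HAVING clause filters them out.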
EDIT:
If you just want to count the number of days with any negative balance up to a given date, you can use:
select t.*,
       sum(case when min_balance < 0 then 1 else 0 end)
           over (partition by id order by date) as Number_of_Consecutive_Days
from (select t.id, date, min(balance) as min_balance
      from t
      group by t.id, date
     ) t;
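The running sum in this variant is a cumulative count rather than a streak length. A small Python sketch of the same idea on the sample data follows; the function name negative_days_so_far and the use of calendar dates are assumptions made for illustration.

```python
from datetime import date

# Sample rows from the question: (id, day, balance).
rows = [
    ("A", date(2020, 6, 20), 150),
    ("A", date(2020, 6, 21), -130), ("A", date(2020, 6, 21), -140),
    ("A", date(2020, 6, 21), -200),
    ("A", date(2020, 6, 22), 200), ("A", date(2020, 6, 22), 300),
    ("B", date(2020, 6, 21), 350), ("B", date(2020, 6, 21), 400),
    ("B", date(2020, 6, 21), -150),
    ("B", date(2020, 6, 22), -200), ("B", date(2020, 6, 22), -300),
    ("B", date(2020, 6, 23), -400), ("B", date(2020, 6, 23), -500),
]


def negative_days_so_far(rows):
    """Hypothetical helper: per (id, day), count of negative days up to that day."""
    # Inner query: min(balance) per (id, date).
    min_balance = {}
    for id_, day, balance in rows:
        key = (id_, day)
        min_balance[key] = min(balance, min_balance.get(key, balance))

    # sum(case when min_balance < 0 ...) over (partition by id order by date)
    running = {}
    totals = {}
    for id_, day in sorted(min_balance):
        is_negative = 1 if min_balance[(id_, day)] < 0 else 0
        running[id_] = running.get(id_, 0) + is_negative
        totals[(id_, day)] = running[id_]
    return totals


for (id_, day), total in negative_days_so_far(rows).items():
    print(id_, day, total)
```

On the sample data this gives 0, 1, 1 for A and 1, 2, 3 for B. The count never resets, so a positive day after a negative one keeps the previous total.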
Note that any gaps between dates will be treated as consecutive days.
with data as (
    select id, date,
           case when min(balance) >= 0 then 0 else 1 end as tally,
           sum(case when min(balance) >= 0 then 1 else 0 end)
               over (partition by id order by date) as grp
    from t
    group by id, date
)
select id, date,
       sum(tally) over (partition by id, grp, tally order by date) as running_days
from data
order by id, date;
To treat missing dates as non-consecutive, try this expression for grp:
sum(case when min(balance) >= 0 then 1 else 0 end)
    over (partition by id order by date) +
datediff(day, min(date) over (partition by id), date) -
row_number() over (partition by id order by date) + 1 as grp
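The adjusted grp expression can be read as "non-negative days so far, plus missing days so far". Below is a hedged Python sketch of that reading; the function name running_negative_days and the input shape (a dict of per-day minimum balances) are assumptions, and the data includes a hypothetical gap on 23 June that is not in the original sample.

```python
from datetime import date


def running_negative_days(min_balance_by_day):
    """Hypothetical helper; input maps (id, day) -> minimum balance of that day."""
    first_day, row_number, non_negative, sums = {}, {}, {}, {}
    result = {}
    for id_, day in sorted(min_balance_by_day):
        tally = 1 if min_balance_by_day[(id_, day)] < 0 else 0
        first_day.setdefault(id_, day)
        row_number[id_] = row_number.get(id_, 0) + 1
        # Running count of days whose minimum balance is >= 0.
        non_negative[id_] = non_negative.get(id_, 0) + (1 - tally)
        # datediff(first day, day) - row_number + 1 = calendar days missing so far.
        missing = (day - first_day[id_]).days - row_number[id_] + 1
        grp = non_negative[id_] + missing
        # sum(tally) over (partition by id, grp, tally order by date)
        key = (id_, grp, tally)
        sums[key] = sums.get(key, 0) + tally
        result[(id_, day)] = sums[key]
    return result


daily_min = {
    ("B", date(2020, 6, 21)): -150,
    ("B", date(2020, 6, 22)): -300,
    ("B", date(2020, 6, 24)): -500,  # hypothetical: 23 June has no rows
}
for (id_, day), days in running_negative_days(daily_min).items():
    print(id_, day, days)
```

With the gap adjustment the streak restarts after the missing day, so the three rows get 1, 2, 1. Without the adjustment they would share one grp value and be counted as 1, 2, 3.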