Full outer self join with data for different dates
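I want to compare each day's values with the values for the following day, and also see which colours are new or have disappeared.

I've done a full outer self join and replaced the nulls in the right-hand part, except for "is_matched". is_matched shows whether the join found a match, or whether the right-hand part was null before coalescing.

The only problem is the last column, "this_is_not_working". It should hold the total_colour value for date_local_2, not for date_local, and I can't figure out how to replace all the nulls in the "this_is_not_working" column with that value. I've tried window functions and intervals, but they didn't really work out.

I've created this db fiddle using Postgres, but I'm actually using Presto.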
    select * from colours

date_local | colour | amount
:--------- | :----- | -----:
2020-01-01 | white | 10
2020-01-01 | white | 10
2020-01-01 | green | 20
2020-01-01 | white | 10
2020-01-01 | red | 25
2020-01-01 | white | 10
2020-01-02 | pink | 15
2020-01-02 | pink | 15
2020-01-02 | pink | 15
2020-01-02 | pink | 15
2020-01-02 | white | 10
2020-01-02 | white | 10
2020-01-02 | white | 10
2020-01-02 | white | 10
2020-01-02 | white | 10
2020-01-03 | pink | 15
2020-01-03 | pink | 15
2020-01-03 | pink | 15
2020-01-03 | green | 20
2020-01-03 | green | 20
2020-01-03 | green | 20
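
For anyone who wants to reproduce the sample data outside the fiddle, a setup script along these lines should work. The column types are an assumption (the fiddle's DDL is not shown here), and in Presto the `create table` part depends on the connector in use.

```sql
-- Assumed column types; adjust to match the real table.
create table colours (
    date_local date,
    colour     varchar(20),
    amount     integer
);

-- Same 21 rows as the sample data in the question.
insert into colours (date_local, colour, amount) values
    (date '2020-01-01', 'white', 10),
    (date '2020-01-01', 'white', 10),
    (date '2020-01-01', 'green', 20),
    (date '2020-01-01', 'white', 10),
    (date '2020-01-01', 'red',   25),
    (date '2020-01-01', 'white', 10),
    (date '2020-01-02', 'pink',  15),
    (date '2020-01-02', 'pink',  15),
    (date '2020-01-02', 'pink',  15),
    (date '2020-01-02', 'pink',  15),
    (date '2020-01-02', 'white', 10),
    (date '2020-01-02', 'white', 10),
    (date '2020-01-02', 'white', 10),
    (date '2020-01-02', 'white', 10),
    (date '2020-01-02', 'white', 10),
    (date '2020-01-03', 'pink',  15),
    (date '2020-01-03', 'pink',  15),
    (date '2020-01-03', 'pink',  15),
    (date '2020-01-03', 'green', 20),
    (date '2020-01-03', 'green', 20),
    (date '2020-01-03', 'green', 20);
```
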

    with a as (
        select
            *
            ,sum(colours) over(partition by date_local) as total_colour
        from (
            select
                date_local
                ,colour
                ,count(colour) as colours
                ,sum(amount) as amount
            from colours
            group by 1,2
        ) as fr_om
    )
    select
        a.*
        ,b.date_local as is_matched
        ,coalesce(b.date_local, a.date_local + interval '1' day) as date_local_2
        ,coalesce(b.colour, a.colour) as colour_2
        ,coalesce(b.colours, 0) as colour_2
        ,coalesce(b.amount, 0) as amount_2
        ,coalesce(b.colours - a.colours, a.colours) as colour_difference
        ,coalesce(b.amount - a.amount, a.amount) as amount_difference
        ,b.total_colour as this_is_not_working
    from a
    full outer join a as b
        on a.date_local = b.date_local - interval '1' day
        and a.colour = b.colour
    order by 1

date_local | colour | colours | amount | total_colour | is_matched | date_local_2 | colour_2 | colour_2 | amount_2 | colour_difference | amount_difference | this_is_not_working
:--------- | :----- | ------: | -----: | -----------: | :--------- | :------------------ | :------- | -------: | -------: | ----------------: | ----------------: | ------------------:
2020-01-01 | red | 1 | 25 | 6 | null | 2020-01-02 00:00:00 | red | 0 | 0 | 1 | 25 | null
2020-01-01 | green | 1 | 20 | 6 | null | 2020-01-02 00:00:00 | green | 0 | 0 | 1 | 20 | null
2020-01-01 | white | 4 | 40 | 6 | 2020-01-02 | 2020-01-02 00:00:00 | white | 5 | 50 | 1 | 10 | 9
2020-01-02 | pink | 4 | 60 | 9 | 2020-01-03 | 2020-01-03 00:00:00 | pink | 3 | 45 | -1 | -15 | 6
2020-01-02 | white | 5 | 50 | 9 | null | 2020-01-03 00:00:00 | white | 0 | 0 | 5 | 50 | null
2020-01-03 | pink | 3 | 45 | 6 | null | 2020-01-04 00:00:00 | pink | 0 | 0 | 3 | 45 | null
2020-01-03 | green | 3 | 60 | 6 | null | 2020-01-04 00:00:00 | green | 0 | 0 | 3 | 60 | null
null | null | null | null | null | 2020-01-02 | 2020-01-02 00:00:00 | pink | 4 | 60 | null | null | 9
null | null | null | null | null | 2020-01-01 | 2020-01-01 00:00:00 | white | 4 | 40 | null | null | 6
null | null | null | null | null | 2020-01-03 | 2020-01-03 00:00:00 | green | 3 | 60 | null | null | 6
null | null | null | null | null | 2020-01-01 | 2020-01-01 00:00:00 | green | 1 | 20 | null | null | 6
null | null | null | null | null | 2020-01-01 | 2020-01-01 00:00:00 | red | 1 | 25 | null | null | 6

I don't think you need a full join here. Window functions can do the job:

    select
        date_local,
        colour,
        no_colour,
        sum_amount,
        total_colour,
        is_matched,
        case when is_matched = 1
            then lead(date_local) over(partition by colour order by date_local)
        end date_local_2,
        case when is_matched = 1
            then lead(colour) over(partition by colour order by date_local)
        end colour_2,
        case when is_matched = 1
            then lead(no_colour) over(partition by colour order by date_local)
        end no_colour_2,
        case when is_matched = 1
            then lead(sum_amount) over(partition by colour order by date_local)
        end sum_amount_2,
        case when is_matched = 1
            then lead(total_colour) over(partition by colour order by date_local)
        end total_colour_2
    from (
        select
            date_local,
            colour,
            count(*) no_colour,
            sum(amount) sum_amount,
            case when lead(date_local) over(partition by colour order by date_local)
                = date_local + interval '1' day
                then 1
            end is_matched,
            sum(count(*)) over(partition by date_local) total_colour
        from colours
        group by date_local, colour
    ) t
    order by date_local, colour

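The inner query aggregates by day and colour, and computes the group-level metrics as well as the total number of records per day. It also sets a flag, is_matched, that indicates whether an "adjacent" record with the same colour exists for the next day.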
The outer query then uses the window function lead() to pull the values from that adjacent row.
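
To see what lead() returns on its own, here is a small sketch against the same colours table. The alias next_date_for_colour is just an illustrative name, and the `select distinct` subquery is only there to get one row per day and colour.

```sql
-- For each (date, colour) pair, fetch the next date on which
-- the same colour appears; null if there is no later row.
select
    date_local,
    colour,
    lead(date_local) over (partition by colour order by date_local) as next_date_for_colour
from (
    select distinct date_local, colour
    from colours
) t
order by colour, date_local;
```

With the sample data, green on 2020-01-01 gets 2020-01-03 as its next date, which is not the following day. That is why the inner query only sets is_matched when the next date equals date_local plus one day, and why the outer query wraps every lead() in a case expression.
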
In your db fiddle, the query yields:

date_local | colour | no_colour | sum_amount | total_colour | is_matched | date_local_2 | colour_2 | no_colour_2 | sum_amount_2 | total_colour_2
:--------- | :----- | --------: | ---------: | -----------: | ---------: | :----------- | :------- | ----------: | -----------: | -------------:
2020-01-01 | green | 1 | 20 | 6 | null | null | null | null | null | null
2020-01-01 | red | 1 | 25 | 6 | null | null | null | null | null | null
2020-01-01 | white | 4 | 40 | 6 | 1 | 2020-01-02 | white | 5 | 50 | 9
2020-01-02 | pink | 4 | 60 | 9 | 1 | 2020-01-03 | pink | 3 | 45 | 6
2020-01-02 | white | 5 | 50 | 9 | null | null | null | null | null | null
2020-01-03 | green | 3 | 60 | 6 | null | null | null | null | null | null
2020-01-03 | pink | 3 | 45 | 6 | null | null | null | null | null | null
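
If the next day's total is also needed on rows whose colour does not appear the next day, one option is to look it up from a per-day total instead of from the neighbouring row. The following is only a sketch under that assumption, written in the Postgres syntax used by the fiddle; the CTE names per_colour and per_day and the aliases c, d1 and d2 are made up for illustration.

```sql
with per_colour as (
    -- one row per day and colour
    select date_local, colour, count(*) as no_colour, sum(amount) as sum_amount
    from colours
    group by date_local, colour
),
per_day as (
    -- total number of records per day
    select date_local, count(*) as total_colour
    from colours
    group by date_local
)
select
    c.date_local,
    c.colour,
    c.no_colour,
    c.sum_amount,
    d1.total_colour,
    c.date_local + interval '1' day as date_local_2,
    -- 0 when there is no data at all for the next day
    coalesce(d2.total_colour, 0) as total_colour_2
from per_colour c
join per_day d1
    on d1.date_local = c.date_local
left join per_day d2
    on d2.date_local = c.date_local + interval '1' day
order by c.date_local, c.colour;
```

With the sample data, every 2020-01-01 row gets total_colour_2 = 9, every 2020-01-02 row gets 6, and the 2020-01-03 rows get 0 because there is no data for 2020-01-04. Note that this only covers rows that exist on the first day; colours that are new on the second day would still need the full join from the question, or a union with a second query.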