Fill NULL rows based on some mathematical operations
I have a table A which contains id, report_day and other columns. I also have a table B which contains id, report_day and subscribers. I want to create a VIEW with the columns id, report_day, subscribers. So it's a simple join:
select a.id, a.report_day, b.subscribers from schema.a
left join schema.b on a.id = b.id
and a.report_day = b.report_day
Now I want to add a column subscribers_increment based on subscribers. But for some days I don't have stats for the subscribers column, so it's set to NULL. subscribers_increment is just subscribers(current_day) - subscribers(prev_day).
I read some articles and added the following expression:
case WHEN row_number() OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 THEN b.subscribers
else b.subscribers - COALESCE(last_value(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0::bigint::numeric)
END::integer AS subscribers_increment
And now I have the following result:
NULL is still NULL. For example, the increment for 2021-04-07 is incorrect: it's the increment for 2 days. Can I divide this value from 2021-04-08 by the number of days (here it's 2) and write the same value for 2021-04-07 and 2021-04-08 (or at least for 2021-04-07, where it was NULL)? And apply the same logic to all days where subscribers is NULL?
So I need to follow these rules: if I see a NULL value in the subscribers column, I should go to the next (future) NOT NULL day and take its value. From this (future) value, subtract the last non-NULL value (from the past, ordering by date, so we look back). Divide the result of the subtraction by the number of days and fill these rows of the subscribers_increment column.
Is it possible?
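The rule described above can be sketched procedurally to make the expected result concrete. This is only an illustration, not the SQL solution: the function name spread_increments and the sample numbers are hypothetical, it assumes exactly one row per day (so the row count equals the day count), and it leaves the very first row as None because there is no earlier value to subtract.

```python
def spread_increments(subscribers):
    """Return per-day increments for a day-ordered list of subscriber counts.

    None marks a day without stats. Each run of None values, together with
    the non-None value that follows it, shares (next - previous) / days.
    """
    increments = [None] * len(subscribers)
    prev_value = None  # last non-NULL count seen so far
    gap_start = 0      # index of the first row after the last non-NULL count
    for i, value in enumerate(subscribers):
        if value is None:
            continue  # wait until a future non-NULL value closes the gap
        if prev_value is not None:
            days = i - gap_start + 1  # NULL rows plus the closing row
            share = (value - prev_value) / days
            for j in range(gap_start, i + 1):
                increments[j] = share
        prev_value = value
        gap_start = i + 1
    return increments


# Hypothetical counts: the second day is missing, so the jump from 100 to 110
# is split evenly over two days.
print(spread_increments([100, None, 110, 115]))
```

Trailing NULL days with no future value stay None in this sketch, since there is nothing to subtract from.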
UPDATE:
For my data it should look like this:
UPDATE v2
UPDATE v3
The case (our increment) for 25.03-27.03 is still NULL.
The basic idea is:
- Use lag() to get the previous subscribers and dates before joining. This assumes that the left join is the cause of all the NULL values.
- Each NULL is combined with the next value in one grouping.
- The count of rows in a group (the NULLs plus the value that follows them) is the denominator.
- The difference between subscribers and prev_subscribers is the numerator.
So the query looks like this:
with t as (
    select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
           -- counting non-NULL values from the latest day backwards puts each run of
           -- NULL rows into the same group as the next non-NULL row
           count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
    from first_table a left join
         (select b.*,
                 lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
                 lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
          from second_table b
         ) b
         on a.id = b.id and a.report_day = b.report_day
)
-- note: with integer columns the divisions below are integer divisions;
-- cast to numeric if fractional increments are needed
select t.*,
       (case when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
             then t.subscribers - t.prev_subscribers
             when t.subscribers is not null
             then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
             when t.subscribers is null
             then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
                  ) / count(*) over (partition by id, grp)
        end) as subscribers_increment
from t;
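To see how the grouping behaves, here is a small runnable sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (it needs an SQLite build with window functions, 3.25 or later). The table contents are hypothetical sample data, the query is a simplified variant of the one above that keeps only the group-based branch, and the `* 1.0` is added to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: second_table has no row for 2021-04-07,
# so the left join yields NULL subscribers for that day.
conn.executescript("""
    create table first_table (id integer, report_day text);
    create table second_table (id integer, report_day text, subscribers integer);
    insert into first_table values
        (1, '2021-04-05'), (1, '2021-04-06'), (1, '2021-04-07'), (1, '2021-04-08');
    insert into second_table values
        (1, '2021-04-05', 100), (1, '2021-04-06', 104), (1, '2021-04-08', 110);
""")

query = """
with t as (
    select a.id, a.report_day, b.subscribers, b.prev_subscribers,
           -- running count of non-NULL values, walking from the latest day backwards
           count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
    from first_table a
    left join (
        select s.*,
               lag(s.subscribers) over (partition by id order by report_day) as prev_subscribers
        from second_table s
    ) b on a.id = b.id and a.report_day = b.report_day
)
select report_day, subscribers, grp,
       (max(subscribers) over (partition by id, grp)
        - max(prev_subscribers) over (partition by id, grp)) * 1.0
       / count(*) over (partition by id, grp) as subscribers_increment
from t
order by report_day
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this sample the rows for 2021-04-07 and 2021-04-08 land in the same group, so the two-day difference is divided by 2 and written to both days. The first day has no previous value, so its increment stays NULL.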