Fill NULL rows based on some math operations

Question

I have a table A which contains id, report_day and other columns. I also have a table B which contains id, report_day and subscribers. I want to create a VIEW with the columns id, report_day and subscribers. So it's a simple join:

select a.id, a.report_day, b.subscribers
from schema.a
left join schema.b on a.id = b.id
and a.report_day = b.report_day

Now I want to add a column subscribers_increment based on subscribers. But for some days I don't have stats for the subscribers column, and it's set to NULL. subscribers_increment is just subscribers(current_day) - subscribers(prev_day).

I read some articles and added the following expression:

case WHEN row_number() OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 THEN b.subscribers
else b.subscribers - COALESCE(last_value(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0::bigint::numeric)
END::integer AS subscribers_increment

And now I get the following result:

NULL is still NULL. For example, it has an incorrect increment for 2021-04-07. It's the increment for 2 days. Can I divide the value from 2021-04-08 by the number of days (here it's 2) and write the same value for 2021-04-07 and 2021-04-08 (or at least for 2021-04-07, where it was NULL)? And apply the same logic to all days where subscribers is NULL?

So I need to follow these rules: If I see a NULL value in the subscribers column, I should go to the next (future) NOT NULL day and grab the value for that day. From this (future) value, subtract the last non-NULL value (in the past, ordered by date, so we look back). Divide the result of the subtraction by the number of days and fill these rows in the subscribers_increment column.
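To make the rule concrete, here is a minimal Python sketch of the arithmetic (it is not SQL and not part of the view). The function name fill_increments and the sample numbers are hypothetical, and it assumes one row per day for a single id, already sorted by report_day. As in the CASE expression above, the first known value is used as its own increment.

```python
from datetime import date


def fill_increments(rows):
    """rows: list of (report_day, subscribers or None), one row per day, sorted by day.

    Returns one increment per row. Rows with no later known value stay None.
    """
    increments = [None] * len(rows)
    prev_idx = None  # index of the last row with a known subscribers value
    for i, (_, subs) in enumerate(rows):
        if subs is None:
            continue  # filled in once the next known value is reached
        if prev_idx is None:
            # No earlier value to subtract: keep the value itself (assumption).
            increments[i] = subs
        else:
            days = i - prev_idx  # NULL days plus the current day
            step = (subs - rows[prev_idx][1]) / days
            # Spread the same increment over the NULL days and the current day.
            for j in range(prev_idx + 1, i + 1):
                increments[j] = step
        prev_idx = i
    return increments


# Hypothetical sample data: 2021-04-07 has no stats.
sample = [
    (date(2021, 4, 6), 100),
    (date(2021, 4, 7), None),
    (date(2021, 4, 8), 120),
    (date(2021, 4, 9), 125),
]
print(fill_increments(sample))
```

With this sample, the 20 subscribers gained between 2021-04-06 and 2021-04-08 are split evenly over 2021-04-07 and 2021-04-08.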

Is it possible?

UPDATE:

For my data it should look like this:

UPDATE v2

After applying the script:

UPDATE v3

The case (our increment) for 25.03-27.03 is still NULL.

Answer 1

The basic idea is:

1. Use lag() to get the previous subscribers and dates before joining. This assumes that the left join is the cause of all the NULL values.
2. Use a cumulative count in reverse to assign a grouping, so that each NULL is combined with the next value in one grouping.
3. As a result of (2), the number of rows in a group (the NULLs plus the next non-NULL value) is the denominator.
4. As a result of (1), the difference between subscribers and prev_subscribers is the numerator.
5. The actual calculation requires more window functions and case logic.
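The grouping trick in the steps above can be illustrated outside SQL. The following is a rough Python sketch, not the query itself. The names reverse_groups and increments_by_group and the sample values are hypothetical, and the list stands for one id's subscribers ordered by report_day, with None where the left join produced NULL.

```python
from collections import Counter


def reverse_groups(values):
    """Mimic count(subscribers) over (order by report_day desc).

    Walking from the last day back to the first, count the non-NULL values
    seen so far, so each NULL lands in the group of the next known value.
    """
    groups = [0] * len(values)
    seen = 0
    for i in range(len(values) - 1, -1, -1):
        if values[i] is not None:
            seen += 1
        groups[i] = seen
    return groups


def increments_by_group(values):
    groups = reverse_groups(values)
    sizes = Counter(groups)  # denominator: rows per group
    numerators = {}  # numerator: known value minus previous known value (lag)
    prev = None
    for value, grp in zip(values, groups):
        if value is not None:
            numerators[grp] = None if prev is None else value - prev
            prev = value
    return [
        None if numerators.get(grp) is None else numerators[grp] / sizes[grp]
        for grp in groups
    ]


subscribers = [100, None, 120, 125, None]  # hypothetical sample
print(reverse_groups(subscribers))
print(increments_by_group(subscribers))
```

In this sketch the first row and any trailing NULL rows stay None, because there is no previous value or no later value to compare against.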

So the idea is:所以这个想法是：

with t as (
      select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
             -- reverse cumulative count: each NULL row shares grp with the next non-NULL row
             count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
      from first_table a left join
           (select b.*,
                   -- previous day and value that actually exist in second_table
                   lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
                   lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
            from second_table b
           ) b
           on a.id = b.id and a.report_day = b.report_day
     )
select t.*,
       (case -- previous value is from the day before: plain difference
             when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
             then t.subscribers - t.prev_subscribers
             -- value after a gap: spread the difference over the rows in the group
             when t.subscribers is not null
             then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
             -- NULL row: take the numerator from the non-NULL row of its group
             when t.subscribers is null
             then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
                  ) / count(*) over (partition by id, grp)
        end) as subscribers_increment
from t;

Here is a db<>fiddle.

Solution 1
2, accepted, 2021-04-11 13:31:01
