
Fill NULL rows based on some mathematical operations

I have a table A which contains id, report_day and other columns. I also have a table B which contains id, report_day and subscribers. I want to create a VIEW with the columns id, report_day and subscribers, so it's a simple join:

select a.id, a.report_day, b.subscribers  from schema.a
left join schema.b on a.id = b.id 
and a.report_day = b.report_day 

[image: view]

Now I want to add a column subscribers_increment based on subscribers. But for some days I don't have stats for the subscribers column, so it is set to NULL. subscribers_increment is just subscribers(current_day) - subscribers(prev_day).

I read some articles and added the following expression:

case WHEN row_number() OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 THEN b.subscribers
else b.subscribers - COALESCE(last_value(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0::bigint::numeric)
END::integer AS subscribers_increment

And now I get the following result:

[image: increment]

NULL is still NULL. For example, the increment for 2021-04-07 is incorrect: the value is an increment over 2 days. Can I divide the value from 2021-04-08 by the number of days (here 2) and write the same value for 2021-04-07 and 2021-04-08 (or at least for 2021-04-07, where it was NULL)? And apply the same logic to all days where subscribers is NULL?

So I need to follow these rules: if I see a NULL value in the subscribers column, I should go to the next (future) day that is NOT NULL and take its value. From this (future) value, subtract the last NOT NULL value (from the past, ordering by date, so we look back). Divide the result of the subtraction by the number of days and fill these rows of the subscribers_increment column with it.

Is it possible?

UPDATE:

For my data it should look like this:

[image]

UPDATE v2

After applying the script: [image]

UPDATE v3

The case 25.03-27.03 (our increment) is still NULL: [image]

The basic idea is:

  1. Use lag() to get the previous subscribers and dates before joining. This assumes that the left join is the cause of all the NULL values.
  2. Use a cumulative count in reverse to assign a grouping, so that each NULL is combined with the next non-NULL value in one group.
  3. As a result of (2), the number of rows in a group (the NULL rows plus the non-NULL row that closes it) is the denominator.
  4. As a result of (1), the difference between subscribers and prev_subscribers is the numerator.
  5. The actual calculation requires more window functions and case logic.
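The grouping in step 2 can be looked at in isolation. The following is only a sketch on made-up data (the CTE name sample and its values are assumptions for illustration, not the asker's tables); it relies on count(column) skipping NULLs:

```sql
-- Hypothetical sample data: one id, subscribers missing on 2021-04-07.
with sample(id, report_day, subscribers) as (
    values (1, date '2021-04-05', 100),
           (1, date '2021-04-06', 104),
           (1, date '2021-04-07', null),
           (1, date '2021-04-08', 110)
)
select id, report_day, subscribers,
       -- count(subscribers) ignores NULLs, so with the window ordered by
       -- date descending a NULL row keeps the running count of the next
       -- (later) non-NULL row and lands in the same group.
       count(subscribers) over (partition by id order by report_day desc) as grp
from sample
order by id, report_day;
```

Here 2021-04-07 and 2021-04-08 both get grp = 1, while 2021-04-06 gets 2 and 2021-04-05 gets 3, so the missing day is grouped with the day that closes the gap.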

So the idea is:

with t as (
      select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
             -- reverse cumulative count: NULL rows share grp with the next non-NULL row
             count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
      from first_table a left join
           -- previous report day and value, computed before the join
           (select b.*,
                   lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
                   lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
            from second_table b
           ) b
           on a.id = b.id and a.report_day = b.report_day 
     )
select t.*,
       (case -- no gap: plain day-over-day difference
             when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
             then t.subscribers - t.prev_subscribers
             -- row that closes a gap: spread the difference over the group
             when t.subscribers is not null
             then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
             -- missing day: take the values from the row that closes the gap
             when t.subscribers is null
             then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
                  ) / count(*) over (partition by id, grp)
        end) as subscribers_increment
from t;

Here is a db<>fiddle.
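To try the logic without creating tables, the same approach can be run against inline data. This is a sketch under assumptions: the rows below are invented, first_table and second_table are defined as CTEs purely for the example, and second_table simply has no row for 2021-04-07.

```sql
with first_table(id, report_day) as (
    -- hypothetical calendar of report days
    values (1, date '2021-04-05'), (1, date '2021-04-06'),
           (1, date '2021-04-07'), (1, date '2021-04-08')
),
second_table(id, report_day, subscribers) as (
    -- hypothetical stats; 2021-04-07 is missing
    values (1, date '2021-04-05', 100),
           (1, date '2021-04-06', 104),
           (1, date '2021-04-08', 110)
),
t as (
    select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
           count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
    from first_table a left join
         (select b.*,
                 lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
                 lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
          from second_table b
         ) b
         on a.id = b.id and a.report_day = b.report_day
)
select t.report_day, t.subscribers,
       (case when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
             then t.subscribers - t.prev_subscribers
             when t.subscribers is not null
             then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
             when t.subscribers is null
             then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
                  ) / count(*) over (partition by id, grp)
        end) as subscribers_increment
from t
order by t.id, t.report_day;
```

With this data the increment is 4 for 2021-04-06 and 3 for both 2021-04-07 and 2021-04-08 (the two-day difference of 6 split evenly). The first day stays NULL because it has no previous value. Note that with integer columns PostgreSQL performs integer division, which truncates; casting the numerator to numeric is one way to keep fractional increments.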
