
How to do gap filling and interpolation based on a date range in PostgreSQL/Vertica?

I have day-wise data (num) for some dimensions (i.e. cnt and cnt_id). I want to interpolate the dates, the dimensions (cnt and cnt_id), and cumulative_num.

My input set has data for only 3 dates, and I have a fixed date range against which I want to gap-fill.

Fixed date range: 2017-01-01 to 2017-01-08

For reference, SQL to generate the data:

WITH temp_data AS (
SELECT '2017-01-03'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 10::int AS numbers, 10::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 20::int AS numbers, 30::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 40::int AS numbers, 70::int AS cumulative_num
UNION
SELECT '2017-01-03'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 100::int AS numbers, 100::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 200::int AS numbers, 300::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 500::int AS numbers, 800::int AS cumulative_num
)
SELECT * FROM temp_data ORDER BY cnt_id, e_date

My input data looks like the following:

e_date     cnt cnt_id numbers cumulative_num 
---------- --- ------ ------- -------------- 
2017-01-03 uk  1      10      10             
2017-01-05 uk  1      20      30             
2017-01-07 uk  1      40      70             
2017-01-03 fr  2      100     100            
2017-01-05 fr  2      200     300            
2017-01-07 fr  2      500     800            
...        ..  ..     ..      ...            

My expected result looks like the following:

e_date     cnt cnt_id num cumulative_num 
---------- --- ------ --- -------------- 
2017-01-01 uk  1      0   0              
2017-01-02 uk  1      0   0              
2017-01-03 uk  1      10  10             
2017-01-04 uk  1      0   10             
2017-01-05 uk  1      20  30             
2017-01-06 uk  1      0   30             
2017-01-07 uk  1      40  70             
2017-01-08 uk  1      0   70             
2017-01-01 fr  2      0   0              
2017-01-02 fr  2      0   0              
2017-01-03 fr  2      100 100            
2017-01-04 fr  2      0   100            
2017-01-05 fr  2      200 300            
2017-01-06 fr  2      0   300            
2017-01-07 fr  2      500 800            
2017-01-08 fr  2      0   800     

Note: I am tagging both postgresql and vertica because they follow almost the same SQL syntax. A solution in either database is fine.

I think this is what you are looking for. It gives exactly the output you show as desired, or at least you can use it as a starting point for your query, because I think cumulative_num should actually be calculated rather than taken from the temp data:

WITH temp_data AS (
SELECT '2017-01-03'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 10::int AS numbers, 10::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 20::int AS numbers, 30::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'uk'::VARCHAR AS cnt, 1::int AS cnt_id, 40::int AS numbers, 70::int AS cumulative_num
UNION
SELECT '2017-01-03'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 100::int AS numbers, 100::int AS cumulative_num
UNION
SELECT '2017-01-05'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 200::int AS numbers, 300::int AS cumulative_num
UNION
SELECT '2017-01-07'::DATE AS e_date, 'fr'::VARCHAR AS cnt, 2::int AS cnt_id, 500::int AS numbers, 800::int AS cumulative_num
)
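-- A running max carries the last known cumulative_num forward (it never decreases within a cnt_id).
SELECT e_date, cnt, cnt_id, numbers,
       max(cumulative_num) OVER (PARTITION BY cnt_id ORDER BY e_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_num
FROM (
    -- One row per calendar day and per (cnt, cnt_id) pair; days without data get 0.
    SELECT t.my_date::date AS e_date, c.cnt, c.cnt_id, coalesce(tmp.numbers, 0) AS numbers, coalesce(tmp.cumulative_num, 0) AS cumulative_num
    FROM generate_series('2017-01-01'::date, '2017-01-08'::date, '1 day'::interval) AS t(my_date)
    CROSS JOIN (SELECT DISTINCT cnt, cnt_id FROM temp_data) c
    LEFT JOIN temp_data tmp ON t.my_date = tmp.e_date AND c.cnt_id = tmp.cnt_id
) src
-- The final ordering belongs on the outer query; an ORDER BY inside the subquery does not guarantee it.
ORDER BY cnt_id, e_date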
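
As the answer suggests, cumulative_num can be calculated instead of read from the input. The following is a minimal sketch of that idea for PostgreSQL, assuming cumulative_num is the running total of numbers per cnt_id. The CTE names calendar and dims are made up for illustration, and the sample rows are the ones from the question with the cumulative_num column left out. generate_series is a PostgreSQL function, so a Vertica version would need a different way to produce the dates.

```sql
-- Sketch: gap-fill the date range and derive cumulative_num with a running sum.
WITH temp_data(e_date, cnt, cnt_id, numbers) AS (
    VALUES ('2017-01-03'::date, 'uk'::varchar, 1, 10),
           ('2017-01-05', 'uk', 1, 20),
           ('2017-01-07', 'uk', 1, 40),
           ('2017-01-03', 'fr', 2, 100),
           ('2017-01-05', 'fr', 2, 200),
           ('2017-01-07', 'fr', 2, 500)
),
-- Hypothetical helper: one row per day in the fixed range.
calendar AS (
    SELECT g.d::date AS e_date
    FROM generate_series('2017-01-01'::date, '2017-01-08'::date, '1 day'::interval) AS g(d)
),
-- Hypothetical helper: every (cnt, cnt_id) pair present in the input.
dims AS (
    SELECT DISTINCT cnt, cnt_id FROM temp_data
)
SELECT c.e_date,
       d.cnt,
       d.cnt_id,
       coalesce(t.numbers, 0) AS num,
       -- Missing days contribute 0, so the total stays flat across gaps.
       sum(coalesce(t.numbers, 0)) OVER (PARTITION BY d.cnt_id ORDER BY c.e_date) AS cumulative_num
FROM calendar c
CROSS JOIN dims d
LEFT JOIN temp_data t ON t.e_date = c.e_date AND t.cnt_id = d.cnt_id
ORDER BY d.cnt_id, c.e_date;
```

On the sample data this gives the same cumulative_num values as the expected result in the question (0, 0, 10, 10, 30, 30, 70, 70 for uk). Unlike the running max, it does not depend on the stored cumulative_num column being correct.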
