I've got some data in a Postgres table that looks like:
Name | Date       | Balance
-----+------------+--------
A    | 2020-01-01 | 1
B    | 2020-01-01 | 0
B    | 2020-01-02 | 2
A    | 2020-01-03 | 5
(note that A is missing a value for 2020-01-02 and B for 2020-01-03)
I'd like to fill in each missing date with the most recent balance for that name. In other words, I'd like
Name | Date       | Balance
-----+------------+--------
A    | 2020-01-01 | 1
B    | 2020-01-01 | 0
A    | 2020-01-02 | 1   <--- filled in with previous balance
B    | 2020-01-02 | 2
A    | 2020-01-03 | 5
B    | 2020-01-03 | 2   <--- filled in with previous balance
Note that in reality, several dates may be missing in a row, in which case the most recent one for that name should always be selected.
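I am thinking generate_series() and window functions: generate the full range of dates, cross join it with the distinct names, bring in the table with a left join, and fall back to the previous row's balance when a date is missing: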
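select
    n.name,
    s.date,
    -- use the row's own balance, else the balance of the previous date
    coalesce(t.balance, lag(t.balance) over(partition by n.name order by s.date)) balance
-- generate_series() with an interval step returns timestamps, so cast back to date
from (select generate_series(min(date), max(date), interval '1 day')::date as date from mytable) s
cross join (select distinct name from mytable) n
left join mytable t on t.name = n.name and t.date = s.date
order by n.name, s.date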
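To see where a plain lag() stops being enough, here is a small self-contained sketch. The inline rows are made up for illustration (they are not the question's data): name A has a balance only on 2020-01-01 and 2020-01-04, so two dates in a row are missing. The `::date` cast is my addition to keep the output column a date.

```sql
-- Hypothetical sample data standing in for mytable:
-- A has balances only on 2020-01-01 and 2020-01-04.
with mytable(name, date, balance) as (
    values ('A', date '2020-01-01', 1),
           ('A', date '2020-01-04', 5)
)
select
    n.name,
    s.date,
    coalesce(t.balance, lag(t.balance) over (partition by n.name order by s.date)) as balance
from (select generate_series(min(date), max(date), interval '1 day')::date as date from mytable) s
cross join (select distinct name from mytable) n
left join mytable t on t.name = n.name and t.date = s.date
order by n.name, s.date;

-- name | date       | balance
-- A    | 2020-01-01 | 1
-- A    | 2020-01-02 | 1       (filled from the previous row)
-- A    | 2020-01-03 | null    (the previous row is itself a gap)
-- A    | 2020-01-04 | 5
```

lag() looks exactly one row back, so the second missing date in a run finds only another null.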
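If several dates may be missing in a row, then a little more logic is needed, because lag() only looks one row back. The query below basically emulates lag() with the ignore nulls option: a running count of non-null balances (grp) puts each known balance and the gaps that follow it into the same group, and the outer query takes the first balance of each group: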
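select
    name,
    date,
    -- order by date so that first_value() deterministically picks the row that opens the group
    coalesce(balance, first_value(balance) over(partition by name, grp order by date)) balance
from (
    select
        n.name,
        s.date,
        t.balance,
        -- running count of known balances: increments on every row that has a value
        sum( (t.balance is not null)::int ) over(partition by n.name order by s.date) grp
    -- generate_series() with an interval step returns timestamps, so cast back to date
    from (select generate_series(min(date), max(date), interval '1 day')::date as date from mytable) s
    cross join (select distinct name from mytable) n
    left join mytable t on t.name = n.name and t.date = s.date
) t
order by name, date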
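As a rough check of how the grouping behaves, the sketch below inlines the question's four sample rows in a CTE (so it runs without an existing table) and exposes the intermediate grp column next to the filled value. The column names raw_balance and filled_balance, the CTE name calendar, and the `::date` cast are my own additions for illustration.

```sql
-- The question's sample rows, inlined so the query runs on its own.
with mytable(name, date, balance) as (
    values ('A', date '2020-01-01', 1),
           ('B', date '2020-01-01', 0),
           ('B', date '2020-01-02', 2),
           ('A', date '2020-01-03', 5)
),
calendar as (
    select
        n.name,
        s.date,
        t.balance,
        -- running count of known balances per name
        sum( (t.balance is not null)::int ) over (partition by n.name order by s.date) as grp
    from (select generate_series(min(date), max(date), interval '1 day')::date as date from mytable) s
    cross join (select distinct name from mytable) n
    left join mytable t on t.name = n.name and t.date = s.date
)
select
    name,
    date,
    balance as raw_balance,
    grp,
    coalesce(balance, first_value(balance) over (partition by name, grp order by date)) as filled_balance
from calendar
order by name, date;

-- name | date       | raw_balance | grp | filled_balance
-- A    | 2020-01-01 | 1           | 1   | 1
-- A    | 2020-01-02 | null        | 1   | 1
-- A    | 2020-01-03 | 5           | 2   | 5
-- B    | 2020-01-01 | 0           | 1   | 0
-- B    | 2020-01-02 | 2           | 2   | 2
-- B    | 2020-01-03 | null        | 2   | 2
```

One consequence of this approach: dates that fall before a name's first known balance get grp = 0 and stay null, since there is no earlier value to carry forward.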