I have a database table that tracks each person's salary over time, like the table below:
I want to query a person's salary (by id) for each month, producing output like the table below:
I don't know what query to use, since it has to look through the salary table to find which salary is valid on a given date. Any ideas?
Thanks!
Here is a complete query using the example data. As already shown, you need a valid-from date...
WITH
-- your input ...
indata(id,datevaliduntil,salary) AS (
SELECT 1001,DATE '9999-12-31', 5000
UNION ALL SELECT 1001,DATE '2020-08-31', 4000
UNION ALL SELECT 1001,DATE '2020-04-30', 3000
)
,
-- make it almost like a slowly changing dimension
-- table - add a valid-from date ...
scd AS (
SELECT
id
, LAG(datevaliduntil,1,DATE '1900-01-01') OVER (
PARTITION BY id ORDER BY datevaliduntil
) AS datevalidfrom
, datevaliduntil
, salary
FROM indata
)
,
-- the months from the example ...
months(monthend) AS (
SELECT
mon::DATE - 1 AS monthend
FROM
GENERATE_SERIES(
'2020-04-01'::DATE
, '2021-03-01'::DATE
, INTERVAL '1 MONTH'
) gs(mon)
)
SELECT
monthend
, id
, salary
FROM scd
JOIN months ON monthend > datevalidfrom
AND monthend <= datevaliduntil
ORDER BY 1
;
-- out monthend | id | salary
-- out ------------+------+--------
-- out 2020-03-31 | 1001 | 3000
-- out 2020-04-30 | 1001 | 3000
-- out 2020-05-31 | 1001 | 4000
-- out 2020-06-30 | 1001 | 4000
-- out 2020-07-31 | 1001 | 4000
-- out 2020-08-31 | 1001 | 4000
-- out 2020-09-30 | 1001 | 5000
-- out 2020-10-31 | 1001 | 5000
-- out 2020-11-30 | 1001 | 5000
-- out 2020-12-31 | 1001 | 5000
-- out 2021-01-31 | 1001 | 5000
-- out 2021-02-28 | 1001 | 5000
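To see what the LAG() step contributes, it can help to run the scd part on its own. The following is a minimal sketch that reuses the indata sample rows from the query above; the final ORDER BY is only added here to make the output order predictable.

```sql
WITH
-- same sample rows as in the full query
indata(id, datevaliduntil, salary) AS (
          SELECT 1001, DATE '9999-12-31', 5000
UNION ALL SELECT 1001, DATE '2020-08-31', 4000
UNION ALL SELECT 1001, DATE '2020-04-30', 3000
)
SELECT
  id
  -- previous row's valid-until date becomes this row's valid-from date;
  -- the first row per id falls back to the default 1900-01-01
, LAG(datevaliduntil, 1, DATE '1900-01-01') OVER (
    PARTITION BY id ORDER BY datevaliduntil
  ) AS datevalidfrom
, datevaliduntil
, salary
FROM indata
ORDER BY id, datevaliduntil
;
-- out   id  | datevalidfrom | datevaliduntil | salary
-- out ------+---------------+----------------+--------
-- out  1001 | 1900-01-01    | 2020-04-30     |   3000
-- out  1001 | 2020-04-30    | 2020-08-31     |   4000
-- out  1001 | 2020-08-31    | 9999-12-31     |   5000
```

Each row now carries a half-open validity range (datevalidfrom, datevaliduntil], which is what the join condition monthend > datevalidfrom AND monthend <= datevaliduntil relies on.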
This is a convenient place to use a lateral join. The following goes by the first day of the month rather than the last day -- because that is simpler to generate:
select i.id, gs.mon, s.salary
from generate_series('2019-01-01'::date, '2020-12-01'::date, interval '1 month') gs(mon) cross join
(select distinct id from salaries) i left join lateral
(select s.salary
from salaries s
where s.id = i.id and s.datevaliduntil >= gs.mon
order by s.datevaliduntil asc
limit 1
) s on true;
Of course, you can just subtract 1 day from each date if you want the last day.
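As a sketch of that month-end variant: the query below subtracts one day from each generated first-of-month date and looks up the salary for that day. It assumes the table is called salaries, as in the query above, and replaces it with an inline CTE holding the sample rows from the first answer so the statement can be run on its own. The date range is chosen to match the first answer's example.

```sql
with salaries(id, datevaliduntil, salary) as (
    -- sample rows only; in practice this would be the real table
    values (1001, date '9999-12-31', 5000),
           (1001, date '2020-08-31', 4000),
           (1001, date '2020-04-30', 3000)
)
select i.id, gs.mon::date - 1 as monthend, s.salary
from generate_series(date '2020-04-01', date '2021-03-01', interval '1 month') gs(mon)
cross join (select distinct id from salaries) i
left join lateral (
    -- earliest validity row that still covers the month end
    select s.salary
    from salaries s
    where s.id = i.id
      and s.datevaliduntil >= gs.mon::date - 1
    order by s.datevaliduntil asc
    limit 1
) s on true
order by i.id, monthend;
```

With the sample rows this should produce one row per month end from 2020-03-31 to 2021-02-28, the same set as the first answer's output.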
I would use a lateral join, but the other way around: start from the table itself, bring the previous date with lag(), then use generate_series() to generate the dates in between. A little bit of additional logic is needed to adjust the end of months:
select x.date - interval '1 day' date, t.id, t.salary
from (
select id, salary,
datevaliduntil + interval '1 day' datevaliduntil,
lag(datevaliduntil, 1, datevaliduntil)
over(partition by id order by datevaliduntil) + interval '1 day' lag_datevaliduntil
from mytable t
) t
cross join lateral generate_series(
t.lag_datevaliduntil,
least(t.datevaliduntil, '2021-03-01'),
'1 month'
) x(date)
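You control the overall upper bound with the literal date in the second argument to generate_series (here the series stops at 2021-03-01, so after subtracting one day the last month end returned is 2021-02-28). Note that in the result, the dates on which the salary changes (2020-04-30 and 2020-08-31) appear twice, once for each adjacent salary.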
date                |   id | salary
:------------------ | ---: | -----:
2020-04-30 00:00:00 | 1001 |   3000
2020-04-30 00:00:00 | 1001 |   4000
2020-05-31 00:00:00 | 1001 |   4000
2020-06-30 00:00:00 | 1001 |   4000
2020-07-31 00:00:00 | 1001 |   4000
2020-08-31 00:00:00 | 1001 |   4000
2020-08-31 00:00:00 | 1001 |   5000
2020-09-30 00:00:00 | 1001 |   5000
2020-10-31 00:00:00 | 1001 |   5000
2020-11-30 00:00:00 | 1001 |   5000
2020-12-31 00:00:00 | 1001 |   5000
2021-01-31 00:00:00 | 1001 |   5000
2021-02-28 00:00:00 | 1001 |   5000
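If each month end should appear only once per id, one possible adjustment is to start every series one month after the previous valid-until date, so a period no longer repeats the last month of the period before it. This is only a sketch, not part of the answer above: it assumes every datevaliduntil is a month end, uses mytable as the table name (here an inline CTE with the sample rows), and the column names period_start, period_end and monthend are made up for the illustration.

```sql
with mytable(id, datevaliduntil, salary) as (
    -- sample rows only; in practice this would be the real table
    values (1001, date '9999-12-31', 5000),
           (1001, date '2020-08-31', 4000),
           (1001, date '2020-04-30', 3000)
)
select (x.dt - interval '1 day')::date as monthend, t.id, t.salary
from (
    select id, salary,
           -- first day of the month after this period ends
           datevaliduntil + interval '1 day' as period_end,
           -- skip the month already covered by the previous period;
           -- the first period of an id yields only its own last month
           coalesce(
               lag(datevaliduntil) over (partition by id order by datevaliduntil)
                   + interval '1 day' + interval '1 month',
               datevaliduntil + interval '1 day'
           ) as period_start
    from mytable
) t
cross join lateral generate_series(
    t.period_start,
    least(t.period_end, timestamp '2021-03-01'),
    interval '1 month'
) x(dt)
order by t.id, monthend;
```

With the sample rows this should return each month end from 2020-04-30 to 2021-02-28 exactly once, with 3000 for April, 4000 for May through August, and 5000 afterwards.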