I have a dataset like the one below:
ID   ReportingDate  Status
123  05/05/2020     GREEN
123  12/05/2020     NONE
123  19/05/2020     NONE
123  26/05/2020     AMBER
123  02/06/2020     RED
123  09/06/2020     NONE
123  16/06/2020     GREEN
123  23/06/2020     NONE
123  30/06/2020     AMBER
I want to ignore the NONE statuses and take the previous non-NONE value instead, which may be in the previous row but is sometimes 2 or 3 rows back. The final output should look like the column "FINAL":
ID   ReportingDate  Status  FINAL
123  05/05/2020     GREEN   GREEN
123  12/05/2020     NONE    GREEN
123  19/05/2020     NONE    GREEN
123  26/05/2020     AMBER   AMBER
123  02/06/2020     RED     RED
123  09/06/2020     NONE    RED
123  16/06/2020     GREEN   GREEN
123  23/06/2020     NONE    GREEN
123  30/06/2020     AMBER   AMBER
I've tried the LAG() and LEAD() functions, but they don't produce the result I need, because they only look a fixed number of rows back or ahead:
    UPPER(
        CASE
            WHEN psh.Status = 'NONE'
                THEN LAG(psh.Status, 1, psh.Status) OVER (
                         PARTITION BY psh.ID
                         ORDER BY rp.ReportingDate
                     )
            ELSE psh.Status
        END
    ) AS OverallStatusHistory
Could you advise whether there is a way to achieve this, please?
Many thanks!
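You can assign groups based on the running count of non-'NONE' values, so that each real status and the NONE rows following it share a group number. Then spread the group's single non-'NONE' value across the whole group: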
    select t.*,
           max(nullif(status, 'NONE')) over (partition by id, grp) as imputed_status
    from (select t.*,
                 sum(case when status = 'NONE' then 0 else 1 end)
                     over (partition by id order by reportingdate) as grp
          from t
         ) t;
Here is a db<>fiddle.
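To see why the running count forms the right groups, here is a minimal Python sketch that mimics the two window functions for a single ID. It is only an illustration of the logic, not a replacement for the query: the function name `impute` is made up for this example, and the rows are assumed to be already sorted by reporting date.

```python
# Statuses for one ID, assumed to be already ordered by ReportingDate.
statuses = ["GREEN", "NONE", "NONE", "AMBER", "RED",
            "NONE", "GREEN", "NONE", "AMBER"]


def impute(statuses):
    """Mimic the grp / imputed_status columns for one ID partition.

    `impute` is a hypothetical helper written for this illustration.
    """
    # Step 1: running count of non-'NONE' rows, like
    # SUM(CASE WHEN status = 'NONE' THEN 0 ELSE 1 END) OVER (ORDER BY ...).
    grp = 0
    grps = []
    for status in statuses:
        if status != "NONE":
            grp += 1
        grps.append(grp)

    # Step 2: every group holds at most one non-'NONE' value, so
    # MAX(NULLIF(status, 'NONE')) OVER (PARTITION BY grp) simply picks it.
    value_of_group = {}
    for g, status in zip(grps, statuses):
        if status != "NONE":
            value_of_group[g] = status

    # Leading 'NONE' rows fall in group 0 and stay None (NULL in SQL).
    return [(g, value_of_group.get(g)) for g in grps]


result = impute(statuses)
for status, (g, imputed) in zip(statuses, result):
    print(f"{status:<6} grp={g} imputed={imputed}")
```

For the sample data the groups come out as 1, 1, 1, 2, 3, 3, 4, 4, 5, and the imputed values match the FINAL column in the question.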
The SQL standard actually supports the IGNORE NULLS option on LAG() (and various other window functions), which would make it possible to solve this without a subquery. Unfortunately, SQL Server did not support that option when this answer was written; it was added later, in SQL Server 2022.
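As a rough illustration of what IGNORE NULLS would buy you, the following Python sketch models the semantics as I understand them: LAG() with IGNORE NULLS returns the nearest preceding non-null value rather than the value exactly one row back. The helper name `lag_ignore_nulls` is invented for this example, and the rows are again assumed to be sorted by reporting date within one ID.

```python
def lag_ignore_nulls(values):
    """For each row, return the most recent non-None value strictly before it.

    Hypothetical helper that models LAG(x) IGNORE NULLS over one ordered partition.
    """
    out = []
    last_seen = None
    for value in values:
        out.append(last_seen)
        if value is not None:
            last_seen = value
    return out


statuses = ["GREEN", "NONE", "NONE", "AMBER", "RED",
            "NONE", "GREEN", "NONE", "AMBER"]

# NULLIF(status, 'NONE'): turn the placeholder into a real NULL first.
nullified = [None if s == "NONE" else s for s in statuses]

# COALESCE(current, LAG(current) IGNORE NULLS): keep a real status,
# otherwise fall back to the last real status seen before this row.
final = [cur if cur is not None else prev
         for cur, prev in zip(nullified, lag_ignore_nulls(nullified))]

print(final)
```

The point of the sketch is that the lookback distance is no longer fixed, which is exactly what the plain LAG(status, 1) attempt in the question was missing.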