In a PostgreSQL database, I have a table of measurements that looks as follows:
| sensor_group_id | ts | value_1 | value_2 | etc... |
|-----------------|---------------------------|---------|---------|--------|
| 1 | 2021-07-21T00:20:00+00:00 | 15 | NULL | |
| 1 | 2021-07-15T00:20:00+00:00 | NULL | 23 | |
| 2 | 2021-07-17T00:20:00+00:00 | NULL | 11 | |
| 1 | 2021-07-13T00:20:00+00:00 | 9 | 4 | |
| 2 | 2021-07-10T00:20:00+00:00 | 99 | 36 | |
The table has many columns, one per measurement type. Each sensor group produces measurements of several types at the same time, but not always all of them, so we end up with partially filled rows.
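For anyone who wants to try the queries in this thread, a minimal setup sketch built from the sample rows above could look like this. The column types are assumptions. The queries below also use value_3, which the sample doesn't show, so that column is included here and simply left NULL.

```sql
-- Sample data from the table above. Column types are assumptions;
-- value_3 is not shown in the sample, so every row leaves it NULL.
create table measurements (
    sensor_group_id integer     not null,
    ts              timestamptz not null,
    value_1         integer,
    value_2         integer,
    value_3         integer
    -- further value_n columns omitted
);

insert into measurements (sensor_group_id, ts, value_1, value_2) values
    (1, '2021-07-21T00:20:00+00:00', 15,   null),
    (1, '2021-07-15T00:20:00+00:00', null, 23),
    (2, '2021-07-17T00:20:00+00:00', null, 11),
    (1, '2021-07-13T00:20:00+00:00', 9,    4),
    (2, '2021-07-10T00:20:00+00:00', 99,   36);
```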
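What I want to do: for each sensor group, get the latest non-NULL value of each measurement column, together with the timestamp of that value.
The solution I have now seems pretty cumbersome: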
WITH
latest_value_1 AS (SELECT DISTINCT ON (sensor_group_id) sensor_group_id, ts, value_1
FROM measurements
WHERE value_1 IS NOT NULL
ORDER BY sensor_group_id, ts DESC),
latest_value_2 AS (SELECT DISTINCT ON (sensor_group_id) sensor_group_id, ts, value_2
FROM measurements
WHERE value_2 IS NOT NULL
ORDER BY sensor_group_id, ts DESC),
latest_value_3 AS (SELECT DISTINCT ON (sensor_group_id) sensor_group_id, ts, value_3
FROM measurements
WHERE value_3 IS NOT NULL
ORDER BY sensor_group_id, ts DESC)
-- etc. for the remaining value columns (separate each CTE with a comma)
SELECT latest_value_1.sensor_group_id,
latest_value_1.ts AS latest_value_1_ts,
value_1,
latest_value_2.ts AS latest_value_2_ts,
value_2,
latest_value_3.ts AS latest_value_3_ts,
value_3
-- etc. for the remaining value columns
FROM latest_value_1
JOIN latest_value_2
ON latest_value_1.sensor_group_id = latest_value_2.sensor_group_id
JOIN latest_value_3
ON latest_value_1.sensor_group_id = latest_value_3.sensor_group_id
-- etc.
This produces the following result:
| sensor_group_id | latest_value_1_ts | value_1 | latest_value_2_ts | value_2 | etc... |
|---|---|---|---|---|---|
| 1 | 2021-07-21T00:20:00+00:00 | 15 | 2021-07-15T00:20:00+00:00 | 23 | |
| 2 | 2021-07-10T00:20:00+00:00 | 99 | 2021-07-17T00:20:00+00:00 | 11 | |
This seems outrageously complicated, but I'm not sure if there is a better approach. Help would be much appreciated!
Not sure whether it's simpler...
with
sensor_groups(sgr_id) as ( -- Replace with your own list of groups if you have one
select distinct sensor_group_id from measurements)
select
*
from
sensor_groups as sg
left join lateral (
select ts, value_1
from measurements
where value_1 is not null and sensor_group_id = sg.sgr_id
order by ts desc limit 1) as v1(ts_1, v_1) on true
left join lateral (
select ts, value_2
from measurements
where value_2 is not null and sensor_group_id = sg.sgr_id
order by ts desc limit 1) as v2(ts_2, v_2) on true
...
PS: Normalizing the data could help a lot.
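A minimal sketch of what that could look like, assuming "normalization" here means a long layout with one row per individual measurement instead of one column per measurement type. The table measurements_long and its columns are hypothetical names, and the unpivot step assumes all value_n columns hold numeric data.

```sql
-- Hypothetical long layout: one row per individual measurement.
create table measurements_long (
    sensor_group_id  integer     not null,
    ts               timestamptz not null,
    measurement_type text        not null,  -- e.g. 'value_1', 'value_2', ...
    value            numeric     not null
);

-- Unpivot the existing wide table; NULL cells simply produce no row.
insert into measurements_long (sensor_group_id, ts, measurement_type, value)
select m.sensor_group_id, m.ts, v.measurement_type, v.value
from measurements m
cross join lateral (values ('value_1', m.value_1),
                           ('value_2', m.value_2),
                           ('value_3', m.value_3)) as v(measurement_type, value)
where v.value is not null;

-- Latest value of every measurement type per sensor group, in one query
-- that does not grow with the number of measurement types.
select distinct on (sensor_group_id, measurement_type)
       sensor_group_id, measurement_type, ts, value
from measurements_long
order by sensor_group_id, measurement_type, ts desc;
```

The trade-off: a new measurement type becomes a new value of measurement_type rather than a new column plus a new CTE or lateral join, at the cost of one row per value instead of one row per timestamp.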
What you really want is the `IGNORE NULLS` option on `LAG()` or `LAST_VALUE()`. But Postgres does not support this functionality. Instead, you can use a two-level trick: assign a grouping to each value so that each `NULL` value is in the same group as the previous row with a value. Because `count()` ignores `NULL`s, a running count of a column increases only on rows where that column has a value, which gives exactly this grouping. Then "schmear" the values through the group:
select t.*,
       -- spread each group's non-NULL value over all rows of that group
       max(value_1) over (partition by sensor_group_id, grp_1) as imputed_value_1,
       max(value_2) over (partition by sensor_group_id, grp_2) as imputed_value_2,
       max(value_3) over (partition by sensor_group_id, grp_3) as imputed_value_3
from (select t.*,
             -- running count of non-NULL values: it only increases on rows with a value
             count(value_1) over (partition by sensor_group_id order by ts) as grp_1,
             count(value_2) over (partition by sensor_group_id order by ts) as grp_2,
             count(value_3) over (partition by sensor_group_id order by ts) as grp_3
      from measurements t
     ) t;
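The query above fills the gaps on every row, while the question asks only for the latest values per group. One way to finish the job (a sketch, not part of the original answer) is to keep only the newest row per group with `DISTINCT ON`, because that row carries the most recent non-NULL value of every column; the same `max()` trick over a `CASE` expression recovers each value's timestamp. The view name latest_measurements is hypothetical. With the sample data, group 1 ordered by ts gives grp_1 = 1, 1, 2 (rows of 07-13, 07-15 and 07-21), so imputed_value_1 is 9, 9, 15, and the newest row yields 15 and 23.

```sql
-- Sketch: reduce the fill-forward result to one row per sensor group.
-- Assumes the question's table "measurements" with columns value_1 .. value_3;
-- the view name latest_measurements is hypothetical.
create or replace view latest_measurements as
select distinct on (sensor_group_id)
       sensor_group_id,
       ts_1, imputed_value_1,
       ts_2, imputed_value_2,
       ts_3, imputed_value_3
from (select g.*,
             -- only the row(s) at the earliest ts of a (sensor_group_id, grp_n)
             -- partition can hold a non-NULL value_n, so max() copies that value
             -- and its timestamp to every row of the partition
             max(value_1) over w1 as imputed_value_1,
             max(case when value_1 is not null then ts end) over w1 as ts_1,
             max(value_2) over w2 as imputed_value_2,
             max(case when value_2 is not null then ts end) over w2 as ts_2,
             max(value_3) over w3 as imputed_value_3,
             max(case when value_3 is not null then ts end) over w3 as ts_3
      from (select m.*,
                   count(value_1) over (partition by sensor_group_id order by ts) as grp_1,
                   count(value_2) over (partition by sensor_group_id order by ts) as grp_2,
                   count(value_3) over (partition by sensor_group_id order by ts) as grp_3
            from measurements m) g
      window w1 as (partition by sensor_group_id, grp_1),
             w2 as (partition by sensor_group_id, grp_2),
             w3 as (partition by sensor_group_id, grp_3)
     ) filled
-- the newest row of each group carries the latest non-NULL value of every column
order by sensor_group_id, ts desc;
```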