Aggregating rows in SQL with missing Booleans

Question

I have the SQL script below, which returns the following data from a view in a PostgreSQL database.

SELECT 
  "V_data".macaddr,
  "V_data".sensorid,
  "V_data".ts,
  "V_data".velocity,
  "V_data".temp,
  "V_data".highspeed,
  "V_data".hightemp,
  "V_data".distance

FROM 
  sensordb."V_data"

WHERE 
  "V_data".macaddr like '%abcdef'

AND
  (
  ("V_data".sensorid = 'abc1') or ("V_data".sensorid = 'a2bc') or ("V_data".sensorid = 'ab3c') 
  )

AND
  "V_data".ts >= 1616370867000

ORDER BY
  "V_data".ts DESC;

Output

macaddr	sensorid	ts	velocity	temp	highspeed	hightemp	distance
abcdef	abc1	1616370867010	25	32			52
abcdef	a2bc	1616370867008	27	35		T	51
abcdef	ab3c	1616370867006	26	30			50
abcdef	abc1	1616370867005	24	36		T	50
abcdef	a2bc	1616370867004	27	31			50
abcdef	abc1	1616370867002	21	30	T		48
abcdef	ab3c	1616370867000	22	33	F		46

I want to aggregate the rows such that I have the latest readings per sensorid for ts, velocity, temp, distance. For the Booleans highspeed and hightemp, I want the latest available Boolean value or an empty cell if no Boolean value was available.

Expected output

macaddr	sensorid	ts	velocity	temp	highspeed	hightemp	distance
abcdef	abc1	1616370867010	25	32	T	T	52
abcdef	a2bc	1616370867008	27	35		T	51
abcdef	ab3c	1616370867006	26	30	F		50

How could I simplify this task?

Thanks.

Answer 1

Hmmm . . . DISTINCT ON would work for all but the boolean columns. Those booleans are tricky, though, because the latest non-NULL value may sit on an older row than the latest reading.

Instead, let's go for ROW_NUMBER() to get the most recent row. And fiddle with arrays to get the most recent boolean values:

SELECT d.macaddr, d.sensorid,
       MAX(d.ts) as ts,
       -- seqnum = 1 is the most recent row per sensor, so MAX() just returns that row's value
       MAX(d.velocity) FILTER (WHERE seqnum = 1) as velocity,
       MAX(d.temp) FILTER (WHERE seqnum = 1) as temp,
       -- newest-first array with NULLs removed; element 1 is the latest non-NULL boolean (NULL if none)
       (ARRAY_REMOVE(ARRAY_AGG(d.highspeed ORDER BY d.ts DESC), NULL))[1] as highspeed,
       (ARRAY_REMOVE(ARRAY_AGG(d.hightemp ORDER BY d.ts DESC), NULL))[1] as hightemp,
       MAX(d.distance) FILTER (WHERE seqnum = 1) as distance
FROM (SELECT d.*,
             ROW_NUMBER() OVER (PARTITION BY d.macaddr, d.sensorid ORDER BY d.ts DESC) as seqnum
      FROM sensordb."V_data" d
      WHERE d.macaddr like '%abcdef' AND
            d.sensorid IN ('abc1', 'a2bc', 'ab3c') AND
            d.ts >= 1616370867000
     ) d
GROUP BY d.macaddr, d.sensorid
-- d.ts is not grouped, so order by the aggregated timestamp instead
ORDER BY MAX(d.ts) DESC;
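
To sanity-check the logic without a database, here is a minimal Python sketch (not part of the original answer) that mimics this kind of query on the sample rows from the question. The newest row per sensor supplies ts, velocity, temp and distance, and each boolean is the first non-NULL value when that sensor's rows are read newest-first, which is what the ARRAY_REMOVE(ARRAY_AGG(... ORDER BY ts DESC), NULL) expression with subscript [1] picks. The names ROWS, first_non_null and aggregate are made up for illustration, and None stands in for SQL NULL.

```python
from itertools import groupby

# Sample rows from the question:
# (macaddr, sensorid, ts, velocity, temp, highspeed, hightemp, distance)
# None stands in for SQL NULL (the empty cells).
ROWS = [
    ("abcdef", "abc1", 1616370867010, 25, 32, None, None, 52),
    ("abcdef", "a2bc", 1616370867008, 27, 35, None, True, 51),
    ("abcdef", "ab3c", 1616370867006, 26, 30, None, None, 50),
    ("abcdef", "abc1", 1616370867005, 24, 36, None, True, 50),
    ("abcdef", "a2bc", 1616370867004, 27, 31, None, None, 50),
    ("abcdef", "abc1", 1616370867002, 21, 30, True, None, 48),
    ("abcdef", "ab3c", 1616370867000, 22, 33, False, None, 46),
]


def first_non_null(values):
    """Return the first item that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)


def aggregate(rows):
    """Collapse rows to one per (macaddr, sensorid), newest group first."""
    def key(r):
        return (r[0], r[1])

    result = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        newest_first = sorted(group, key=lambda r: r[2], reverse=True)
        latest = newest_first[0]  # plays the role of the seqnum = 1 row
        highspeed = first_non_null(r[5] for r in newest_first)
        hightemp = first_non_null(r[6] for r in newest_first)
        result.append(latest[:5] + (highspeed, hightemp, latest[7]))
    return sorted(result, key=lambda r: r[2], reverse=True)


for row in aggregate(ROWS):
    print(row)
```

Note that the check is "is not None" rather than truthiness, so a stored False (as for sensor ab3c) is kept instead of being treated as missing.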

Answer 2

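You can use DISTINCT ON (a PostgreSQL-specific extension, afaik) to simplify this query. Take the latest row per sensor, then left join the latest row that has a non-NULL highspeed and the latest row that has a non-NULL hightemp: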

with
q as (
  -- your query here
)
select 
  l.macaddr, l.sensorid, l.ts, l.velocity, l.temp,
  s.highspeed, t.hightemp, 
  l.distance  
from (
  select distinct on (sensorid) *
  from q
  order by sensorid, ts desc
) l
left join (
  select distinct on (sensorid) *
  from q
  where highspeed is not null
  order by sensorid, ts desc
) s on s.sensorid = l.sensorid
left join (
  select distinct on (sensorid) *
  from q
  where hightemp is not null
  order by sensorid, ts desc
) t on t.sensorid = l.sensorid
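
The three-subquery shape above can be tried out on the question's sample data with a small Python sketch. This is only an emulation under stated assumptions, not PostgreSQL itself: the helper distinct_on_sensorid is a hypothetical stand-in for "select distinct on (sensorid) ... order by sensorid, ts desc", dict.get plays the part of the left joins, and None stands in for SQL NULL.

```python
COLUMNS = ("macaddr", "sensorid", "ts", "velocity", "temp",
           "highspeed", "hightemp", "distance")

# Sample rows from the question; None stands in for SQL NULL.
Q = [dict(zip(COLUMNS, r)) for r in [
    ("abcdef", "abc1", 1616370867010, 25, 32, None, None, 52),
    ("abcdef", "a2bc", 1616370867008, 27, 35, None, True, 51),
    ("abcdef", "ab3c", 1616370867006, 26, 30, None, None, 50),
    ("abcdef", "abc1", 1616370867005, 24, 36, None, True, 50),
    ("abcdef", "a2bc", 1616370867004, 27, 31, None, None, 50),
    ("abcdef", "abc1", 1616370867002, 21, 30, True, None, 48),
    ("abcdef", "ab3c", 1616370867000, 22, 33, False, None, 46),
]]


def distinct_on_sensorid(rows, keep=lambda r: True):
    """Keep the newest row per sensorid among the rows that pass `keep`."""
    picked = {}
    for r in sorted(rows, key=lambda r: r["ts"], reverse=True):
        if keep(r):
            picked.setdefault(r["sensorid"], r)  # first (newest) row wins
    return picked


latest = distinct_on_sensorid(Q)                                         # alias l
speed = distinct_on_sensorid(Q, lambda r: r["highspeed"] is not None)    # alias s
heat = distinct_on_sensorid(Q, lambda r: r["hightemp"] is not None)      # alias t

# Left joins: a sensor absent from `speed` or `heat` yields None for that column.
result = [
    {**row,
     "highspeed": speed.get(sid, {}).get("highspeed"),
     "hightemp": heat.get(sid, {}).get("hightemp")}
    for sid, row in latest.items()
]

for row in result:
    print(row)
```

On the sample data this gives one row per sensor, with abc1 picking up highspeed from its oldest row and hightemp from its middle row while the other columns come from its newest row.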

2 answers

solution1
2 2021-03-22 01:15:22

solution2
2 ACCEPTED 2021-03-22 01:15:36
