Aggregating rows in SQL with missing Booleans

I have the SQL query below, which returns the following data from a view in a PostgreSQL database.

SELECT 
  "V_data".macaddr,
  "V_data".sensorid,
  "V_data".ts,
  "V_data".velocity,
  "V_data".temp,
  "V_data".highspeed,
  "V_data".hightemp,
  "V_data".distance

FROM 
  sensordb."V_data"

WHERE 
  "V_data".macaddr like '%abcdef'

AND
  (
  ("V_data".sensorid = 'abc1') or ("V_data".sensorid = 'a2bc') or ("V_data".sensorid = 'ab3c') 
  )

AND
  "V_data".ts >= 1616370867000

ORDER BY
  "V_data".ts DESC;

Output

macaddr sensorid ts velocity temp highspeed hightemp distance
abcdef abc1 1616370867010 25 32 52
abcdef a2bc 1616370867008 27 35 T 51
abcdef ab3c 1616370867006 26 30 50
abcdef abc1 1616370867005 24 36 T 50
abcdef a2bc 1616370867004 27 31 50
abcdef abc1 1616370867002 21 30 T 48
abcdef ab3c 1616370867000 22 33 F 46

I want to aggregate the rows such that I have the latest readings per sensorid for ts, velocity, temp, distance. For the Booleans highspeed and hightemp, I want the latest available Boolean value or an empty cell if no Boolean value was available.

Expected output

macaddr sensorid ts velocity temp highspeed hightemp distance
abcdef abc1 1616370867010 25 32 T T 52
abcdef a2bc 1616370867008 27 35 T 51
abcdef ab3c 1616370867006 26 30 F 50

How could I simplify this task?

Thanks.

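Hmmm... DISTINCT ON would work for every column except the Booleans. It keeps the single latest row per sensor, and that row's highspeed or hightemp may be NULL, whereas you want the latest non-NULL value. Those Booleans are the tricky part.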
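To make the limitation concrete, here is a minimal sketch of the plain DISTINCT ON approach against the view from the question. It assumes the filters from the question and adds macaddr to the DISTINCT ON key in case the LIKE pattern matches more than one address. The Boolean columns come from the same newest row as everything else, so they are NULL whenever that row carries no Boolean.

```sql
-- Latest row per (macaddr, sensorid).
-- highspeed and hightemp are taken from that same newest row,
-- so they stay NULL if the newest reading has no Boolean value.
SELECT DISTINCT ON (d.macaddr, d.sensorid)
       d.macaddr, d.sensorid, d.ts, d.velocity, d.temp,
       d.highspeed, d.hightemp, d.distance
FROM sensordb."V_data" d
WHERE d.macaddr LIKE '%abcdef'
  AND d.sensorid IN ('abc1', 'a2bc', 'ab3c')
  AND d.ts >= 1616370867000
-- The leading ORDER BY expressions must match the DISTINCT ON list;
-- ts DESC then decides which row of each group is kept.
ORDER BY d.macaddr, d.sensorid, d.ts DESC;
```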

Instead, let's use ROW_NUMBER() to identify the most recent row per sensor, and fiddle with arrays to get the most recent non-NULL Boolean values:

SELECT d.macaddr, d.sensorid,
       MAX(d.ts) AS ts,
       -- seqnum = 1 marks the newest row per sensor, so each FILTER sees exactly one row
       MAX(d.velocity) FILTER (WHERE d.seqnum = 1) AS velocity,
       MAX(d.temp) FILTER (WHERE d.seqnum = 1) AS temp,
       -- newest-first array with NULLs removed; element 1 is the latest non-NULL value
       (ARRAY_REMOVE(ARRAY_AGG(d.highspeed ORDER BY d.ts DESC), NULL))[1] AS highspeed,
       (ARRAY_REMOVE(ARRAY_AGG(d.hightemp ORDER BY d.ts DESC), NULL))[1] AS hightemp,
       MAX(d.distance) FILTER (WHERE d.seqnum = 1) AS distance
FROM (SELECT d.*,
             ROW_NUMBER() OVER (PARTITION BY d.macaddr, d.sensorid ORDER BY d.ts DESC) AS seqnum
      FROM sensordb."V_data" d
      WHERE d.macaddr LIKE '%abcdef' AND
            d.sensorid IN ('abc1', 'a2bc', 'ab3c') AND
            d.ts >= 1616370867000
     ) d
GROUP BY d.macaddr, d.sensorid
ORDER BY MAX(d.ts) DESC;
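The array expression can be tried in isolation before running it against the view. The sketch below uses a small inline table whose name (sample), timestamps, and values are all made up for illustration; they are not the data from the question. It also shows an equivalent form that filters out the NULLs inside the aggregate instead of removing them afterwards.

```sql
-- Hypothetical inline data: sample, its ts values and Booleans are illustrative only.
WITH sample (sensorid, ts, highspeed) AS (
    VALUES ('abc1', 3, NULL::boolean),  -- newest row has no Boolean
           ('abc1', 2, TRUE),           -- latest non-NULL value for abc1
           ('abc1', 1, FALSE),
           ('ab3c', 2, NULL),           -- ab3c never reports a Boolean
           ('ab3c', 1, NULL)
)
SELECT sensorid,
       -- aggregate newest-first, drop NULLs, take the first element
       (ARRAY_REMOVE(ARRAY_AGG(highspeed ORDER BY ts DESC), NULL))[1] AS latest_highspeed,
       -- same idea, skipping NULL rows before they reach the array
       (ARRAY_AGG(highspeed ORDER BY ts DESC)
            FILTER (WHERE highspeed IS NOT NULL))[1] AS latest_via_filter
FROM sample
GROUP BY sensorid
ORDER BY sensorid;
```

With this made-up data, both expressions should return true for abc1 and NULL for ab3c: subscripting an empty or NULL array yields NULL, which gives the empty cell the question asks for.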

You can use DISTINCT ON (a PostgreSQL-specific extension, afaik) to simplify this query. Take the latest row per sensor, then join in the latest non-NULL value of each Boolean separately:

with
q as (
  -- your query here
)
select 
  l.macaddr, l.sensorid, l.ts, l.velocity, l.temp,
  s.highspeed, t.hightemp, 
  l.distance  
from (
  select distinct on (sensorid) *
  from q
  order by sensorid, ts desc
) l
left join (
  select distinct on (sensorid) *
  from q
  where highspeed is not null
  order by sensorid, ts desc
) s on s.sensorid = l.sensorid
left join (
  select distinct on (sensorid) *
  from q
  where hightemp is not null
  order by sensorid, ts desc
) t on t.sensorid = l.sensorid
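The joins above match on sensorid only, which is fine when the LIKE pattern selects a single MAC address. If it could match several, a variant along the following lines might be safer. This is a sketch under that assumption, with the question's query written into q and macaddr added to every DISTINCT ON key and join condition.

```sql
WITH q AS (
    -- the filtered rows from the question
    SELECT macaddr, sensorid, ts, velocity, temp, highspeed, hightemp, distance
    FROM sensordb."V_data"
    WHERE macaddr LIKE '%abcdef'
      AND sensorid IN ('abc1', 'a2bc', 'ab3c')
      AND ts >= 1616370867000
)
SELECT l.macaddr, l.sensorid, l.ts, l.velocity, l.temp,
       s.highspeed, t.hightemp, l.distance
FROM (
    -- latest reading per device and sensor
    SELECT DISTINCT ON (macaddr, sensorid) *
    FROM q
    ORDER BY macaddr, sensorid, ts DESC
) l
LEFT JOIN (
    -- latest non-NULL highspeed per device and sensor
    SELECT DISTINCT ON (macaddr, sensorid) macaddr, sensorid, highspeed
    FROM q
    WHERE highspeed IS NOT NULL
    ORDER BY macaddr, sensorid, ts DESC
) s ON s.macaddr = l.macaddr AND s.sensorid = l.sensorid
LEFT JOIN (
    -- latest non-NULL hightemp per device and sensor
    SELECT DISTINCT ON (macaddr, sensorid) macaddr, sensorid, hightemp
    FROM q
    WHERE hightemp IS NOT NULL
    ORDER BY macaddr, sensorid, ts DESC
) t ON t.macaddr = l.macaddr AND t.sensorid = l.sensorid
ORDER BY l.ts DESC;
```

The LEFT JOINs keep sensors that never reported a Boolean; their highspeed or hightemp simply comes back as NULL.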
