
Select last row of each group in PostgreSQL table

I have a table that looks as follows:

TS  Serial Number  Activity  Address
1   123456         <null>    AAAABBBBCCCC
2   123456         <null>    AAAABBBBCCCC
3   123456         A         AAAABBBBCCCC
4   123456         E         AAAABBBBCCCC
5   876543         A         UNIUNIUNIUNI
6   123456         A         AAAABBBBCCCC
7   123456         E         WAHWAHWAHWAH
8   123456         <null>    WAHWAHWAHWAH
9   876543         E         ALFALFALFALF
10  876543         <null>    ALFALFALFALF

TS is a timestamp column that usually contains an ISO date string. I've shortened this for simplicity.
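
To experiment with the queries, the sample rows above can be loaded into a scratch table. This is only a sketch: the table name mytable and the column names ts, ser_no, activity and address are borrowed from the first answer's query, the column types are assumptions, and ts is an integer stand-in for the real timestamp. Rows without an Activity are loaded as NULL.

```sql
-- Hypothetical setup; names follow the first answer, types are assumed.
CREATE TABLE mytable (
    ts       integer PRIMARY KEY,  -- stand-in for the real timestamp column
    ser_no   text NOT NULL,
    activity text,                 -- 'A', 'E', or NULL
    address  text NOT NULL
);

INSERT INTO mytable (ts, ser_no, activity, address) VALUES
    (1,  '123456', NULL, 'AAAABBBBCCCC'),
    (2,  '123456', NULL, 'AAAABBBBCCCC'),
    (3,  '123456', 'A',  'AAAABBBBCCCC'),
    (4,  '123456', 'E',  'AAAABBBBCCCC'),
    (5,  '876543', 'A',  'UNIUNIUNIUNI'),
    (6,  '123456', 'A',  'AAAABBBBCCCC'),
    (7,  '123456', 'E',  'WAHWAHWAHWAH'),
    (8,  '123456', NULL, 'WAHWAHWAHWAH'),
    (9,  '876543', 'E',  'ALFALFALFALF'),
    (10, '876543', NULL, 'ALFALFALFALF');
```

The other answers use different table and column names (Tbl, t, Serial_Number), so those would need to be adjusted to match before running them against this table.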

As you can see, a change in the Address field CAN occur whenever there's an Activity = E.

Some further background about the data:

The ungrouped rows can be in semi-arbitrary order. Within a group sorted by timestamp (TS), each Activity A MUST always be followed by an Activity E, though not necessarily immediately: there CAN be <null> Activities between the A and the E. If no E follows the last A within a group, sorted by TS, the corresponding Serial Number can safely be considered invalid.

What I need

For each Serial Number, with rows sorted by TS in ascending order, I need the Address of the last occurrence of Activity = E, if and only if that last E is NOT followed by another A. Otherwise, Address may contain INVALID, or the corresponding Serial Number can be omitted from the result altogether.

Step-by-step demo: db<>fiddle

SELECT DISTINCT ON (ser_no)         -- 4
    *
FROM (
    SELECT
        *,
        MAX(ts) FILTER (WHERE activity = 'A') OVER (PARTITION BY ser_no) as last_a,    -- 1
        MAX(ts) FILTER (WHERE activity = 'E') OVER (PARTITION BY ser_no) as last_e
    FROM
        mytable
) s
WHERE last_a < last_e               -- 2
    AND activity = 'E'              -- 3
ORDER BY ser_no, ts DESC            -- 4

  1. Find the timestamps of the last A and the last E using the MAX() window function
  2. Keep only those ser_no partitions where the last A comes before the last E
  3. Remove all non-E records
  4. Order the remaining E records by timestamp DESC so that the most recent record comes first in each group, then keep only that record per group using the DISTINCT ON clause
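
To see what steps 1 and 2 operate on, the two per-group timestamps can be inspected on their own. A minimal sketch, assuming the same mytable and column names as the query above; it uses the aggregate form of MAX() with GROUP BY rather than the window form, so it returns one row per ser_no instead of annotating every row:

```sql
-- One row per ser_no with the timestamps that step 2 compares.
SELECT
    ser_no,
    MAX(ts) FILTER (WHERE activity = 'A') AS last_a,
    MAX(ts) FILTER (WHERE activity = 'E') AS last_e
FROM mytable
GROUP BY ser_no
ORDER BY ser_no;

-- Note: if a ser_no has an E but no A at all, last_a is NULL, so
-- "last_a < last_e" evaluates to NULL and that group is filtered out.
-- Should such groups count as valid (the question does not say), the
-- condition could be written as: last_a IS NULL OR last_a < last_e
```

For the sample rows in the question, this gives last_a = 6 and last_e = 7 for 123456, and last_a = 5 and last_e = 9 for 876543, so both serial numbers pass step 2.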

You need any "E" row not followed by any "A" or "E" with the same serial number.

In SQL, this translates to:

SELECT Serial_Number, Address
FROM Tbl ret
WHERE Activity = 'E'
  AND NOT EXISTS (
    SELECT *
    FROM Tbl witness
    WHERE witness.Serial_Number = ret.Serial_Number
      AND witness.TS > ret.TS
      AND witness.Activity IN ('A', 'E')
  );

Hmmm. . . You can use distinct on if you want to include the invalid records:

select distinct on (ser_no) ser_no, ts,
       (case when activity = 'E' then address
             else 'INVALID'
        end) as address
from t
where activity in ('E', 'A')
order by ser_no, ts desc;

This just gets the last E/A row for each ser_no and assigns the address accordingly.

If you want to remove them, then you can still manage without a subquery. It would be nice if Postgres had a "first"/"last" aggregation function, but you can mimic it with arrays:

select ser_no, max(ts),
       (array_agg(address order by ts desc))[1] as last_address
from t
where activity in ('E', 'A')
group by ser_no
having max(ts) filter (where activity = 'E') > max(ts) filter (where activity = 'A');
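
The array trick can be tried in isolation. A small standalone sketch with inline values (v and x are made-up names, not part of the question's schema): array_agg collects the values in the requested order, and the subscript [1] picks the first element, since PostgreSQL arrays are 1-based by default.

```sql
-- Sorting descending and taking element 1 mimics a "last" aggregate.
SELECT (array_agg(x ORDER BY x DESC))[1] AS last_x
FROM (VALUES (1), (3), (2)) AS v(x);
-- last_x = 3
```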

With a subquery, I would suggest:

select t.*
from t
where t.activity = 'E' and
      t.ts = (select max(t2.ts)
              from t t2
              where t2.ser_no = t.ser_no and
                    t2.activity in ('A', 'E')
             );

This fetches the last E row only when it is also the most recent row among the A and E rows for that ser_no.

