Select last row of each group in PostgreSQL table

Question

I have a table that looks as follows:

TS	Serial Number	Activity	Address
1	123456		AAAABBBBCCCC
2	123456		AAAABBBBCCCC
3	123456	A	AAAABBBBCCCC
4	123456	E	AAAABBBBCCCC
5	876543	A	UNIUNIUNIUNI
6	123456	A	AAAABBBBCCCC
7	123456	E	WAHWAHWAHWAH
8	123456		WAHWAHWAHWAH
9	876543	E	ALFALFALFALF
10	876543		ALFALFALFALF

TS is a timestamp column that usually contains an ISO date string. I've shortened this for simplicity.

As you can see, a change in the Address field CAN occur whenever there's an Activity = E.

Some further background about the data:

The ungrouped rows can be in semi-arbitrary order. Within a group, when sorted by timestamp (TS), each Activity A MUST be followed by an Activity E, though not necessarily immediately. There CAN be <null> Activities in between the A and the E. If no E follows the last A within a group, sorted by TS, the corresponding Serial Number can safely be considered invalid.

What I need

For each Serial Number, with rows sorted by TS in ascending order, I need the Address of the last occurrence of Activity = E, if and only if that last E is NOT followed by another A. Otherwise, Address may contain INVALID, or alternatively the corresponding Serial Number can be omitted from the result.

Answer 1

Step-by-step demo: db<>fiddle

SELECT DISTINCT ON (ser_no)         -- 4
    *
FROM (
    SELECT
        *,
        MAX(ts) FILTER (WHERE activity = 'A') OVER (PARTITION BY ser_no) as last_a,    -- 1
        MAX(ts) FILTER (WHERE activity = 'E') OVER (PARTITION BY ser_no) as last_e
    FROM
        mytable
) s
WHERE last_a < last_e               -- 2
    AND activity = 'E'              -- 3
ORDER BY ser_no, ts DESC            -- 4

1. Find the timestamps of the last A and the last E per ser_no using the MAX() window function with a FILTER clause.
2. Keep only those ser_no partitions where the last A came before the last E.
3. Remove all non-E records.
4. Order the remaining E records by timestamp DESC, so that the most recent record is the top-most one per group, and remove all others using the DISTINCT ON clause.
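
To try the query against the sample data from the question, a setup along these lines should work. The table and column names follow the query above; the column types are assumptions (in particular, ts is an integer here only because the question shortened the timestamps).

```sql
-- Hypothetical setup: names match the query above, column types are assumed.
CREATE TABLE mytable (
    ts       integer,   -- shortened timestamp, as in the question
    ser_no   integer,
    activity text,      -- 'A', 'E', or NULL
    address  text
);

-- Sample rows copied from the question; empty Activity cells become NULL.
INSERT INTO mytable (ts, ser_no, activity, address) VALUES
    (1,  123456, NULL, 'AAAABBBBCCCC'),
    (2,  123456, NULL, 'AAAABBBBCCCC'),
    (3,  123456, 'A',  'AAAABBBBCCCC'),
    (4,  123456, 'E',  'AAAABBBBCCCC'),
    (5,  876543, 'A',  'UNIUNIUNIUNI'),
    (6,  123456, 'A',  'AAAABBBBCCCC'),
    (7,  123456, 'E',  'WAHWAHWAHWAH'),
    (8,  123456, NULL, 'WAHWAHWAHWAH'),
    (9,  876543, 'E',  'ALFALFALFALF'),
    (10, 876543, NULL, 'ALFALFALFALF');
```

With this data, the query should return the ts = 7 row (address WAHWAHWAHWAH) for 123456 and the ts = 9 row (address ALFALFALFALF) for 876543, since in both groups the last E comes after the last A. Note that a ser_no with no A rows at all gets NULL for last_a, so the last_a < last_e comparison drops it; whether that is desired depends on your data.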

Answer 2

You need any "E" row not followed by any "A" or "E" with the same serial number.

This translates into SQL as:

SELECT Serial_Number, Address
FROM Tbl ret
WHERE Activity = 'E'
  AND NOT EXISTS (
    SELECT *
    FROM Tbl witness
    WHERE witness.Serial_Number = ret.Serial_Number
      AND witness.TS > ret.TS
      AND witness.Activity IN ('A', 'E')
  );
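
As a quick check of how the invalid case is handled, here is a minimal sketch that runs the same query against a few made-up rows. The serial numbers 111 and 222 and their addresses are invented for illustration; the CTE named Tbl simply stands in for the real table.

```sql
-- Made-up rows for illustration only; Tbl is a stand-in CTE for the real table.
WITH Tbl (TS, Serial_Number, Activity, Address) AS (
    VALUES
        (1, 111, 'A',  'OLD'),
        (2, 111, 'E',  'NEW'),
        (3, 111, 'A',  'NEW'),  -- trailing A: serial 111 is invalid
        (4, 222, 'A',  'X'),
        (5, 222, 'E',  'Y'),
        (6, 222, NULL, 'Y')     -- trailing NULL activity is ignored
)
SELECT Serial_Number, Address
FROM Tbl ret
WHERE Activity = 'E'
  AND NOT EXISTS (
    SELECT *
    FROM Tbl witness
    WHERE witness.Serial_Number = ret.Serial_Number
      AND witness.TS > ret.TS
      AND witness.Activity IN ('A', 'E')
  );
-- returns a single row: 222, 'Y'
```

Serial 111 is omitted because its only E is followed by an A. The NULL row after the E of serial 222 does not count as a witness, because NULL IN ('A', 'E') is not true.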

Answer 3

Hmmm. . . You can use distinct on if you want to include the invalid records:

select distinct on (ser_no) ser_no, ts,
       (case when activity = 'E' then address
             else 'INVALID'
        end) as address
from t
where activity in ('E', 'A')
order by ser_no, ts desc;

This just gets the last E/A row for each ser_no and assigns the address accordingly.

If you want to remove them, then you can still manage without a subquery. It would be nice if Postgres had a "first"/"last" aggregation function, but you can mimic it with arrays:

select ser_no, max(ts),
       (array_agg(address order by ts desc))[1] as last_address
from t
where activity in ('E', 'A')
group by ser_no
having max(ts) filter (where activity = 'E') > max(ts) filter (where activity = 'A');
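
The array expression can be tried in isolation. This is only a sketch with made-up values, showing that aggregating in descending ts order and taking element 1 (PostgreSQL arrays are 1-based by default) yields the value from the latest row:

```sql
-- Made-up values for illustration: picks the address belonging to the highest ts.
SELECT (array_agg(address ORDER BY ts DESC))[1] AS last_address
FROM (VALUES (1, 'first'), (3, 'last'), (2, 'middle')) AS v(ts, address);
-- last_address: 'last'
```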

With a subquery, I would suggest:

select t.*
from t
where t.activity = 'E' and
      t.ts = (select max(t2.ts)
              from t t2
              where t2.ser_no = t.ser_no and
                    t2.activity in ('A', 'E')
             );

This fetches the last "E" row when it is the last row for either E or A.
