简体   繁体   中英

Postgres Delete records from table keeping a minimum and maximum value

I have a Postgres table like so:

|scanID|scandatetime     |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00  |123      |WA   |
|12345 |2020-07-28 2:00  |156      |WA   |
|12345 |2020-07-29 10:00 |200      |OR   |
|34678 |2020-07-20 4:00  |123      |TX   |
|34678 |2020-07-20 8:00  |156      |AR   |
|34678 |2020-07-22 1:00  |200      |MS   |

and essentially I want to delete every row such that I only keep 2 rows per scan ID. I want to retain the scan IDs with the minimum time and with the maximum time.

The current workflow is that data gets aggregated and written to this table every day, so after it's written there may be a bunch of new scan events, but I only want to keep the max and min. How would I go about doing that?

EDIT: The desired result table would look like this

|scanID|scandatetime     |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00  |123      |WA   |
|12345 |2020-07-29 10:00 |200      |OR   |
|34678 |2020-07-20 4:00  |123      |TX   |
|34678 |2020-07-22 1:00  |200      |MS   |

You can use using :

delete from t
    using (select scanId, min(scandatetime) as min_sdt, max(scandatetime) as max_sdt
           from t
           group by scanid
          ) tt
    where tt.scanId = t.scanId and t.scandatetime not in (tt.min_sdt, tt.max_sdt);

You could also phrase this as:

delete from t
    where scandatetime <> (select min(t2.scandatetime) from t tt where tt.scanid = t.scanid) and
          scandatetime <> (select max(t2.scandatetime) from t tt where tt.scanid = t.scanid) ;
          

A record is in the middle if there is (at least) one above it, and (at least) one below it:


DELETE FROM ztable d
WHERE EXISTS ( SELECT *         
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime < d.scandatetime
        )
AND EXISTS ( SELECT *
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime > d.scandatetime
        );

Similar trick, using row_number() :


DELETE FROM ztable d
USING ( SELECT scanId, scandatetime
        , row_number() OVER 
                (PARTITION BY scanId ORDER BY scandatetime ASC) rn
        , row_number() OVER
                (PARTITION BY scanId ORDER BY scandatetime DESC) rrn
        FROM  ztable 
        ) x
        WHERE x.scanId = d.scanId
        AND x.scandatetime = d.scandatetime
        AND x.rn <> 1 AND x.rrn <> 1

        ;

You can use a NOT IN with a sub-select:

delete from the_table t1
where (scanid, scandatetime) not in (select scanid, min(scandatetime)
                                     from the_table
                                     group by scanid
                                     union all
                                     select scanid, max(scandatetime)
                                     from the_table
                                     group by scanid);

But I assume the solution using exists will be faster.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM