Filtering subset time periods in postgresql

How would one query a table like table_1 (example shown below) so that only the maximal time periods are returned, that is, rows whose period is not a subset of any other row's period? For example, Table_1 should give what is in Table_2. I have tried various WHERE conditions using EXISTS, but I cannot seem to extract the desired rows.

I have tried queries like the following:

    SELECT * FROM table_1 x WHERE NOT EXISTS (SELECT * FROM table_2 y WHERE (y.start < x.start and y.end > x.end) or (y.start < x.start and y.end <= x.end) or (y.start >= x.start and y.end < x.end) and y.symbol = x.symbol);

Any help would be appreciated. I am using PostgreSQL.

Edit: I have reduced the size of table_1 to keep the example short. By maximum time period I mean the disjoint periods that are not split by, or contained in, the period of any other row. In other words, the time period is maximal: a row R is maximal when every other row R' with the same symbol value that overlaps it satisfies R.start <= R'.start <= R'.end <= R.end.

So the reason two rows are returned for AJR is that no rows start or end between 2021-01-23 and 2021-05-08. By contrast, rows 1, 2, 4, 5 and 6 of table_1 are not returned because they fall within the time period of row 3 (2021-01-02 to 2021-01-23).

Table_2:

symbol start end
AJR 2021-01-02 2021-01-23
AJR 2021-05-08 2021-06-05
BBB 2021-07-17 2021-07-24
CCC 2021-10-23 2021-11-20

Table_1:

symbol start end
AJR 2021-01-02 2021-01-09
AJR 2021-01-02 2021-01-16
AJR 2021-01-02 2021-01-23
AJR 2021-01-09 2021-01-16
AJR 2021-01-09 2021-01-23
AJR 2021-01-16 2021-01-23
AJR 2021-05-08 2021-05-15
AJR 2021-05-08 2021-05-22
AJR 2021-05-08 2021-05-29
AJR 2021-05-08 2021-06-05
AJR 2021-05-15 2021-05-22
AJR 2021-05-15 2021-05-29
AJR 2021-05-15 2021-06-05
AJR 2021-05-22 2021-05-29
AJR 2021-05-22 2021-06-05
AJR 2021-05-29 2021-06-05
BBB 2021-07-17 2021-07-24
CCC 2021-10-23 2021-10-30
CCC 2021-10-23 2021-11-13
CCC 2021-10-23 2021-11-20
CCC 2021-10-30 2021-11-13
CCC 2021-10-30 2021-11-20
CCC 2021-11-13 2021-11-20

You could apply a self join that pairs each row's "end" with the rows that start on that date and match on the symbol field, then filter out every "start" value that also appears as an "end" value for the same symbol. Finally, you can aggregate on the "end" field, grouping by the "symbol" and "start" fields.

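    SELECT t1.symbol,
           t1.start,
           -- fall back to t1's own end when no row starts where t1 ends
           MAX(COALESCE(t2."end", t1."end")) AS "end"
    FROM tab t1
    LEFT JOIN tab t2
           ON t1.symbol = t2.symbol
          AND t1."end" = t2.start
    -- keep only starts that are not also the end of a period for the symbol
    WHERE CONCAT(t1.symbol, t1.start) NOT IN (
        SELECT CONCAT(symbol, "end")
        FROM tab
    )
    GROUP BY t1.symbol,
             t1.start;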

Note: when a symbol has only one row, the self join finds no match and returns NULL for the joined "end" field. COALESCE handles this case by falling back to the row's own "end" value.
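As an alternative, the containment condition from the question can be written directly with NOT EXISTS. The following is only a minimal sketch, not the approach used in the answer above: it assumes the table is named table_1 with columns symbol, "start" and "end" of type date, and it uses an inline sample (a subset of the rows from Table_1) so that it runs on its own. A row is kept when no other row with the same symbol covers its whole period.

```sql
-- Sketch: keep only rows whose period is not contained in another
-- row's period for the same symbol.
-- Assumption: the sample rows below stand in for the real table_1.
WITH table_1 (symbol, "start", "end") AS (
    VALUES
        ('AJR', DATE '2021-01-02', DATE '2021-01-09'),
        ('AJR', DATE '2021-01-02', DATE '2021-01-23'),
        ('AJR', DATE '2021-01-09', DATE '2021-01-16'),
        ('AJR', DATE '2021-05-08', DATE '2021-05-15'),
        ('AJR', DATE '2021-05-08', DATE '2021-06-05'),
        ('AJR', DATE '2021-05-29', DATE '2021-06-05'),
        ('BBB', DATE '2021-07-17', DATE '2021-07-24'),
        ('CCC', DATE '2021-10-23', DATE '2021-11-20'),
        ('CCC', DATE '2021-10-30', DATE '2021-11-13')
)
SELECT x.symbol, x."start", x."end"
FROM table_1 AS x
WHERE NOT EXISTS (
    SELECT 1
    FROM table_1 AS y
    WHERE y.symbol = x.symbol
      -- y covers x's whole period ...
      AND y."start" <= x."start"
      AND y."end"   >= x."end"
      -- ... and is strictly larger, so x does not eliminate itself
      AND (y."start" < x."start" OR y."end" > x."end")
)
ORDER BY x.symbol, x."start";
-- Returns:
--   AJR  2021-01-02  2021-01-23
--   AJR  2021-05-08  2021-06-05
--   BBB  2021-07-17  2021-07-24
--   CCC  2021-10-23  2021-11-20
```

Two points about this sketch are worth noting. The symbol comparison sits at the top level of the WHERE clause and is joined by AND, so it applies to every branch of the condition; in SQL, AND binds more tightly than OR, so mixing them without parentheses can leave the symbol check attached to only one branch. Also, exact duplicate rows are not treated as containing each other here, so duplicates of a maximal period would all be returned; SELECT DISTINCT could be added if that matters.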
