Selecting latest consecutive records that match a condition with PostgreSQL

I am looking for a PostgreSQL query to find the latest consecutive records that match a condition. Let me explain it better with an example:

|  ID  |  HEATING STATE  |  DATE       |
| ---- | --------------- |  ---------- |
|  1   |  ON             |  2018-02-19 |
|  2   |  ON             |  2018-02-20 |
|  3   |  OFF            |  2018-02-20 |
|  4   |  OFF            |  2018-02-21 |
|  5   |  ON             |  2018-02-21 |
|  6   |  OFF            |  2018-02-21 |
|  7   |  ON             |  2018-02-22 |
|  8   |  ON             |  2018-02-22 |
|  9   |  ON             |  2018-02-22 |
| 10   |  ON             |  2018-02-23 |

I need to find all the most recent consecutive records with date >= 2018-02-20 and heating_state ON, i.e. the ones with ID 7, 8, 9 and 10. My main issue is that they must be consecutive.

For further clarification, if needed:

  • ID 1 is excluded because it is older than 2018-02-20
  • ID 2 is excluded because it is followed by ID 3, which has heating state OFF
  • ID 3 is excluded because it has heating state OFF
  • ID 4 is excluded because it has heating state OFF
  • ID 5 is excluded because it is followed by ID 6, which has heating state OFF
  • ID 6 is excluded because it has heating state OFF
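
The rule behind this list can be sketched procedurally: walk backwards from the newest row and stop at the first row that is not ON or that falls before the cutoff date. This is only a minimal Python illustration of the requirement, not a PostgreSQL solution. The function name `latest_consecutive_on` is hypothetical, and the sketch assumes that `id` reflects the order in which the records were written.

```python
from datetime import date

# Sample rows copied from the table above: (id, heating_state, date)
ROWS = [
    (1, "ON", date(2018, 2, 19)),
    (2, "ON", date(2018, 2, 20)),
    (3, "OFF", date(2018, 2, 20)),
    (4, "OFF", date(2018, 2, 21)),
    (5, "ON", date(2018, 2, 21)),
    (6, "OFF", date(2018, 2, 21)),
    (7, "ON", date(2018, 2, 22)),
    (8, "ON", date(2018, 2, 22)),
    (9, "ON", date(2018, 2, 22)),
    (10, "ON", date(2018, 2, 23)),
]


def latest_consecutive_on(rows, cutoff):
    """Return the ids of the trailing run of ON rows dated on or after cutoff."""
    run = []
    # Assumption: a higher id means a newer record, so sort newest first.
    for row_id, state, day in sorted(rows, key=lambda r: r[0], reverse=True):
        if state != "ON" or day < cutoff:
            break  # the first non-matching row ends the run
        run.append(row_id)
    return sorted(run)


print(latest_consecutive_on(ROWS, date(2018, 2, 20)))  # [7, 8, 9, 10]
```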

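Use the LEAD window function inside a CASE expression: flag each row that is on or after the cutoff date, has heating state ON, and whose next row is also ON. Note that this checks only the immediately following row, so it returns the expected rows for the sample data but would also return an earlier run of two or more ON rows.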

Query 1:

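SELECT id,
       heating_state,
       dt
FROM   (SELECT t.*,
               -- Flag rows on or after the cutoff that are ON and whose next row is also ON
               CASE
                 WHEN dt >= timestamp '2018-02-20'
                      AND heating_state = 'ON'
                      -- The third argument makes the last row compare against itself
                      AND LEAD(heating_state, 1, heating_state)
                            OVER (
                                  -- id breaks ties between rows that share the same dt
                                  ORDER BY dt, id ) = 'ON' THEN 1
                 ELSE 0
               END on_state
        FROM   t) s
WHERE  on_state = 1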

Results:

| id | heating_state |                   dt |
|----|---------------|----------------------|
|  7 |            ON | 2018-02-22T00:00:00Z |
|  8 |            ON | 2018-02-22T00:00:00Z |
|  9 |            ON | 2018-02-22T00:00:00Z |
| 10 |            ON | 2018-02-23T00:00:00Z |

I think this is best solved using window functions and a filtered aggregate.

For each row, count the rows at or after it that have state = 'OFF', then keep only the rows where that count is 0.

You need a subquery because a window function result cannot be used in the WHERE clause (WHERE is evaluated before window functions).

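SELECT id, state, date
FROM (SELECT id, state, date,
             -- Running count of OFF rows, walking from the newest row backwards.
             -- id breaks ties between rows that share a date; ordering by state
             -- instead would place ID 5 (ON) ahead of ID 6 (OFF) and return it.
             count(*) FILTER (WHERE state = 'OFF')
                OVER (ORDER BY date DESC, id DESC) AS later_off_count
      FROM tab) q
WHERE later_off_count = 0
  AND date >= DATE '2018-02-20';  -- cutoff date from the question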

 id | state |    date    
----+-------+------------
 10 | ON    | 2018-02-23
  9 | ON    | 2018-02-22
  8 | ON    | 2018-02-22
  7 | ON    | 2018-02-22
(4 rows)
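
To see how the running count behaves, the same idea can be traced outside the database. The following is a minimal Python sketch, not part of the answer's query: the function name `rows_without_later_off` is hypothetical, and it assumes rows are ordered by `id`, newest first.

```python
# (id, state) pairs copied from the question's sample table
ROWS = [
    (1, "ON"), (2, "ON"), (3, "OFF"), (4, "OFF"), (5, "ON"),
    (6, "OFF"), (7, "ON"), (8, "ON"), (9, "ON"), (10, "ON"),
]


def rows_without_later_off(rows):
    """Keep the ids of rows with no OFF row at or after them (newest first)."""
    kept = []
    off_seen = 0  # plays the role of later_off_count
    for row_id, state in sorted(rows, reverse=True):  # assumption: higher id = newer
        if state == "OFF":
            off_seen += 1
        if off_seen == 0:
            kept.append(row_id)
    return kept


print(rows_without_later_off(ROWS))  # [10, 9, 8, 7]
```

Once the first OFF row is reached the count never returns to 0, which is why only the trailing run of ON rows survives the filter.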
