Group by 7 day interval postgresql

I know this is a common question, but I couldn't find something that matches my case. I have this data:

  id |    obs     
----+------------
  1 | 2018-01-01
  2 | 2018-01-02
  3 | 2018-01-03
  4 | 2018-01-04
  5 | 2018-01-05
  6 | 2018-01-06
  7 | 2018-01-07
  8 | 2018-01-15
  9 | 2018-01-20
 10 | 2018-02-03
 11 | 2018-02-04
 12 | 2018-02-05
 13 | 2018-02-06
 14 | 2018-02-06

I want this data to be grouped based on a 7 day interval. That is, the groups would be:

  • Group 1: id 1 to 7
  • Group 2: id 8 and 9
  • Group 3: id 10 to 14

How can I write this query in PostgreSQL?

Thanks in advance

I would proceed as follows:

  • first, use a subquery to compare the date of the current record to the minimum date of the series; the difference in days between the two dates, integer-divided by 7, gives a first version of the group the record belongs to (at this stage the group numbers are not necessarily consecutive)
  • then, use DENSE_RANK() in an outer query to renumber the groups consecutively:

Query:

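SELECT 
    id,
    obs,
    -- renumber the raw buckets as 1, 2, 3, ... with no holes
    DENSE_RANK() OVER(ORDER BY gr) grp
FROM (
    SELECT 
        id,
        obs,
        -- earliest date in the series (not used by the outer query)
        MIN(obs) OVER(),
        -- whole weeks elapsed since the earliest date, plus 1
        (obs - MIN(obs) OVER())::int/7 + 1 gr
    FROM mytable
) x
ORDER BY id;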

Demo on DB Fiddle:

| id  | obs                      | grp |
| --- | ------------------------ | --- |
| 1   | 2018-01-01T00:00:00.000Z | 1   |
| 2   | 2018-01-02T00:00:00.000Z | 1   |
| 3   | 2018-01-03T00:00:00.000Z | 1   |
| 4   | 2018-01-04T00:00:00.000Z | 1   |
| 5   | 2018-01-05T00:00:00.000Z | 1   |
| 6   | 2018-01-06T00:00:00.000Z | 1   |
| 7   | 2018-01-07T00:00:00.000Z | 1   |
| 8   | 2018-01-15T00:00:00.000Z | 2   |
| 9   | 2018-01-20T00:00:00.000Z | 2   |
| 10  | 2018-02-03T00:00:00.000Z | 3   |
| 11  | 2018-02-04T00:00:00.000Z | 3   |
| 12  | 2018-02-05T00:00:00.000Z | 4   |
| 13  | 2018-02-06T00:00:00.000Z | 4   |
| 14  | 2018-02-06T00:00:00.000Z | 4   |
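
To see why ids 10 and 11 land in a different group from ids 12 to 14, the same arithmetic can be traced outside the database. The following is a minimal Python sketch, not part of the original answer; the function name `fixed_bucket_groups` is hypothetical. It assigns each date to a bucket of whole weeks since the earliest date, then renumbers the buckets consecutively in the way DENSE_RANK() does.

```python
from datetime import date

def fixed_bucket_groups(dates):
    """Mimic (obs - MIN(obs) OVER())::int / 7 + 1 followed by DENSE_RANK()."""
    start = min(dates)
    # Integer division by 7 gives the raw, possibly non-consecutive, bucket.
    raw = [(d - start).days // 7 + 1 for d in dates]
    # DENSE_RANK: map the sorted distinct buckets to 1, 2, 3, ...
    rank = {bucket: i for i, bucket in enumerate(sorted(set(raw)), start=1)}
    return [rank[b] for b in raw]

# The 14 observation dates from the question, in id order.
obs = [date(2018, 1, d) for d in range(1, 8)] + [
    date(2018, 1, 15), date(2018, 1, 20),
    date(2018, 2, 3), date(2018, 2, 4), date(2018, 2, 5),
    date(2018, 2, 6), date(2018, 2, 6),
]
groups = fixed_bucket_groups(obs)
print(groups)
```

The buckets are fixed 7-day windows anchored at 2018-01-01, so 2018-02-04 (day 34) and 2018-02-05 (day 35) fall on opposite sides of a bucket boundary. That is why this approach yields four groups rather than the three listed in the question.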

If you want a new group to start whenever the gap to the previous observation is more than seven days, use lag() and a cumulative conditional count to define the groups:

select t.*,
       count(*) filter (where prev_obs is null or prev_obs < obs - interval '7 day') over (order by obs) as grp
from (select t.*,
             lag(obs) over (order by obs) as prev_obs
      from t
     ) t
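
The gap-based logic can be checked against the sample data without a database. This is a minimal Python sketch under stated assumptions, not the answerer's code; `gap_groups` and `max_gap_days` are hypothetical names. It walks the dates in order and starts a new group whenever the previous date is more than seven days earlier, which is the same test as the filter condition in the query.

```python
from datetime import date, timedelta

def gap_groups(dates, max_gap_days=7):
    """Start a new group whenever the gap to the previous date exceeds max_gap_days."""
    groups, current, prev = [], 0, None
    for d in sorted(dates):
        # Same test as: prev_obs IS NULL OR prev_obs < obs - interval '7 day'
        if prev is None or prev < d - timedelta(days=max_gap_days):
            current += 1
        groups.append(current)
        prev = d
    return groups

# The 14 observation dates from the question, already in ascending order.
obs = [date(2018, 1, d) for d in range(1, 8)] + [
    date(2018, 1, 15), date(2018, 1, 20),
    date(2018, 2, 3), date(2018, 2, 4), date(2018, 2, 5),
    date(2018, 2, 6), date(2018, 2, 6),
]
gap_result = gap_groups(obs)
print(gap_result)
```

On the sample data this gives ids 1 to 7 in group 1, ids 8 and 9 in group 2, and ids 10 to 14 in group 3, which matches the grouping requested in the question.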
