简体   繁体   中英

Merge continuous rows with Postgresql

I have a slots table like this :

   Column   |            Type             |
------------+-----------------------------+
 id         | integer                     |
 begin_at   | timestamp without time zone |
 end_at     | timestamp without time zone |
 user_id    | integer                     |

and I like to select merged rows for continuous time. Let's say I have (simplified) data like :

(1, 5:15, 5:30, 1)
(2, 5:15, 5:30, 2)
(3, 5:30, 5:45, 2)
(4, 5:45, 6:00, 2)
(5, 8:15, 8:30, 2)
(6, 8:30, 8:45, 2)

I would like to know if it's possible to select rows formatted like :

(5:15, 5:30, 1)
(5:15, 6:00, 2) // <======= rows id 2,3 and 4 merged
(8:15, 8:45, 2) // <======= rows id 5 and 6 merged

EDIT: Here's the SQLfiddle

I'm using Postgresql, version 9.3!

Thank you!

Here is one method for solving this problem. Create a flag that determines if a one record does not overlap with the previous one. This is the start of a group. Then take the cumulative sum of this flag and use that for grouping:

select user_id, min(begin_at) as begin_at, max(end_at) as end_at
from (select s.*, sum(startflag) over (partition by user_id order by begin_at) as grp
      from (select s.*,
                   (case when lag(end_at) over (partition by user_id order by begin_at) >= begin_at
                         then 0 else 1
                    end) as startflag
            from slots s
           ) s
     ) s
group by user_id, grp;

Here is a SQL Fiddle.

Gordon Linoff already provided the answer (I upvoted).

I've used the same approach, but wanted to deal with tsrange type . So I came up with this construct :

SELECT min(id) b_id, min(begin_at) b_at, max(end_at) e_at, grp, user_id
  FROM (
    SELECT t.*, sum(g) OVER (ORDER BY id) grp
      FROM (
        SELECT s.*, (NOT r -|- lag(r,1,r)
                     OVER (PARTITION BY user_id ORDER BY id))::int g
          FROM (SELECT id,begin_at,end_at,user_id,
                       tsrange(begin_at,end_at,'[)') r FROM slots) s
      ) t
  ) u
 GROUP BY grp, user_id
 ORDER BY grp;

Unfortunately, on the top level one has to use min(begin_at) and max(end_at) , as there're no aggregate functions for the range-based union operator + .

I create ranges with exclusive upper bounds, this allows me to use “is adjacent to” ( -|- ) operator . I compare current tsrange with the one on the previous row, defaulting to the current one in case there's no previous. Then I negate the comparison and cast to integer , which gives me 1 in cases when new group starts.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM