How to group consecutive rows together in SQL by multiple columns

I have rows in a query that return something like:

Date        User    Time    Location    Service     Count
1/1/2018    Nick    12:00   Location A  X           1
1/1/2018    Nick    12:01   Location A  Y           1
1/1/2018    John    12:02   Location B  Z           1
1/1/2018    Harry   12:03   Location A  X           1
1/1/2018    Harry   12:04   Location A  X           1
1/1/2018    Harry   12:05   Location B  Y           1
1/1/2018    Harry   12:06   Location B  X           1
1/1/2018    Nick    12:07   Location A  X           1
1/1/2018    Nick    12:08   Location A  Y           1

The query returns the locations visited by a user and a count of picks done at each location. Results are sorted by user and time, ascending. I need to group CONSECUTIVE rows that share the same User and Location, with a SUM of the Count column and a comma-separated list of the unique values in the Service column. The final result should look something like this:

Date        User    Start Time  End Time    Location    Service Count
1/1/2018    Nick    12:00       12:01       Location A  X,Y     2
1/1/2018    John    12:02       12:02       Location B  Z       1
1/1/2018    Harry   12:03       12:04       Location A  X       2
1/1/2018    Harry   12:05       12:06       Location B  X,Y     2
1/1/2018    Nick    12:07       12:08       Location A  X,Y     2

I'm not sure where to start. Maybe lag or partition clauses? Hoping a SQL guru can help here...

This is a gaps-and-islands problem. One method for solving it uses row_number():

select Date, User, min(Time) as start_time, max(Time) as end_time,
       Location,
       listagg(Service, ',') within group (order by Service) as services,
       count(*) as cnt
from (select t.*,
             -- position of the row among all rows of the day
             row_number() over (partition by Date order by Time) as seqnum,
             -- position of the row among rows with the same user and location
             row_number() over (partition by User, Date, Location order by Time) as seqnum_2
      from t
     ) t
group by Date, User, Location, (seqnum - seqnum_2);

It is a bit tricky to explain how this works. My suggestion is to run the subquery and you will see how the difference of row numbers defines the groups that you are looking for.
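To make the row-number difference visible without an Oracle instance, here is a minimal sketch that runs the inner subquery against the sample rows in an in-memory SQLite database through Python's sqlite3 module. The table name visits and the column names dt, usr, tm and cnt are assumptions chosen to avoid reserved words, and window functions need SQLite 3.25 or later. The sketch partitions the first row_number() by date only, so that seqnum counts every row of the day.

```python
import sqlite3

# Sample rows from the question: (date, user, time, location, service, count).
rows = [
    ("1/1/2018", "Nick", "12:00", "Location A", "X", 1),
    ("1/1/2018", "Nick", "12:01", "Location A", "Y", 1),
    ("1/1/2018", "John", "12:02", "Location B", "Z", 1),
    ("1/1/2018", "Harry", "12:03", "Location A", "X", 1),
    ("1/1/2018", "Harry", "12:04", "Location A", "X", 1),
    ("1/1/2018", "Harry", "12:05", "Location B", "Y", 1),
    ("1/1/2018", "Harry", "12:06", "Location B", "X", 1),
    ("1/1/2018", "Nick", "12:07", "Location A", "X", 1),
    ("1/1/2018", "Nick", "12:08", "Location A", "Y", 1),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, used only for this illustration.
conn.execute(
    "create table visits "
    "(dt text, usr text, tm text, location text, service text, cnt integer)"
)
conn.executemany("insert into visits values (?, ?, ?, ?, ?, ?)", rows)

# The inner subquery on its own: two row numbers per row.
seq_rows = conn.execute("""
    select usr, tm, location,
           row_number() over (partition by dt order by tm) as seqnum,
           row_number() over (partition by usr, dt, location order by tm) as seqnum_2
    from visits
    order by tm
""").fetchall()

# Print each row with the difference that the outer query groups on.
for usr, tm, location, seqnum, seqnum_2 in seq_rows:
    print(usr, tm, location, seqnum, seqnum_2, seqnum - seqnum_2)
```

Within each run of consecutive rows for one user and location, both counters advance together, so the difference stays constant (0 for Nick's first two rows, 3 for Harry at Location A). The difference is not unique on its own: Harry's rows at Location B and Nick's later rows at Location A both come out as 5, which is why User and Location have to stay in the group by alongside it.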

Use lag to get the user and location values of the previous row. Then use a running sum to start a new group whenever the user or the location changes. Finally, aggregate on the resulting group number together with user, location and date.

select Date, User, min(Time) as start_time, max(Time) as end_time, Location,
       listagg(Service, ',') within group (order by Service) as services,
       count(*) as cnt
from (select Date, User, Time, Location, Service,
             -- running sum: adds 1 whenever user or location differs from the previous row
             sum(case when prev_location = Location and prev_user = User then 0 else 1 end)
                 over (order by Date, Time) as grp
      from (select Date, User, Time, Location, Service,
                   lag(Location) over (order by Date, Time) as prev_location,
                   lag(User) over (order by Date, Time) as prev_user
            from t
           ) t
     ) t
group by Date, User, Location, grp;
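The lag and running-sum approach can be tried end to end with the same kind of setup. This is a sketch under assumptions, not a drop-in Oracle query: it uses an in-memory SQLite database (3.25 or later) through Python's sqlite3, a hypothetical table visits with columns dt, usr, tm and cnt, SQLite's group_concat(distinct ...) in place of listagg, and sum(cnt) rather than count(*) because the question asks for a sum of the Count column.

```python
import sqlite3

# Sample rows from the question: (date, user, time, location, service, count).
rows = [
    ("1/1/2018", "Nick", "12:00", "Location A", "X", 1),
    ("1/1/2018", "Nick", "12:01", "Location A", "Y", 1),
    ("1/1/2018", "John", "12:02", "Location B", "Z", 1),
    ("1/1/2018", "Harry", "12:03", "Location A", "X", 1),
    ("1/1/2018", "Harry", "12:04", "Location A", "X", 1),
    ("1/1/2018", "Harry", "12:05", "Location B", "Y", 1),
    ("1/1/2018", "Harry", "12:06", "Location B", "X", 1),
    ("1/1/2018", "Nick", "12:07", "Location A", "X", 1),
    ("1/1/2018", "Nick", "12:08", "Location A", "Y", 1),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, used only for this illustration.
conn.execute(
    "create table visits "
    "(dt text, usr text, tm text, location text, service text, cnt integer)"
)
conn.executemany("insert into visits values (?, ?, ?, ?, ?, ?)", rows)

result = conn.execute("""
    select dt, usr, min(tm) as start_time, max(tm) as end_time, location,
           group_concat(distinct service) as services,
           sum(cnt) as total
    from (select v.*,
                 -- running sum: adds 1 whenever user or location changes
                 sum(case when prev_location = location and prev_usr = usr
                          then 0 else 1 end)
                     over (order by dt, tm) as grp
          from (select visits.*,
                       lag(location) over (order by dt, tm) as prev_location,
                       lag(usr) over (order by dt, tm) as prev_usr
                from visits) v) g
    group by dt, usr, location, grp
    order by start_time
""").fetchall()

for dt, usr, start_time, end_time, location, services, total in result:
    # group_concat does not guarantee element order, so sort for display.
    services = ",".join(sorted(services.split(",")))
    print(dt, usr, start_time, end_time, location, services, total)
```

On the first row lag returns NULL, the comparison is not true, and the case expression falls through to 1, so the first group is numbered 1 without any special handling. In Oracle 19c and later, listagg(distinct Service, ',') gives the same de-duplication of service values.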
