Redshift query to combine rows when the data are continuous within a table

I have a requirement in Redshift where I need to combine rows when the data are continuous. I have the following table, where user_id and product_id are varchar and login_time and log_out_time are timestamp.

user_id    product_id   login_time                log_out_time
----------------------------------------------------------------------
ashok      facebook     1/1/2017 1:00:00 AM       1/1/2017 2:00:00 AM
ashok      facebook     1/1/2017 2:00:00 AM       1/1/2017 3:00:00 AM
ashok      facebook     1/1/2017 3:00:00 AM       1/1/2017 4:00:00 AM
ashok      linked_in    1/1/2017 5:00:00 AM       1/1/2017 6:00:00 AM
ashok      linked_in    1/1/2017 6:00:00 AM       1/1/2017 7:00:00 AM
ashok      facebook     1/1/2017 8:00:00 AM       1/1/2017 9:00:00 AM
ram        facebook     1/1/2017 9:00:00 AM       1/1/2017 10:00:00 AM
ashok      linked_in    1/1/2017 7:00:00 AM       1/1/2017 8:00:00 AM

I need to combine the rows when the sessions are continuous (each login_time equals the previous log_out_time) for a given user_id and product. My output should look like this:

user_id    product_id   login_time                log_out_time
----------------------------------------------------------------------
ashok      facebook     1/1/2017 1:00:00 AM       1/1/2017 4:00:00 AM
ashok      facebook     1/1/2017 8:00:00 AM       1/1/2017 9:00:00 AM
ashok      linked_in    1/1/2017 5:00:00 AM       1/1/2017 8:00:00 AM
ram        facebook     1/1/2017 9:00:00 AM       1/1/2017 10:00:00 AM

I tried the following query, but it didn't help:

SELECT user_id, product_id, MIN(login_time), MAX(log_out_time) FROM TABLE_NAME GROUP BY user_id, product_id

The above query does not give the required output because it has no logic to check whether the sessions are continuous in time. I need a query for this without any custom functions, but I am allowed to use any Redshift built-in function.
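To make the failure concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the plain GROUP BY. This makes two assumptions. First, SQLite stands in for Redshift. Second, the timestamps are stored as ISO-8601 text so that they sort and compare correctly. The table name `t` is also an assumption. The result has three rows instead of four, because the ashok/facebook row spans 1 AM to 9 AM and swallows the gap between 4 AM and 8 AM.

```python
import sqlite3

# Sample rows from the question; timestamps as ISO-8601 text (assumption: SQLite has no TIMESTAMP type).
ROWS = [
    ("ashok", "facebook",  "2017-01-01 01:00:00", "2017-01-01 02:00:00"),
    ("ashok", "facebook",  "2017-01-01 02:00:00", "2017-01-01 03:00:00"),
    ("ashok", "facebook",  "2017-01-01 03:00:00", "2017-01-01 04:00:00"),
    ("ashok", "linked_in", "2017-01-01 05:00:00", "2017-01-01 06:00:00"),
    ("ashok", "linked_in", "2017-01-01 06:00:00", "2017-01-01 07:00:00"),
    ("ashok", "facebook",  "2017-01-01 08:00:00", "2017-01-01 09:00:00"),
    ("ram",   "facebook",  "2017-01-01 09:00:00", "2017-01-01 10:00:00"),
    ("ashok", "linked_in", "2017-01-01 07:00:00", "2017-01-01 08:00:00"),
]


def naive_result(rows):
    """Run the question's GROUP BY query against an in-memory copy of the rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (user_id text, product_id text, login_time text, log_out_time text)"
        )
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        # One row per (user_id, product_id), no matter how many gaps lie between sessions.
        return con.execute(
            "select user_id, product_id, min(login_time), max(log_out_time) "
            "from t group by user_id, product_id order by user_id, product_id"
        ).fetchall()
    finally:
        con.close()


for row in naive_result(ROWS):
    print(row)
```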

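You can use lag() to identify where groups start: a new group starts whenever a row's login_time differs from the previous row's log_out_time for the same user_id and product_id. Then use a cumulative sum to number the groups, and finally group by to aggregate the results: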

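select user_id, product_id,
       min(login_time) as login_time,
       max(log_out_time) as log_out_time
from (select t.*,
             -- 0 if this session starts exactly when the previous one ended, else 1;
             -- the running sum gives each continuous run its own grp number
             sum(case when prev_lt = login_time then 0 else 1 end) over
                 (partition by user_id, product_id
                  order by login_time
                  rows between unbounded preceding and current row
                 ) as grp
      from (select t.*,
                   -- log_out_time of the previous session for the same user and product
                   lag(log_out_time) over (partition by user_id, product_id order by login_time) as prev_lt
            from t  -- replace t with your table name
           ) t
     ) t
group by user_id, product_id, grp
order by user_id, product_id, login_time;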
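To check the query against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database. This assumes SQLite 3.25 or later, which is the first version with window functions. It also assumes the timestamps are stored as ISO-8601 text and the table is named `t`. SQLite stands in for Redshift here, so this confirms the logic rather than Redshift-specific behaviour. The result matches the four rows expected in the question.

```python
import sqlite3

# Sample rows from the question; timestamps as ISO-8601 text (assumption: SQLite has no TIMESTAMP type).
ROWS = [
    ("ashok", "facebook",  "2017-01-01 01:00:00", "2017-01-01 02:00:00"),
    ("ashok", "facebook",  "2017-01-01 02:00:00", "2017-01-01 03:00:00"),
    ("ashok", "facebook",  "2017-01-01 03:00:00", "2017-01-01 04:00:00"),
    ("ashok", "linked_in", "2017-01-01 05:00:00", "2017-01-01 06:00:00"),
    ("ashok", "linked_in", "2017-01-01 06:00:00", "2017-01-01 07:00:00"),
    ("ashok", "facebook",  "2017-01-01 08:00:00", "2017-01-01 09:00:00"),
    ("ram",   "facebook",  "2017-01-01 09:00:00", "2017-01-01 10:00:00"),
    ("ashok", "linked_in", "2017-01-01 07:00:00", "2017-01-01 08:00:00"),
]

# The answer's gaps-and-islands query: lag() -> running sum -> group by.
QUERY = """
select user_id, product_id,
       min(login_time) as login_time,
       max(log_out_time) as log_out_time
from (select t.*,
             sum(case when prev_lt = login_time then 0 else 1 end) over
                 (partition by user_id, product_id
                  order by login_time
                  rows between unbounded preceding and current row) as grp
      from (select t.*,
                   lag(log_out_time) over (partition by user_id, product_id
                                           order by login_time) as prev_lt
            from t) t
     ) t
group by user_id, product_id, grp
order by user_id, product_id, login_time
"""


def merged_sessions(rows):
    """Load rows into an in-memory table t and return the merged sessions."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (user_id text, product_id text, login_time text, log_out_time text)"
        )
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in merged_sessions(ROWS):
    print(row)
```

The comparison `prev_lt = login_time` is a strict equality, so a session that starts even one second after the previous logout begins a new group.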
