SQL Redshift query to select first x dates within each group

Suppose my table looks like the following:

user_id   login_date
1   2019-03-13 00:00:00.000000
1   2019-04-07 00:00:00.000000
1   2018-10-19 00:00:00.000000
1   2018-11-12 00:00:00.000000
1   2018-04-11 00:00:00.000000
6   2018-11-18 00:00:00.000000
6   2018-07-07 00:00:00.000000
6   2019-09-04 00:00:00.000000
6   2018-07-31 00:00:00.000000
6   2019-10-20 00:00:00.000000
12  2018-12-17 00:00:00.000000
12  2018-07-06 00:00:00.000000
12  2018-04-21 00:00:00.000000
12  2019-07-28 00:00:00.000000
48  2018-12-01 00:00:00.000000
48  2019-11-11 00:00:00.000000
48  2019-03-10 00:00:00.000000
48  2018-10-13 00:00:00.000000
48  2019-02-21 00:00:00.000000
48  2018-01-04 00:00:00.000000

I would like to select the logins that fall within the first 2 days after each user's first login. In other words, I first have to find the minimum login date per group and then select the logins within 48 hours of it; alternatively, I could sort the logins within each group and select those within the first 2 days.

Here is the SQL to create a similar table:

CREATE TABLE TEST (user_id INT, login_date DATE NOT NULL);
INSERT INTO TEST ( user_id, login_date)
VALUES
(1,'20190901'),
(1,'20140719'),
(1,'20101118'),
(1,'20101119'),
(1,'20141118'),
(6,'20110818'),
(6,'20070119'),
(6,'20090419'),
(6,'20070118'),
(6,'20100219'),
(12,'20120718'),
(12,'20070618'),
(12,'20041218'),
(12,'20041219'),
(48,'20120118'),
(48,'20111119'),
(48,'20031019'),
(48,'20100318'),
(48,'20021119'),
(48,'20010418');

You could use the window function first_value() in a subquery to retrieve the earliest login date per group, and then compare it to each login date in the outer query (shown here with the table and column names from the question):

select 
    user_id, 
    login_date
from (
    select 
        t.*,
        -- earliest login of each user, repeated on every row of that user
        first_value(login_date) over(
            partition by user_id 
            order by login_date
            rows between unbounded preceding and unbounded following
        ) first_login
    from test t
) t
where login_date < first_login + interval '2 days';
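
Since only the minimum per group is needed, the same filter can also be sketched with min() as a window function, which needs neither an ORDER BY nor a frame clause. This variant is not part of the original answer; it assumes the TEST table and the user_id / login_date columns from the question.

```sql
-- Sketch (assumes the TEST table from the question).
-- min() over a partition attaches each user's earliest login to every row.
select user_id, login_date
from (
    select
        t.*,
        min(login_date) over (partition by user_id) as first_login
    from test t
) t
-- keep rows strictly less than 2 days after the first login
where login_date < first_login + interval '2 days'
order by user_id, login_date;
```

With the sample rows inserted above, this should keep 2010-11-18 and 2010-11-19 for user 1, 2007-01-18 and 2007-01-19 for user 6, 2004-12-18 and 2004-12-19 for user 12, and only 2001-04-18 for user 48.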

Another option is to use a correlated subquery for filtering:

select *
from test t
where login_date < (
    -- earliest login of the same user, plus the 2-day window
    select min(login_date) + interval '2 days'
    from test t1
    where t1.user_id = t.user_id
);
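
The correlated subquery can also be written as a join against a per-user aggregate, which some readers find easier to follow. This is a sketch under the same assumptions (the TEST table and the user_id / login_date columns from the question), not a statement about which form Redshift executes faster.

```sql
-- Sketch: compute each user's first login once, then join it back.
select t.user_id, t.login_date
from test t
join (
    select user_id, min(login_date) as first_login
    from test
    group by user_id
) f on f.user_id = t.user_id
where t.login_date < f.first_login + interval '2 days';
```

The derived table f holds one row per user, so the join does not duplicate any login rows.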
