
How to group by consecutive dates in python?

I have this data. I want to find which activity occurred consecutively for how many days:

    Id          datetime             date       Hour            Activity
0   Abc         2021-04-26 14:30:33  2021-04-26 (12.0, 14.0]    login
1   Abc         2021-04-26 12:55:27  2021-04-26 (12.0, 14.0]    login
2   Abc         2021-04-26 13:30:31  2021-04-26 (12.0, 14.0]    login
3   Abc         2021-04-28 11:55:33  2021-04-28 (10.0, 12.0]    login
4   Abc         2021-05-01 08:25:15  2021-05-01 (8.0, 10.0]     login
5   Abc         2021-05-01 09:45:01  2021-05-01 (8.0, 10.0]     login
6   Abc         2021-05-02 11:05:19  2021-05-02 (10.0, 12.0]    login
7   Abc         2021-05-03 02:26:12  2021-05-03 (2.0, 4.0]      browsing
8   Abc         2021-05-03 03:59:10  2021-05-03 (2.0, 4.0]      browsing
9   Abc         2021-05-03 05:40:00  2021-05-03 (4.0, 6.0]      browsing

I tried grouping all consecutive dates:

sample['Consecutive'] = sample.groupby('Id').date.diff().dt.days.ne(1).cumsum()

This gives me the following output:

    Id          datetime             date       Hour            Activity   Consecutive
0   Abc         2021-04-26 14:30:33  2021-04-26 (12.0, 14.0]    login      1
1   Abc         2021-04-26 12:55:27  2021-04-26 (12.0, 14.0]    login      2
2   Abc         2021-04-26 13:30:31  2021-04-26 (12.0, 14.0]    login      3
3   Abc         2021-04-28 11:55:33  2021-04-28 (10.0, 12.0]    login      4
4   Abc         2021-05-01 08:25:15  2021-05-01 (8.0, 10.0]     login      5
5   Abc         2021-05-01 09:45:01  2021-05-01 (8.0, 10.0]     login      6
6   Abc         2021-05-02 11:05:19  2021-05-02 (10.0, 12.0]    login      6
7   Abc         2021-05-03 02:26:12  2021-05-03 (2.0, 4.0]      browsing   6 
8   Abc         2021-05-03 03:59:10  2021-05-03 (2.0, 4.0]      browsing   7
9   Abc         2021-05-03 05:40:00  2021-05-03 (4.0, 6.0]      browsing   8
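To see where those numbers come from, the intermediate day gaps can be printed. This is a minimal sketch that rebuilds only the `Id` and `date` columns of the sample above (an assumption: the other columns don't take part in the grouping). It shows which rows `ne(1)` treats as the start of a new group.

```python
import pandas as pd

# Hypothetical rebuild of the sample: only the columns the grouping uses.
sample = pd.DataFrame({
    'Id': ['Abc'] * 10,
    'date': pd.to_datetime([
        '2021-04-26', '2021-04-26', '2021-04-26', '2021-04-28',
        '2021-05-01', '2021-05-01', '2021-05-02', '2021-05-03',
        '2021-05-03', '2021-05-03',
    ]),
})

# Whole-day gap to the previous row of the same Id (NaN on the first row).
gap = sample.groupby('Id')['date'].diff().dt.days

# ne(1) is True for every gap other than exactly 1: the NaN first row,
# real breaks (2 and 3 days) and same-day rows (0 days) all start a new group.
starts_group = gap.ne(1)
ne_groups = starts_group.cumsum()

print(pd.DataFrame({'gap': gap, 'ne(1)': starts_group, 'Consecutive': ne_groups}))
```

The `Consecutive` column reproduces the output above: only rows 6 and 7, whose gap is exactly one day, stay in the previous group, while every same-day row opens a new one.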

Desired output:

    Id          datetime             date       Hour            Activity   Consecutive
0   Abc         2021-04-26 14:30:33  2021-04-26 (12.0, 14.0]    login      1
1   Abc         2021-04-26 12:55:27  2021-04-26 (12.0, 14.0]    login      1
2   Abc         2021-04-26 13:30:31  2021-04-26 (12.0, 14.0]    login      1
3   Abc         2021-04-28 11:55:33  2021-04-28 (10.0, 12.0]    login      2
4   Abc         2021-05-01 08:25:15  2021-05-01 (8.0, 10.0]     login      3
5   Abc         2021-05-01 09:45:01  2021-05-01 (8.0, 10.0]     login      3
6   Abc         2021-05-02 11:05:19  2021-05-02 (10.0, 12.0]    login      3
7   Abc         2021-05-03 02:26:12  2021-05-03 (2.0, 4.0]      browsing   3 
8   Abc         2021-05-03 03:59:10  2021-05-03 (2.0, 4.0]      browsing   3
9   Abc         2021-05-03 05:40:00  2021-05-03 (4.0, 6.0]      browsing   3

Please help me correct this.

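If I understand correctly what you're trying to achieve, you just need to change ne(1) to gt(1). With ne(1), every row whose gap to the previous row is not exactly one day starts a new group, including rows on the same day (a gap of 0 days); gt(1) starts a new group only when the gap is more than one day: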

# The first row's gap is NaN and NaN > 1 is False, so + 1 makes the numbering start at 1.
df['Consecutive'] = df.groupby('Id')['date'].diff().dt.days.gt(1).cumsum() + 1
df

Output:


    Id             datetime       date          Hour  Activity  Consecutive
0  Abc  2021-04-26 14:30:33 2021-04-26  (12.0, 14.0]     login            1
1  Abc  2021-04-26 12:55:27 2021-04-26  (12.0, 14.0]     login            1
2  Abc  2021-04-26 13:30:31 2021-04-26  (12.0, 14.0]     login            1
3  Abc  2021-04-28 11:55:33 2021-04-28  (10.0, 12.0]     login            2
4  Abc  2021-05-01 08:25:15 2021-05-01   (8.0, 10.0]     login            3
5  Abc  2021-05-01 09:45:01 2021-05-01   (8.0, 10.0]     login            3
6  Abc  2021-05-02 11:05:19 2021-05-02  (10.0, 12.0]     login            3
7  Abc  2021-05-03 02:26:12 2021-05-03    (2.0, 4.0]  browsing            3
8  Abc  2021-05-03 03:59:10 2021-05-03    (2.0, 4.0]  browsing            3
9  Abc  2021-05-03 05:40:00 2021-05-03    (4.0, 6.0]  browsing            3
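One caveat: the `cumsum()` above runs over the whole frame, not per `Id`. With several users, each `Id`'s first row has a NaN gap, so `gt(1)` is False there and that user's first streak carries on the previous user's last group number. The sketch below, assuming `date` is a datetime64 column, restarts the numbering per `Id` and then counts how many calendar days each streak spans (the "for how many days" part of the question). The second `Id`, `Xyz`, and its dates are made-up test data, and `new_streak` and `streaks` are illustrative names.

```python
import pandas as pd

# Dates and activities from the question plus a hypothetical second Id ('Xyz').
df = pd.DataFrame({
    'Id': ['Abc'] * 10 + ['Xyz'] * 3,
    'date': pd.to_datetime([
        '2021-04-26', '2021-04-26', '2021-04-26', '2021-04-28',
        '2021-05-01', '2021-05-01', '2021-05-02', '2021-05-03',
        '2021-05-03', '2021-05-03',
        '2021-05-03', '2021-05-04', '2021-05-07',
    ]),
    'Activity': ['login'] * 7 + ['browsing'] * 3 + ['login'] * 3,
})

# diff() compares each row with the row above it, so sort first.
df = df.sort_values(['Id', 'date'])

# True where more than one day separates a row from the previous row of the same Id.
new_streak = df.groupby('Id')['date'].diff().dt.days.gt(1)

# Cumulative sum within each Id, so every Id's numbering restarts at 1.
df['Consecutive'] = new_streak.astype(int).groupby(df['Id']).cumsum() + 1

# One row per streak: first day, last day and the activities seen in it.
streaks = (
    df.groupby(['Id', 'Consecutive'])
      .agg(start=('date', 'min'), end=('date', 'max'),
           activities=('Activity', lambda s: ', '.join(s.unique())))
      .reset_index()
)
# No gap inside a streak exceeds one day, so the span equals the number of distinct days.
streaks['days'] = (streaks['end'] - streaks['start']).dt.days + 1
print(streaks)
```

For `Abc` this gives streaks of 1, 1 and 3 days, the last one covering both `login` and `browsing`, matching the groups in the desired output.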
