I have the data below. I want to find how many consecutive days each activity occurred:
    Id             datetime        date          Hour  Activity
0  Abc  2021-04-26 14:30:33  2021-04-26  (12.0, 14.0]     login
1  Abc  2021-04-26 12:55:27  2021-04-26  (12.0, 14.0]     login
2  Abc  2021-04-26 13:30:31  2021-04-26  (12.0, 14.0]     login
3  Abc  2021-04-28 11:55:33  2021-04-28  (10.0, 12.0]     login
4  Abc  2021-05-01 08:25:15  2021-05-01   (8.0, 10.0]     login
5  Abc  2021-05-01 09:45:01  2021-05-01   (8.0, 10.0]     login
6  Abc  2021-05-02 11:05:19  2021-05-02  (10.0, 12.0]     login
7  Abc  2021-05-03 02:26:12  2021-05-03    (2.0, 4.0]  browsing
8  Abc  2021-05-03 03:59:10  2021-05-03    (2.0, 4.0]  browsing
9  Abc  2021-05-03 05:40:00  2021-05-03    (4.0, 6.0]  browsing
I tried grouping all the consecutive dates:
sample['Consecutive'] = sample.groupby('Id').date.diff().dt.days.ne(1).cumsum()
This gives me the following output:
    Id             datetime        date          Hour  Activity  Consecutive
0  Abc  2021-04-26 14:30:33  2021-04-26  (12.0, 14.0]     login            1
1  Abc  2021-04-26 12:55:27  2021-04-26  (12.0, 14.0]     login            2
2  Abc  2021-04-26 13:30:31  2021-04-26  (12.0, 14.0]     login            3
3  Abc  2021-04-28 11:55:33  2021-04-28  (10.0, 12.0]     login            4
4  Abc  2021-05-01 08:25:15  2021-05-01   (8.0, 10.0]     login            5
5  Abc  2021-05-01 09:45:01  2021-05-01   (8.0, 10.0]     login            6
6  Abc  2021-05-02 11:05:19  2021-05-02  (10.0, 12.0]     login            6
7  Abc  2021-05-03 02:26:12  2021-05-03    (2.0, 4.0]  browsing            6
8  Abc  2021-05-03 03:59:10  2021-05-03    (2.0, 4.0]  browsing            7
9  Abc  2021-05-03 05:40:00  2021-05-03    (4.0, 6.0]  browsing            8
Desired output:
    Id             datetime        date          Hour  Activity  Consecutive
0  Abc  2021-04-26 14:30:33  2021-04-26  (12.0, 14.0]     login            1
1  Abc  2021-04-26 12:55:27  2021-04-26  (12.0, 14.0]     login            1
2  Abc  2021-04-26 13:30:31  2021-04-26  (12.0, 14.0]     login            1
3  Abc  2021-04-28 11:55:33  2021-04-28  (10.0, 12.0]     login            2
4  Abc  2021-05-01 08:25:15  2021-05-01   (8.0, 10.0]     login            3
5  Abc  2021-05-01 09:45:01  2021-05-01   (8.0, 10.0]     login            3
6  Abc  2021-05-02 11:05:19  2021-05-02  (10.0, 12.0]     login            3
7  Abc  2021-05-03 02:26:12  2021-05-03    (2.0, 4.0]  browsing            3
8  Abc  2021-05-03 03:59:10  2021-05-03    (2.0, 4.0]  browsing            3
9  Abc  2021-05-03 05:40:00  2021-05-03    (4.0, 6.0]  browsing            3
Please help me correct this.
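If I understood correctly what you're trying to achieve, you just need to change ne(1) to gt(1) and add 1 so the numbering starts at 1. Rows on the same day have a difference of 0 days, which ne(1) counts as a break; gt(1) starts a new group only when more than one day has passed since the previous row: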
df['Consecutive'] = df.groupby('Id')['date'].diff().dt.days.gt(1).cumsum() + 1
df
Output:
    Id             datetime        date          Hour  Activity  Consecutive
0  Abc  2021-04-26 14:30:33  2021-04-26  (12.0, 14.0]     login            1
1  Abc  2021-04-26 12:55:27  2021-04-26  (12.0, 14.0]     login            1
2  Abc  2021-04-26 13:30:31  2021-04-26  (12.0, 14.0]     login            1
3  Abc  2021-04-28 11:55:33  2021-04-28  (10.0, 12.0]     login            2
4  Abc  2021-05-01 08:25:15  2021-05-01   (8.0, 10.0]     login            3
5  Abc  2021-05-01 09:45:01  2021-05-01   (8.0, 10.0]     login            3
6  Abc  2021-05-02 11:05:19  2021-05-02  (10.0, 12.0]     login            3
7  Abc  2021-05-03 02:26:12  2021-05-03    (2.0, 4.0]  browsing            3
8  Abc  2021-05-03 03:59:10  2021-05-03    (2.0, 4.0]  browsing            3
9  Abc  2021-05-03 05:40:00  2021-05-03    (4.0, 6.0]  browsing            3
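For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (the Hour column is left out because it plays no part in the grouping) and compares ne(1) with gt(1) side by side. The names gap, with_ne and run_days are my own, and the last step, counting the distinct days in each run, is only my guess at what "for how many days" means in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (Hour column omitted).
df = pd.DataFrame({
    "Id": ["Abc"] * 10,
    "datetime": pd.to_datetime([
        "2021-04-26 14:30:33", "2021-04-26 12:55:27", "2021-04-26 13:30:31",
        "2021-04-28 11:55:33", "2021-05-01 08:25:15", "2021-05-01 09:45:01",
        "2021-05-02 11:05:19", "2021-05-03 02:26:12", "2021-05-03 03:59:10",
        "2021-05-03 05:40:00",
    ]),
    "Activity": ["login"] * 7 + ["browsing"] * 3,
})

# Midnight-normalized timestamps, so diff() yields whole-day gaps.
df["date"] = df["datetime"].dt.normalize()

# Day gap to the previous row of the same Id (NaN on each Id's first row).
gap = df.groupby("Id")["date"].diff().dt.days

# ne(1) also flags same-day rows (gap 0) and the first row (NaN) as breaks.
with_ne = gap.ne(1).cumsum()

# gt(1) flags only real gaps; NaN > 1 is False, hence the + 1 to start at 1.
df["Consecutive"] = gap.gt(1).cumsum() + 1

# Assumed follow-up: number of distinct days covered by each run.
run_days = df.groupby(["Id", "Consecutive"])["date"].nunique()

print(df[["Id", "date", "Activity", "Consecutive"]])
print(run_days)
```

This assumes the rows are already sorted by date within each Id, as they are in the sample. With several Ids the counter keeps running across them, so group by both Id and Consecutive (as run_days does) rather than by Consecutive alone.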