How to group by consecutive dates in Python?
I have this data. I want to find out for how many consecutive days an activity occurred:
Id datetime date Hour Activity
0 Abc 2021-04-26 14:30:33 2021-04-26 (12.0, 14.0] login
1 Abc 2021-04-26 12:55:27 2021-04-26 (12.0, 14.0] login
2 Abc 2021-04-26 13:30:31 2021-04-26 (12.0, 14.0] login
3 Abc 2021-04-28 11:55:33 2021-04-28 (10.0, 12.0] login
4 Abc 2021-05-01 08:25:15 2021-05-01 (8.0, 10.0] login
5 Abc 2021-05-01 09:45:01 2021-05-01 (8.0, 10.0] login
6 Abc 2021-05-02 11:05:19 2021-05-02 (10.0, 12.0] login
7 Abc 2021-05-03 02:26:12 2021-05-03 (2.0, 4.0] browsing
8 Abc 2021-05-03 03:59:10 2021-05-03 (2.0, 4.0] browsing
9 Abc 2021-05-03 05:40:00 2021-05-03 (4.0, 6.0] browsing
I tried to group all consecutive dates:
sample['Consecutive'] = sample.groupby('Id').date.diff().dt.days.ne(1).cumsum()
This gave me the following output:
Id datetime date Hour Activity Consecutive
0 Abc 2021-04-26 14:30:33 2021-04-26 (12.0, 14.0] login 1
1 Abc 2021-04-26 12:55:27 2021-04-26 (12.0, 14.0] login 2
2 Abc 2021-04-26 13:30:31 2021-04-26 (12.0, 14.0] login 3
3 Abc 2021-04-28 11:55:33 2021-04-28 (10.0, 12.0] login 4
4 Abc 2021-05-01 08:25:15 2021-05-01 (8.0, 10.0] login 5
5 Abc 2021-05-01 09:45:01 2021-05-01 (8.0, 10.0] login 6
6 Abc 2021-05-02 11:05:19 2021-05-02 (10.0, 12.0] login 6
7 Abc 2021-05-03 02:26:12 2021-05-03 (2.0, 4.0] browsing 6
8 Abc 2021-05-03 03:59:10 2021-05-03 (2.0, 4.0] browsing 7
9 Abc 2021-05-03 05:40:00 2021-05-03 (4.0, 6.0] browsing 8
Desired output:
Id datetime date Hour Activity Consecutive
0 Abc 2021-04-26 14:30:33 2021-04-26 (12.0, 14.0] login 1
1 Abc 2021-04-26 12:55:27 2021-04-26 (12.0, 14.0] login 1
2 Abc 2021-04-26 13:30:31 2021-04-26 (12.0, 14.0] login 1
3 Abc 2021-04-28 11:55:33 2021-04-28 (10.0, 12.0] login 2
4 Abc 2021-05-01 08:25:15 2021-05-01 (8.0, 10.0] login 3
5 Abc 2021-05-01 09:45:01 2021-05-01 (8.0, 10.0] login 3
6 Abc 2021-05-02 11:05:19 2021-05-02 (10.0, 12.0] login 3
7 Abc 2021-05-03 02:26:12 2021-05-03 (2.0, 4.0] browsing 3
8 Abc 2021-05-03 03:59:10 2021-05-03 (2.0, 4.0] browsing 3
9 Abc 2021-05-03 05:40:00 2021-05-03 (4.0, 6.0] browsing 3
Please help me correct this.
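To see where the numbering in the attempt comes from, it can help to look at the day gaps that `diff()` produces before they are compared with 1. The sketch below rebuilds only the `Id` and `date` columns of the sample; the name `sample` matches the question, while the `gap_days` and `is_new_group` columns are hypothetical helpers added purely for illustration.

```python
import pandas as pd

# Rebuild just the two columns the grouping depends on.
sample = pd.DataFrame({
    "Id": ["Abc"] * 10,
    "date": pd.to_datetime([
        "2021-04-26", "2021-04-26", "2021-04-26", "2021-04-28",
        "2021-05-01", "2021-05-01", "2021-05-02",
        "2021-05-03", "2021-05-03", "2021-05-03",
    ]),
})

# Day difference to the previous row within each Id (NaN on the first row).
sample["gap_days"] = sample.groupby("Id")["date"].diff().dt.days

# ne(1) is True for every gap that is not exactly 1 day. That includes
# 0 (same day) and NaN (first row), so each of those rows opens a new group.
sample["is_new_group"] = sample["gap_days"].ne(1)
sample["Consecutive"] = sample["is_new_group"].cumsum()

print(sample)
```

Rows that share a date have a gap of 0, and 0 is "not equal to 1", which is why rows 0 to 2 each receive their own number even though they fall on the same day.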
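If I understand correctly what you are trying to achieve, you only need to change ne(1) to gt(1), and add 1 so that the numbering starts at 1: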
df['Consecutive'] = df.groupby('Id')['date'].diff().dt.days.gt(1).cumsum() + 1
df
Output:
Id datetime date Hour Activity Consecutive
0 Abc 2021-04-26 14:30:33 2021-04-26 (12.0, 14.0] login 1
1 Abc 2021-04-26 12:55:27 2021-04-26 (12.0, 14.0] login 1
2 Abc 2021-04-26 13:30:31 2021-04-26 (12.0, 14.0] login 1
3 Abc 2021-04-28 11:55:33 2021-04-28 (10.0, 12.0] login 2
4 Abc 2021-05-01 08:25:15 2021-05-01 (8.0, 10.0] login 3
5 Abc 2021-05-01 09:45:01 2021-05-01 (8.0, 10.0] login 3
6 Abc 2021-05-02 11:05:19 2021-05-02 (10.0, 12.0] login 3
7 Abc 2021-05-03 02:26:12 2021-05-03 (2.0, 4.0] browsing 3
8 Abc 2021-05-03 03:59:10 2021-05-03 (2.0, 4.0] browsing 3
9 Abc 2021-05-03 05:40:00 2021-05-03 (4.0, 6.0] browsing 3
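One thing to watch with more than one `Id`: the `cumsum()` in the line above runs over the whole column rather than per `Id`, and the first row of each `Id` has a NaN gap, which `gt(1)` treats as False. The numbering would therefore carry on from the previous `Id` instead of restarting. The following is a minimal sketch of one way to restart at 1 for each `Id`, assuming the rows are already sorted by date within each `Id`; the two-Id data and the `new_run` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical two-Id data, sorted by date within each Id.
df = pd.DataFrame({
    "Id": ["Abc", "Abc", "Abc", "Xyz", "Xyz", "Xyz"],
    "date": pd.to_datetime([
        "2021-04-26", "2021-04-26", "2021-04-28",
        "2021-05-01", "2021-05-02", "2021-05-05",
    ]),
})

# 1 where the gap to the previous row of the same Id exceeds one day, else 0.
# The first row of each Id has a NaN gap, which compares as False (0).
df["new_run"] = df.groupby("Id")["date"].diff().dt.days.gt(1).astype(int)

# Accumulate the flags within each Id so the numbering restarts at 1.
df["Consecutive"] = df.groupby("Id")["new_run"].cumsum() + 1

print(df)
```

If a single running counter across all Ids is acceptable, the one-line version above is enough, as long as later steps group by both `Id` and `Consecutive` so that the last run of one `Id` is not merged with the first run of the next.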