
How to group by consecutive dates in python?

I have this data. I want to find out over how many consecutive days the activity occurred:

    Id          datetime             date       Hour            Activity
0   Abc         2021-04-26 14:30:33  2021-04-26 (12.0, 14.0]    login
1   Abc         2021-04-26 12:55:27  2021-04-26 (12.0, 14.0]    login
2   Abc         2021-04-26 13:30:31  2021-04-26 (12.0, 14.0]    login
3   Abc         2021-04-28 11:55:33  2021-04-28 (10.0, 12.0]    login
4   Abc         2021-05-01 08:25:15  2021-05-01 (8.0, 10.0]     login
5   Abc         2021-05-01 09:45:01  2021-05-01 (8.0, 10.0]     login
6   Abc         2021-05-02 11:05:19  2021-05-02 (10.0, 12.0]    login
7   Abc         2021-05-03 02:26:12  2021-05-03 (2.0, 4.0]      browsing
8   Abc         2021-05-03 03:59:10  2021-05-03 (2.0, 4.0]      browsing
9   Abc         2021-05-03 05:40:00  2021-05-03 (4.0, 6.0]      browsing

I tried to group all consecutive dates:

sample['Consecutive'] = sample.groupby('Id').date.diff().dt.days.ne(1).cumsum()

This gives me the following output:

    Id          datetime             date       Hour            Activity   Consecutive
0   Abc         2021-04-26 14:30:33  2021-04-26 (12.0, 14.0]    login      1
1   Abc         2021-04-26 12:55:27  2021-04-26 (12.0, 14.0]    login      2
2   Abc         2021-04-26 13:30:31  2021-04-26 (12.0, 14.0]    login      3
3   Abc         2021-04-28 11:55:33  2021-04-28 (10.0, 12.0]    login      4
4   Abc         2021-05-01 08:25:15  2021-05-01 (8.0, 10.0]     login      5
5   Abc         2021-05-01 09:45:01  2021-05-01 (8.0, 10.0]     login      6
6   Abc         2021-05-02 11:05:19  2021-05-02 (10.0, 12.0]    login      6
7   Abc         2021-05-03 02:26:12  2021-05-03 (2.0, 4.0]      browsing   6 
8   Abc         2021-05-03 03:59:10  2021-05-03 (2.0, 4.0]      browsing   7
9   Abc         2021-05-03 05:40:00  2021-05-03 (4.0, 6.0]      browsing   8

Desired output:

    Id          datetime             date       Hour            Activity   Consecutive
0   Abc         2021-04-26 14:30:33  2021-04-26 (12.0, 14.0]    login      1
1   Abc         2021-04-26 12:55:27  2021-04-26 (12.0, 14.0]    login      1
2   Abc         2021-04-26 13:30:31  2021-04-26 (12.0, 14.0]    login      1
3   Abc         2021-04-28 11:55:33  2021-04-28 (10.0, 12.0]    login      2
4   Abc         2021-05-01 08:25:15  2021-05-01 (8.0, 10.0]     login      3
5   Abc         2021-05-01 09:45:01  2021-05-01 (8.0, 10.0]     login      3
6   Abc         2021-05-02 11:05:19  2021-05-02 (10.0, 12.0]    login      3
7   Abc         2021-05-03 02:26:12  2021-05-03 (2.0, 4.0]      browsing   3 
8   Abc         2021-05-03 03:59:10  2021-05-03 (2.0, 4.0]      browsing   3
9   Abc         2021-05-03 05:40:00  2021-05-03 (4.0, 6.0]      browsing   3

Please help me fix this.

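If I understand correctly what you are trying to achieve, you only need to change ne(1) to gt(1), so that a new group starts only when the gap to the previous row is more than one day. The + 1 makes the numbering start at 1: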

df['Consecutive'] = df.groupby('Id')['date'].diff().dt.days.gt(1).cumsum() + 1
df

Output:


    Id             datetime       date          Hour  Activity  Consecutive
0  Abc  2021-04-26 14:30:33 2021-04-26  (12.0, 14.0]     login            1
1  Abc  2021-04-26 12:55:27 2021-04-26  (12.0, 14.0]     login            1
2  Abc  2021-04-26 13:30:31 2021-04-26  (12.0, 14.0]     login            1
3  Abc  2021-04-28 11:55:33 2021-04-28  (10.0, 12.0]     login            2
4  Abc  2021-05-01 08:25:15 2021-05-01   (8.0, 10.0]     login            3
5  Abc  2021-05-01 09:45:01 2021-05-01   (8.0, 10.0]     login            3
6  Abc  2021-05-02 11:05:19 2021-05-02  (10.0, 12.0]     login            3
7  Abc  2021-05-03 02:26:12 2021-05-03    (2.0, 4.0]  browsing            3
8  Abc  2021-05-03 03:59:10 2021-05-03    (2.0, 4.0]  browsing            3
9  Abc  2021-05-03 05:40:00 2021-05-03    (4.0, 6.0]  browsing            3
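
To see why ne(1) fails and gt(1) works, here is a minimal self-contained sketch. It rebuilds only the Id and date columns from the question and assumes the rows are already sorted by date within each Id. The names gap, with_ne and StreakDays are introduced here for illustration and are not part of the original post.

```python
import pandas as pd

# Sample rebuilt from the question (only the columns the grouping needs).
df = pd.DataFrame({
    "Id": ["Abc"] * 10,
    "date": pd.to_datetime([
        "2021-04-26", "2021-04-26", "2021-04-26", "2021-04-28",
        "2021-05-01", "2021-05-01", "2021-05-02",
        "2021-05-03", "2021-05-03", "2021-05-03",
    ]),
})

# Day gap to the previous row within each Id; the first row of each Id is NaN.
gap = df.groupby("Id")["date"].diff().dt.days

# ne(1) counts NaN and same-day rows (gap 0) as breaks,
# so nearly every row opens a new group.
with_ne = gap.ne(1).cumsum()

# gt(1) breaks only when more than one day has passed.
# NaN > 1 is False, so the counter starts at 0; hence the + 1.
df["Consecutive"] = gap.gt(1).cumsum() + 1

# Illustrative extra column: number of distinct days in each run.
df["StreakDays"] = (
    df.groupby(["Id", "Consecutive"])["date"].transform("nunique")
)

print(df)
```

Note that cumsum() runs over the whole column, not per Id. With several Ids the counter carries over from one Id to the next, so it is safer to group on both Id and Consecutive, as the StreakDays line does.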

