Pandas - How to merge two related events into one row

I have daily user login/logout data like this:

date,user,action
2020-04-14 01:00:00,user1,login
2020-04-14 01:05:00,user2,login
2020-04-14 01:10:00,user3,login
2020-04-14 02:40:00,user2,logout
2020-04-14 02:50:00,user3,logout
2020-04-14 03:10:00,user2,login
2020-04-14 03:10:00,user1,logout
2020-04-14 03:30:00,user3,login
2020-04-14 04:20:00,user2,logout

Users can log in and out multiple times a day: one session closes and then a new one opens (see user2). I need the duration of every session, but there is no session ID.

How can I merge these two events into one row, pairing each login with the first logout that follows it for the same user? Like this:

login_date,logout_date,user
2020-04-14 01:00:00,2020-04-14 03:10:00,user1
2020-04-14 01:05:00,2020-04-14 02:40:00,user2
2020-04-14 01:10:00,2020-04-14 02:50:00,user3
2020-04-14 03:10:00,2020-04-14 04:20:00,user2
2020-04-14 03:30:00,-,user3

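IIUC, and assuming the rows are already sorted by date, you can number each user's sessions with a cumulative count of logins and then pivot login and logout into columns: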

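# Number each user's sessions: the counter goes up by one at every login.
(df.assign(row=lambda x: x['action'].eq('login').groupby(x['user']).cumsum())
   # One row per (session, user); the login and logout dates become columns.
   .pivot_table(index=['row','user'], columns='action', values='date', aggfunc='first')
   .reset_index('row', drop=True)
   .reset_index()
)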

Output:

action   user                login               logout
0       user1  2020-04-14 01:00:00  2020-04-14 03:10:00
1       user2  2020-04-14 01:05:00  2020-04-14 02:40:00
2       user3  2020-04-14 01:10:00  2020-04-14 02:50:00
3       user2  2020-04-14 03:10:00  2020-04-14 04:20:00
4       user3  2020-04-14 03:30:00                  NaN
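
Since the question also asks for the duration of each session, here is a minimal end-to-end sketch of the same idea. It assumes the sample data is loaded from an inline CSV string and that `date` is parsed as a datetime; the names `CSV`, `session`, `sessions`, and `duration` are illustrative and not part of the original answer. A session with no logout yet ends up with `NaT` for both the logout and the duration.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example is self-contained.
CSV = """date,user,action
2020-04-14 01:00:00,user1,login
2020-04-14 01:05:00,user2,login
2020-04-14 01:10:00,user3,login
2020-04-14 02:40:00,user2,logout
2020-04-14 02:50:00,user3,logout
2020-04-14 03:10:00,user2,login
2020-04-14 03:10:00,user1,logout
2020-04-14 03:30:00,user3,login
2020-04-14 04:20:00,user2,logout
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"])

# The cumulative count only works on chronologically ordered events.
df = df.sort_values("date", kind="stable")

# Per-user session number: goes up by one at every login.
session = df["action"].eq("login").groupby(df["user"]).cumsum()

sessions = (
    df.assign(session=session)
      .pivot_table(index=["session", "user"], columns="action",
                   values="date", aggfunc="first")
      .reset_index()
      .rename_axis(columns=None)
      .rename(columns={"login": "login_date", "logout": "logout_date"})
)

# Timedelta per session; stays NaT while a session is still open.
sessions["duration"] = sessions["logout_date"] - sessions["login_date"]

print(sessions)
```

The approach relies on every logout belonging to the most recent login of the same user. If the log can contain a logout without a preceding login, or two logins in a row, those rows would need to be cleaned up first.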
