I have daily user login/logout data like this:
date,user,action
2020-04-14 01:00:00,user1,login
2020-04-14 01:05:00,user2,login
2020-04-14 01:10:00,user3,login
2020-04-14 02:40:00,user2,logout
2020-04-14 02:50:00,user3,logout
2020-04-14 03:10:00,user2,login
2020-04-14 03:10:00,user1,logout
2020-04-14 03:30:00,user3,login
2020-04-14 04:20:00,user2,logout
Users can log in and out multiple times a day: one session closes and then a new one opens (as with user2). I need the duration of every session, but there is no session id.
How can I merge these two events, a login and the first logout after it, into one row? Like this:
login_date,logout_date,user
2020-04-14 01:00:00,2020-04-14 03:10:00,user1
2020-04-14 01:05:00,2020-04-14 02:40:00,user2
2020-04-14 01:10:00,2020-04-14 02:50:00,user3
2020-04-14 03:10:00,2020-04-14 04:20:00,user2
2020-04-14 03:30:00,-,user3
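IIUC, you can number each user's sessions with a cumulative count of their logins, then pivot so that each session's login and logout end up in the same row: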
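# The cumulative sum over logins gives a per-user session number.
(df.assign(row=lambda x: x['action'].eq('login').groupby(x['user']).cumsum())
   # aggfunc='first' keeps the first logout that follows each login.
   .pivot_table(index=['row','user'], columns='action', values='date', aggfunc='first')
   .reset_index('row', drop=True)
   .reset_index()
)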
Output:
action   user                login               logout
0       user1  2020-04-14 01:00:00  2020-04-14 03:10:00
1       user2  2020-04-14 01:05:00  2020-04-14 02:40:00
2       user3  2020-04-14 01:10:00  2020-04-14 02:50:00
3       user2  2020-04-14 03:10:00  2020-04-14 04:20:00
4       user3  2020-04-14 03:30:00                  NaN
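Since the question asks for the duration of each session, here is a minimal end-to-end sketch of how that could be computed on top of the pivot above. It assumes the rows are already in chronological order for each user and that `date` is parsed as a datetime. The column names `session` and `duration` are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Sample data from the question, loaded with real datetimes.
raw = """date,user,action
2020-04-14 01:00:00,user1,login
2020-04-14 01:05:00,user2,login
2020-04-14 01:10:00,user3,login
2020-04-14 02:40:00,user2,logout
2020-04-14 02:50:00,user3,logout
2020-04-14 03:10:00,user2,login
2020-04-14 03:10:00,user1,logout
2020-04-14 03:30:00,user3,login
2020-04-14 04:20:00,user2,logout
"""
df = pd.read_csv(io.StringIO(raw), parse_dates=["date"])

# Assumption: rows are chronological per user, so counting logins
# per user yields a session number (1, 2, ...) for every row.
session = df["action"].eq("login").astype(int).groupby(df["user"]).cumsum()

sessions = (
    df.assign(session=session)
      # One row per (session, user); login and logout become columns.
      .pivot_table(index=["session", "user"], columns="action",
                   values="date", aggfunc="first")
      .reset_index()
      .rename_axis(columns=None)  # drop the leftover "action" axis label
)

# Open sessions (no logout yet) get NaT as their duration.
sessions["duration"] = sessions["logout"] - sessions["login"]
print(sessions)
```

Sessions without a matching logout, such as user3's second login, keep a missing `logout` and therefore a missing `duration`, so they can be filtered out or filled with a cutoff time as needed.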