I have a table (Pandas dataframe) in the following format:
User | Event | Timestamp |
---|---|---|
1 | Online | 2017-09-01 00:00:12 |
1 | Offline | 2017-09-01 00:40:16 |
2 | Online | 2017-09-01 03:17:53 |
2 | Online | 2017-09-01 13:00:47 |
1 | Online | 2017-09-01 13:06:05 |
2 | Offline | 2017-09-01 13:08:12 |
3 | Offline | 2017-09-01 14:01:21 |
3 | Offline | 2017-09-01 14:07:14 |
4 | Offline | 2017-09-01 16:27:24 |
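For anyone who wants to reproduce this, the sample table can be built roughly as follows. This is only a sketch: the variable names `rows` and `df` are my own choice, and it assumes the `Timestamp` column should be parsed into datetimes rather than kept as strings.

```python
import pandas as pd

# Sample rows copied from the table above: (User, Event, Timestamp).
rows = [
    (1, "Online",  "2017-09-01 00:00:12"),
    (1, "Offline", "2017-09-01 00:40:16"),
    (2, "Online",  "2017-09-01 03:17:53"),
    (2, "Online",  "2017-09-01 13:00:47"),
    (1, "Online",  "2017-09-01 13:06:05"),
    (2, "Offline", "2017-09-01 13:08:12"),
    (3, "Offline", "2017-09-01 14:01:21"),
    (3, "Offline", "2017-09-01 14:07:14"),
    (4, "Offline", "2017-09-01 16:27:24"),
]

df = pd.DataFrame(rows, columns=["User", "Event", "Timestamp"])
# Parse the timestamp strings so datetime arithmetic is available later.
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
```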
For every user, I want to convert this into windows of online activity, i.e. the periods during which they are in the 'Online' state. Rules for online state ->
So for this table the result should be:
User | Date | Start Time | End Time |
---|---|---|---|
1 | 2017-09-01 | 00:00:12 | 00:40:16 |
1 | 2017-09-01 | 13:06:05 | 23:59:59 |
2 | 2017-09-01 | 03:17:53 | 13:08:12 |
Users 3 and 4 don't appear because they never got online.
I have done this using loops, but that is not a scalable solution, so I want to know how I can do this without individually matching start times to their corresponding end times.
Given your initial dataframe, with the `Timestamp` column already parsed as datetimes:
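
    import pandas as pd

    def duration_frame(indf):
        # Expects columns User, Event, Timestamp (Timestamp as datetimes).
        timekeeper = {}  # user -> start of the currently open session, or None
        rslt = []
        for usr, event, ts in indf[['User', 'Event', 'Timestamp']].itertuples(index=False):
            usrTimes = timekeeper.pop(usr, None)
            if event == 'Online':
                # A repeated 'Online' overwrites the earlier start time.
                usrTimes = ts
            elif event == 'Offline':
                if usrTimes is not None:
                    rslt.append([usr, usrTimes, ts])
                usrTimes = None
            timekeeper[usr] = usrTimes
        # Sessions still open at the end are closed at 23:59:59 of their start day.
        for usr, st_tme in timekeeper.items():
            if st_tme is not None:
                rslt.append([usr, st_tme, st_tme.replace(hour=23, minute=59, second=59)])
        return pd.DataFrame(data=rslt, columns=['User', 'Start', 'Stop'])
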
Running `duration_frame(df)` yields:

       User               Start                Stop
    0     1 2017-09-01 00:00:12 2017-09-01 00:40:16
    1     2 2017-09-01 13:00:47 2017-09-01 13:08:12
    2     1 2017-09-01 13:06:05 2017-09-01 23:59:59
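Since the question asks for something that avoids matching rows one by one, here is a possible vectorized sketch using `groupby` and `shift`. It is not the only way to do it and rests on a few assumptions of mine: the function name `online_windows` is made up, a run of repeated `Online` events counts as one session starting at the earliest of them (03:17:53 for user 2, as in the expected result in the question), and a session with no later `Offline` is closed at 23:59:59 of its start day. Sessions that span midnight are not split.

```python
import pandas as pd

def online_windows(df):
    """Sketch: one row per online session, assuming columns User, Event, Timestamp."""
    d = df.assign(Timestamp=pd.to_datetime(df["Timestamp"]))
    d = d.sort_values(["User", "Timestamp"]).reset_index(drop=True)

    # Drop events that repeat the user's previous event, so that
    # Online and Offline strictly alternate within each user.
    prev_event = d.groupby("User")["Event"].shift()
    d = d[d["Event"].ne(prev_event)]

    # After deduplication, the row following an Online row (same user)
    # is either the matching Offline or missing (NaT).
    next_ts = d.groupby("User")["Timestamp"].shift(-1)

    online = d["Event"] == "Online"
    start = d.loc[online, "Timestamp"]
    # Assumed rule: open sessions end at 23:59:59 of the start day.
    end_of_day = start.dt.normalize() + pd.Timedelta(hours=23, minutes=59, seconds=59)
    end = next_ts[online].fillna(end_of_day)

    return pd.DataFrame({
        "User": d.loc[online, "User"],
        "Date": start.dt.strftime("%Y-%m-%d"),
        "Start Time": start.dt.strftime("%H:%M:%S"),
        "End Time": end.dt.strftime("%H:%M:%S"),
    }).reset_index(drop=True)
```

Called on the dataframe from the question, `online_windows(df)` should return the three rows for users 1 and 2 listed in the expected result, with users 3 and 4 absent because they have no `Online` rows.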