Get duration from table of timestamped events, pandas dataframe

I have a table (Pandas dataframe) in the following format:

User  Event    Timestamp
1     Online   2017-09-01 00:00:12
1     Offline  2017-09-01 00:40:16
2     Online   2017-09-01 03:17:53
2     Online   2017-09-01 13:00:47
1     Online   2017-09-01 13:06:05
2     Offline  2017-09-01 13:08:12
3     Offline  2017-09-01 14:01:21
3     Offline  2017-09-01 14:07:14
4     Offline  2017-09-01 16:27:24

For every user, I want to convert this into windows of online activity, i.e. the periods during which they are in the 'Online' state. The rules for the online state are:

  1. At the start of the day, everyone is 'Offline'
  2. The first time they have an event marked 'Online', they go into the 'Online' state.
  3. They stay in the 'Online' state until an 'Offline' event occurs, or the day ends.

So for this table the result should be:

User  Date        Start Time  End Time
1     2017-09-01  00:00:12    00:40:16
1     2017-09-01  13:06:05    23:59:59
2     2017-09-01  03:17:53    13:08:12

Users 3 and 4 don't appear because they never got online.

I have done this using loops, but that is not a scalable solution, so I want to know how I can do this without individually matching each start time to its corresponding end time.

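Given your initial dataframe (assuming it is sorted by Timestamp, the Timestamp column is already parsed as datetimes, and it covers a single day as in your example), you can walk the rows once and keep track of each user's open window:

import pandas as pd

def duration_frame(indf):
    # Maps each user to the start of their open online window (None while offline).
    timekeeper = {}
    rslt = []
    for i in range(indf.shape[0]):
        row = indf.iloc[i]
        usr, event, tme = row['User'], row['Event'], row['Timestamp']
        usrTimes = timekeeper.pop(usr, None)
        if event == 'Online':
            # A repeated 'Online' keeps the earlier start time (rule 2).
            if usrTimes is None:
                usrTimes = tme
        elif event == 'Offline':
            if usrTimes is not None:
                rslt.append([usr, usrTimes, tme])
                usrTimes = None
        timekeeper[usr] = usrTimes
    # Any window still open after the last row is closed at the end of that day.
    for usr, st_tme in timekeeper.items():
        if st_tme is not None:
            rslt.append([usr, st_tme, st_tme.replace(hour=23, minute=59, second=59)])
    return pd.DataFrame(data=rslt, columns=['User', 'Start', 'Stop'])

Running duration_frame(df) yields:

   User  Start                Stop
0  1     2017-09-01 00:00:12  2017-09-01 00:40:16
1  2     2017-09-01 03:17:53  2017-09-01 13:08:12
2  1     2017-09-01 13:06:05  2017-09-01 23:59:59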
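
Since the question asks for a way to avoid row-by-row loops, here is a minimal vectorized sketch using groupby and shift. It is not from the original posts: the function name online_windows, the intermediate column names, and the choice to treat each calendar day separately are my assumptions. The idea is to drop every event that does not change a user's state (with "Offline" as the state at the start of each day), so the remaining rows alternate Online, Offline, Online, and each 'Online' row can take the next row's timestamp as its end.

```python
import pandas as pd

def online_windows(events: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per online window (User, Date, Start, End)."""
    ev = events.sort_values(["User", "Timestamp"]).copy()
    ev["Date"] = ev["Timestamp"].dt.normalize()  # midnight of the event's day
    grp = ["User", "Date"]

    # Rule 1: the state before the first event of each day is 'Offline'.
    prev = ev.groupby(grp)["Event"].shift().fillna("Offline")
    # Keep only events that change the state; repeated events are dropped.
    ev = ev[ev["Event"] != prev].copy()

    # The next remaining event of the same user and day closes the window.
    ev["End"] = ev.groupby(grp)["Timestamp"].shift(-1)
    # Rule 3: a window with no closing event runs until the end of the day.
    day_end = ev["Date"] + pd.Timedelta(hours=23, minutes=59, seconds=59)
    ev["End"] = ev["End"].fillna(day_end)

    out = ev.loc[ev["Event"] == "Online", ["User", "Date", "Timestamp", "End"]]
    return out.rename(columns={"Timestamp": "Start"}).reset_index(drop=True)

# Sample data copied from the question.
df = pd.DataFrame({
    "User": [1, 1, 2, 2, 1, 2, 3, 3, 4],
    "Event": ["Online", "Offline", "Online", "Online", "Online",
              "Offline", "Offline", "Offline", "Offline"],
    "Timestamp": pd.to_datetime([
        "2017-09-01 00:00:12", "2017-09-01 00:40:16", "2017-09-01 03:17:53",
        "2017-09-01 13:00:47", "2017-09-01 13:06:05", "2017-09-01 13:08:12",
        "2017-09-01 14:01:21", "2017-09-01 14:07:14", "2017-09-01 16:27:24",
    ]),
})
result = online_windows(df)
```

On the sample data this produces the three windows listed in the question: user 1 from 00:00:12 to 00:40:16 and from 13:06:05 to 23:59:59, and user 2 from 03:17:53 to 13:08:12. The Start and End columns are full timestamps; result["Start"].dt.time and result["End"].dt.time give the time-of-day values if you want the exact layout from the question.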
