简体   繁体   中英

Time difference between two event rows for each user in Pandas df

I have a dataframe as follows:

    imei    event_type               time   
    1107    alarm                    2020-01-28 11:32:42+00:00  
    1107    alarm_restored           2020-01-28 11:32:53+00:00
    1107    alarm_emergency          2020-01-28 11:33:03+00:00  
    1107    alarm_emergency_restored 2020-01-28 11:33:06+00:00
    1108    alarm                    2020-01-28 11:42:42+00:00  
    1108    alarm_restored           2020-01-28 11:43:53+00:00
    1109    alarm_emergency          2020-01-28 11:53:23+00:00  
    1109    alarm_emergency          2020-01-28 11:53:23+00:00  
    1109    alarm_emergency_restored 2020-01-28 11:57:06+00:00  
    1110    alarm_emergency          2020-01-29 10:23:05+00:00  
    1111    alarm_restored           2020-01-29 11:10:53+00:00  
    1112    alarm_emergency_restored 2020-01-29 12:13:23+00:00  

I want to find the time difference between alarm and restored type events for every user. I have no idea how to proceed with it. I tried calculate the time difference between two consecutive rows in pandas I tried

    df_alarm['time'].diff(3)

and got :

    0                              NaT
    1                              NaT
    2                              NaT
    3           0 days 00:00:23.706000
    4           0 days 00:27:28.364000
    ...

Which is not how I expected the results. I want results in minutes/seconds

UPDATE:

I want to find time difference in every consecutive alarm and alarm_restored, alarm_emergency and alarm_emergency_restored only if they are consecutive rows. All other rows should be NaT.

Expected Output:

    imei    event_type               time                       time_diff
    1107    alarm                    2020-01-28 11:32:42+00:00  NaT
    1107    alarm_restored           2020-01-28 11:32:53+00:00  00:00:11
    1107    alarm_emergency          2020-01-28 11:33:03+00:00  NaT
    1107    alarm_emergency_restored 2020-01-28 11:33:06+00:00  00:00:03
    1108    alarm                    2020-01-28 11:42:42+00:00  NaT
    1108    alarm_restored           2020-01-28 11:43:53+00:00  00:01:11
    1109    alarm_emergency          2020-01-28 11:14:27+00:00  NaT
    1109    alarm_emergency          2020-01-28 11:53:23+00:00  NaT
    1109    alarm_emergency_restored 2020-01-28 11:57:06+00:00  00:03:43
    1110    alarm_emergency          2020-01-29 10:23:05+00:00  NaT
    1111    alarm_restored           2020-01-29 11:10:53+00:00  NaT
    1112    alarm_emergency_restored 2020-01-29 12:13:23+00:00  NaT

As you see, if there are two consecutive alarm_* events and one restoral after that(as in rows 1109-1109), I want to find difference only between row 2 and row 3 for 1109.

This will work.

First you get the time difference between all the rows.

df["timediff"] = df.groupby(df.imei)["time"].diff()

Then, you just set timediff to NaT (not a time) for all rows that are not "*_restored" because you don't care about the time from "*_restored" to any other alarm events:

import numpy as np
df["timediff"][df.event_type.str.contains("restored") == False] = np.datetime64('NaT')

This gives exactly what you want.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM