How to calculate time difference between specific row values in dataframe using python?

Question

The DataFrame looks like this:


Time                    A 

2019-05-18 01:15:28     7
2019-05-18 01:28:11     7
2019-05-18 01:36:36     12
2019-05-18 01:39:47     12
2019-05-18 01:53:32     12
2019-05-18 02:05:37     7

I understand how to calculate the time difference between consecutive rows. But I want to calculate the time difference only when the value in A changes from 7 to 12, measured from the first 7 to the first 12 that follows it.

Expected output:


Time                    A   Time_difference

2019-05-18 01:15:28     7   0
2019-05-18 01:28:11     7   0
2019-05-18 01:36:36     12  00:21:08
2019-05-18 01:39:47     12  0
2019-05-18 01:53:32     12  0
2019-05-18 02:05:37     7   0

Answer 1

You can isolate values in a DataFrame using loc with a boolean mask. What gets returned is a Series; its .values attribute is a NumPy array, which can be indexed like a list. Use .values[0] to get the first occurrence.

import pandas as pd

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [9, 7, 7, 5, 12, 12]

df = pd.DataFrame({'times':times, 'a':a})
df.times = pd.to_datetime(df['times'])
pd.Timedelta(df.loc[df.a == 12, 'times'].values[0] - df.loc[df.a == 7, 'times'].values[0])

Timedelta('0 days 00:25:21')

Or we can break that code apart for readability's sake and do the calculations on new variables:

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [9, 7, 7, 5, 12, 12]

df = pd.DataFrame({'times':times, 'a':a})
df.times = pd.to_datetime(df['times'])
end = df.loc[df.a == 12, 'times'].values[0]
start = df.loc[df.a == 7, 'times'].values[0]
pd.Timedelta(end - start)

Timedelta('0 days 00:25:21')
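
Note that .values[0] raises an IndexError when no row matches the value. A minimal sketch of a guarded lookup follows, assuming the same column names as above; first_time is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def first_time(df, value, value_col="a", time_col="times"):
    """Return the timestamp of the first row where value_col equals value, or None."""
    matches = df.loc[df[value_col] == value, time_col]
    # .iloc[0] is positional, so it works whatever the index labels are
    return matches.iloc[0] if not matches.empty else None

df = pd.DataFrame({
    "times": pd.to_datetime([
        "2019-05-18 01:15:28", "2019-05-18 01:28:11", "2019-05-18 01:36:36",
        "2019-05-18 01:39:47", "2019-05-18 01:53:32", "2019-05-18 02:05:37",
    ]),
    "a": [9, 7, 7, 5, 12, 12],
})

start = first_time(df, 7)
end = first_time(df, 12)
# Only subtract when both values were actually found
elapsed = end - start if start is not None and end is not None else None
print(elapsed)
```

This still returns a single difference between the first 7 and the first 12 in the whole frame, so it does not handle repeated 7-to-12 transitions.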

Answer 2

Sample:

import pandas as pd

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [7, 7, 12, 7, 12, 7]

df = pd.DataFrame({'times': pd.to_datetime(times), 'A':a})
print (df)
                times   A
0 2019-05-18 01:15:28   7
1 2019-05-18 01:28:11   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12
5 2019-05-18 02:05:37   7

First create a default index and keep only the rows where A is 7 or 12:

df = df.reset_index(drop=True)
df1 = df[df['A'].isin([7, 12])]

Then keep only the first row of each run of consecutive equal values by comparing with the shifted column:

df1 = df1[df1['A'].ne(df1['A'].shift())]
print (df1)
                times   A
0 2019-05-18 01:15:28   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12
5 2019-05-18 02:05:37   7
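
To see what the shifted comparison does in isolation, here is a small sketch on a bare Series holding the same values as column A. The names s and starts are illustrative only.

```python
import pandas as pd

s = pd.Series([7, 7, 12, 7, 12, 7])

# True where a value differs from the one before it. The first row is
# compared with NaN, so it always counts as the start of a run.
starts = s.ne(s.shift())
print(starts.tolist())
```

Only the second 7 is dropped, because it repeats the value directly above it.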

Then keep each 7 that is followed by a 12, together with that 12:

m1 = df1['A'].eq(7) & df1['A'].shift(-1).eq(12)
m2 = df1['A'].eq(12) & df1['A'].shift().eq(7)

df2 = df1[m1 | m2]
print (df2)
                times   A
0 2019-05-18 01:15:28   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12

The rows now alternate between 7 and 12, so split them by position into even (7) and odd (12) rows:

out7 = df2.iloc[::2]
out12 = df2.iloc[1::2]

Finally, subtract and fill the remaining rows with a zero Timedelta:

df['Time_difference'] = out12['times'] - out7['times'].to_numpy()
df['Time_difference'] = df['Time_difference'].fillna(pd.Timedelta(0))
print (df)
                times   A Time_difference
0 2019-05-18 01:15:28   7        00:00:00
1 2019-05-18 01:28:11   7        00:00:00
2 2019-05-18 01:36:36  12        00:21:08
3 2019-05-18 01:39:47   7        00:00:00
4 2019-05-18 01:53:32  12        00:13:45
5 2019-05-18 02:05:37   7        00:00:00
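
The steps above can be collected into one function. This is a sketch under stated assumptions: time_between and its parameters are hypothetical names, the DataFrame index is assumed to be unique, and start and end are assumed to be different values.

```python
import pandas as pd

def time_between(df, start=7, end=12, value_col="A", time_col="times"):
    """Timedelta from the first row of each run of `start` to the next `end`; zero elsewhere."""
    sub = df[df[value_col].isin([start, end])]
    # Keep only the first row of each run of consecutive equal values
    sub = sub[sub[value_col].ne(sub[value_col].shift())]
    is_start = sub[value_col].eq(start) & sub[value_col].shift(-1).eq(end)
    is_end = sub[value_col].eq(end) & sub[value_col].shift().eq(start)
    starts = sub.loc[is_start, time_col]
    ends = sub.loc[is_end, time_col]
    # starts and ends have equal length and pair up by position;
    # the result keeps the index labels of the `end` rows
    diff = ends - starts.to_numpy()
    return diff.reindex(df.index).fillna(pd.Timedelta(0))

df = pd.DataFrame({
    "times": pd.to_datetime([
        "2019-05-18 01:15:28", "2019-05-18 01:28:11", "2019-05-18 01:36:36",
        "2019-05-18 01:39:47", "2019-05-18 01:53:32", "2019-05-18 02:05:37",
    ]),
    "A": [7, 7, 12, 7, 12, 7],
})

result = time_between(df)
print(result)
```

Assigning the result back with df['Time_difference'] = time_between(df) gives the same column as the step-by-step version.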

Answer 3

Explanation:

(df["A"] == 7).cumsum() starts a new group at every 7.
For each group, if it contains a 12, take the time difference between the group's first row (the 7) and its first 12.
If it contains no 12, carry the time of the group's first row over to the next group until a 12 is found.
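
As a quick illustration of the grouping step on its own, the sketch below applies the cumulative sum to a short made-up Series. The values and the names a and group_id are illustrative only.

```python
import pandas as pd

a = pd.Series([5, 7, 5, 12, 7, 7, 12])

# The counter increases by one at every 7, so each 7 opens a new group.
# Rows before the first 7 land in group 0.
group_id = (a == 7).cumsum()
print(group_id.tolist())
```

Each group therefore begins with a 7, except group 0, which holds any rows that come before the first 7.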


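import pandas as pd
import numpy as np

np.random.seed(10)
date_range = pd.date_range("2019-09-25", "2019-09-27", freq="3h")
df = pd.DataFrame({'Time': date_range, 'A': np.random.choice([5, 7, 12], len(date_range))})

# Each 7 starts a new group; rows before the first 7 fall in group 0
df["Seven"] = (df["A"] == 7).cumsum()

# Start time carried over from earlier groups that contained no 12
pass_to_next_group = {"val": None}

def diff(group):
    group = group.copy()
    group["Diff"] = pd.Timedelta(0)
    loc = group.index[group["A"] == 12]

    # Group 0 contains no 7, so there is no start time to measure from
    if group.name == 0:
        return group

    # Use the 7 carried over from an earlier group, else this group's own 7
    carried = pass_to_next_group["val"]
    time_a = carried if carried is not None else group["Time"].iloc[0]
    pass_to_next_group["val"] = None

    if len(loc) > 0:
        # Elapsed time from the 7 to the first 12 that follows it
        group.loc[loc[0], "Diff"] = group.loc[loc[0], "Time"] - time_a
    else:
        pass_to_next_group["val"] = time_a

    return group


result = df.groupby("Seven", group_keys=False).apply(diff)
print(result)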

3 answers

solution1
2 2019-09-25 09:15:09

solution2
1 ACCEPTED 2019-09-25 11:08:42

solution3
1 2019-09-25 11:35:07
