How to calculate time difference between specific row values in dataframe using python?
My df looks like this:
Time A
2019-05-18 01:15:28 7
2019-05-18 01:28:11 7
2019-05-18 01:36:36 12
2019-05-18 01:39:47 12
2019-05-18 01:53:32 12
2019-05-18 02:05:37 7
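I know how to calculate the time difference between consecutive rows, but I want to calculate the time difference when the value in A goes from 7 to 12.
Expected output: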
Time A Time_difference
2019-05-18 01:15:28 7 0
2019-05-18 01:28:11 7 0
2019-05-18 01:36:36 12 00:21:08
2019-05-18 01:39:47 12 0
2019-05-18 01:53:32 12 0
2019-05-18 02:05:37 12 0
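You can use loc to isolate any values in the DataFrame. What it returns is a Series, which can be indexed like a list. Use [0] on its values to get the first match in the Series.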
import pandas as pd

times = [
'2019-05-18 01:15:28',
'2019-05-18 01:28:11',
'2019-05-18 01:36:36',
'2019-05-18 01:39:47',
'2019-05-18 01:53:32',
'2019-05-18 02:05:37'
]
a = [9, 7, 7, 5, 12, 12]
df = pd.DataFrame({'times':times, 'a':a})
df.times = pd.to_datetime(df['times'])
pd.Timedelta(df.loc[df.a == 12, 'times'].values[0] - df.loc[df.a == 7, 'times'].values[0])
Timedelta('0 days 00:25:21')
Or, for readability, we can split the code up and do the calculation on new variables:
times = [
'2019-05-18 01:15:28',
'2019-05-18 01:28:11',
'2019-05-18 01:36:36',
'2019-05-18 01:39:47',
'2019-05-18 01:53:32',
'2019-05-18 02:05:37'
]
a = [9, 7, 7, 5, 12, 12]
df = pd.DataFrame({'times':times, 'a':a})
df.times = pd.to_datetime(df['times'])
end = df.loc[df.a == 12, 'times'].values[0]
start = df.loc[df.a == 7, 'times'].values[0]
pd.Timedelta(end - start)
Timedelta('0 days 00:25:21')
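Note that this approach pairs only the first row matching each value, and indexing with [0] raises an IndexError when a value never occurs. A minimal sketch of a guarded version follows, assuming the same column names as above; the helper name first_time_between is hypothetical and not part of pandas.

```python
import pandas as pd


def first_time_between(df, start_value, end_value, time_col="times", value_col="a"):
    """Time from the first start_value row to the first end_value row.

    Hypothetical helper for illustration; returns None if either value is absent.
    """
    start_rows = df.loc[df[value_col] == start_value, time_col]
    end_rows = df.loc[df[value_col] == end_value, time_col]
    if start_rows.empty or end_rows.empty:
        return None
    # Timestamp - Timestamp gives a pandas Timedelta directly.
    return end_rows.iloc[0] - start_rows.iloc[0]


times = [
    "2019-05-18 01:15:28",
    "2019-05-18 01:28:11",
    "2019-05-18 01:36:36",
    "2019-05-18 01:39:47",
    "2019-05-18 01:53:32",
    "2019-05-18 02:05:37",
]
a = [9, 7, 7, 5, 12, 12]
df = pd.DataFrame({"times": pd.to_datetime(times), "a": a})

result = first_time_between(df, 7, 12)   # same pair of rows as the code above
missing = first_time_between(df, 7, 99)  # 99 never occurs, so this is None
print(result, missing)
```

Using .iloc[0] on the filtered Series instead of .values[0] keeps the results as pandas Timestamps, so the subtraction needs no extra pd.Timedelta wrapper.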
Sample:
times = [
'2019-05-18 01:15:28',
'2019-05-18 01:28:11',
'2019-05-18 01:36:36',
'2019-05-18 01:39:47',
'2019-05-18 01:53:32',
'2019-05-18 02:05:37'
]
a = [7, 7, 12, 7, 12, 7]
df = pd.DataFrame({'times': pd.to_datetime(times), 'A':a})
print (df)
times A
0 2019-05-18 01:15:28 7
1 2019-05-18 01:28:11 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12
5 2019-05-18 02:05:37 7
First create a default index and keep only the rows where A is 7 or 12:
df = df.reset_index(drop=True)
df1 = df[df['A'].isin([7, 12])]
Then keep only the first row of each run of consecutive equal values by comparing against the shifted values:
df1 = df1[df1['A'].ne(df1['A'].shift())]
print (df1)
times A
0 2019-05-18 01:15:28 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12
5 2019-05-18 02:05:37 7
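The shift comparison used in this step can be seen in isolation on a small Series. This is only an illustrative sketch; the variable names are made up for the example.

```python
import pandas as pd

s = pd.Series([7, 7, 12, 12, 12, 7])

# True where the value differs from the previous row, i.e. at the start of each run.
# The first row is always True because shift() puts NaN in front of it.
is_run_start = s.ne(s.shift())

run_starts = s[is_run_start]
print(run_starts.index.tolist())  # [0, 2, 5]
print(run_starts.tolist())        # [7, 12, 7]
```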
Then keep only the 7 rows that are immediately followed by a 12, together with those 12 rows:
m1 = df1['A'].eq(7) & df1['A'].shift(-1).eq(12)
m2 = df1['A'].eq(12) & df1['A'].shift().eq(7)
df2 = df1[m1 | m2]
print (df2)
times A
0 2019-05-18 01:15:28 7
2 2019-05-18 01:36:36 12
3 2019-05-18 01:39:47 7
4 2019-05-18 01:53:32 12
Select the even-positioned and odd-positioned rows to get the start (7) and end (12) datetimes:
out7 = df2.iloc[::2]
out12 = df2.iloc[1::2]
Finally subtract:
df['Time_difference'] = out12['times'] - out7['times'].to_numpy()
df['Time_difference'] = df['Time_difference'].fillna(pd.Timedelta(0))
print (df)
times A Time_difference
0 2019-05-18 01:15:28 7 00:00:00
1 2019-05-18 01:28:11 7 00:00:00
2 2019-05-18 01:36:36 12 00:21:08
3 2019-05-18 01:39:47 7 00:00:00
4 2019-05-18 01:53:32 12 00:13:45
5 2019-05-18 02:05:37 7 00:00:00
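The steps above can be collected into one function. The following is a sketch rather than a definitive implementation: the function name time_between_values and its parameters are hypothetical, and it assumes start and end are different values. It is run here on the data from the question.

```python
import pandas as pd


def time_between_values(df, start=7, end=12, time_col="times", value_col="A"):
    """Hypothetical wrapper around the steps above; returns a new DataFrame."""
    out = df.reset_index(drop=True)
    # Keep only start/end rows, then the first row of each run of equal values.
    kept = out[out[value_col].isin([start, end])]
    kept = kept[kept[value_col].ne(kept[value_col].shift())]
    # A start row qualifies only if the next kept row is an end row, and vice versa,
    # so both masks always select the same number of rows.
    is_start = kept[value_col].eq(start) & kept[value_col].shift(-1).eq(end)
    is_end = kept[value_col].eq(end) & kept[value_col].shift().eq(start)
    # Subtract positionally; the result keeps the index of the end rows.
    diff = kept.loc[is_end, time_col] - kept.loc[is_start, time_col].to_numpy()
    out["Time_difference"] = diff
    out["Time_difference"] = out["Time_difference"].fillna(pd.Timedelta(0))
    return out


times = [
    "2019-05-18 01:15:28",
    "2019-05-18 01:28:11",
    "2019-05-18 01:36:36",
    "2019-05-18 01:39:47",
    "2019-05-18 01:53:32",
    "2019-05-18 02:05:37",
]
df = pd.DataFrame({"times": pd.to_datetime(times), "A": [7, 7, 12, 12, 12, 7]})

result = time_between_values(df)
print(result)
```

With the question's data only row 2 gets a non-zero value, measured from the first 7 at 01:15:28 to the first 12 at 01:36:36.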
Explanation:
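import pandas as pd
import numpy as np

np.random.seed(10)
date_range = pd.date_range("2019-09-25", "2019-09-27", freq="3h")
df = pd.DataFrame({'Time': date_range, 'A': np.random.choice([5, 7, 12], len(date_range))})
# Every 7 starts a new group; group 0 holds any rows before the first 7.
df["Seven"] = (df["A"] == 7).cumsum()
# display(df)

# Carries the time of a 7 with no following 12 over to the next group.
pass_to_next_group = {"val": None}

def diff(group):
    group = group.copy()
    group["Diff"] = pd.Timedelta(0)
    loc = group.index[group["A"] == 12]
    # Start time: a carried-over 7 if there is one, else this group's first row.
    time_a = pass_to_next_group["val"] if pass_to_next_group["val"] else group["Time"].iloc[0]
    pass_to_next_group["val"] = None
    if group.name > 0 and len(loc) > 0:
        # Elapsed time from the 7 to the first 12 that follows it.
        group.loc[loc[0], "Diff"] = group.loc[loc[0], "Time"] - time_a
    else:
        pass_to_next_group["val"] = time_a
    return group

df.groupby("Seven").apply(diff)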
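The grouping column is the key idea here: the cumulative sum of the A == 7 mask increases by one at every 7, so each 7 opens a new group. A small sketch on made-up values (not the random data above) shows the resulting labels:

```python
import pandas as pd

a = pd.Series([5, 7, 12, 5, 7, 7, 12])

# The mask is True at every 7; its running total labels the group each row belongs to.
group_id = (a == 7).cumsum()
print(group_id.tolist())  # [0, 1, 1, 1, 2, 3, 3]
```

Group 2 contains a 7 with no 12 after it, which is the case where the function above carries the start time over to the next group.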
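Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.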