[英]Pandas DataFrame find closest index in previous rows where condition is met
I have the following df1
dataframe:我有以下df1
数据框:
t A
0 23:00 2
1 23:01 1
2 23:02 2
3 23:03 2
4 23:04 6
5 23:05 5
6 23:06 4
7 23:07 9
8 23:08 7
9 23:09 10
10 23:10 8
For each t
(increments simplified here, not uniformly distributed in real life), I would like to find, if any, the most recent time tr
within the previous 5 min where A(t)- A(tr) >= 4
.对于每个t
(此处简化了增量,在现实生活中并非均匀分布),我想找到前 5 分钟内的最近时间tr
(如果有),其中A(t)- A(tr) >= 4
。 I want to get:我想得到:
t A tr
0 23:00 2
1 23:01 1
2 23:02 2
3 23:03 2
4 23:04 6 23:03
5 23:05 5 23:01
6 23:06 4
7 23:07 9 23:06
8 23:08 7
9 23:09 10 23:06
10 23:10 8 23:06
Currently, I can use shift(-1)
to compare each row to the previous row like cond = df1['A'] >= df1['A'].shift(-1) + 4
.目前,我可以使用shift(-1)
将每一行与前一行进行比较,例如cond = df1['A'] >= df1['A'].shift(-1) + 4
。
How can I look further in time?我怎样才能及时看得更远?
Assuming your data is continuous by the minute, then you can do usual shift:假设您的数据按分钟是连续的,那么您可以进行常规班次:
df1['t'] = pd.to_timedelta(df1['t'].add(':00'))
df = pd.DataFrame({i:df1.A - df1.A.shift(i) >= 4 for i in range(1,5)})
df1['t'] - pd.to_timedelta('1min') * df.idxmax(axis=1).where(df.any(1))
Output:输出:
0 NaT
1 NaT
2 NaT
3 NaT
4 23:03:00
5 23:01:00
6 NaT
7 23:06:00
8 NaT
9 23:06:00
10 23:06:00
dtype: timedelta64[ns]
I added a datetime
index and used rolling()
, which now includes time-window functionalities beyond simple index-window.我添加了一个datetime
索引并使用了rolling()
,它现在包括了简单索引窗口之外的时间窗口功能。
import pandas as pd
import numpy as np
import datetime
df1 = pd.DataFrame({'t' : [
datetime.datetime(2020, 5, 17, 23, 0, 0),
datetime.datetime(2020, 5, 17, 23, 0, 1),
datetime.datetime(2020, 5, 17, 23, 0, 2),
datetime.datetime(2020, 5, 17, 23, 0, 3),
datetime.datetime(2020, 5, 17, 23, 0, 4),
datetime.datetime(2020, 5, 17, 23, 0, 5),
datetime.datetime(2020, 5, 17, 23, 0, 6),
datetime.datetime(2020, 5, 17, 23, 0, 7),
datetime.datetime(2020, 5, 17, 23, 0, 8),
datetime.datetime(2020, 5, 17, 23, 0, 9),
datetime.datetime(2020, 5, 17, 23, 0, 10)
], 'A' : [2,1,2,2,6,5,4,9,7,10,8]}, columns=['t', 'A'])
df1.index = df1['t']
df2 = df1
cond = df1['A'] >= df1.rolling('5s')['A'].apply(lambda x: x[0] + 4)
result = df1[cond]
Gives给
t A
2020-05-17 23:00:04 6
2020-05-17 23:00:05 5
2020-05-17 23:00:07 9
2020-05-17 23:00:09 10
2020-05-17 23:00:10 8
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.