Is there a way to iteratively find the index of a dataframe when a column satisfies a certain condition based on another column?
I have a pandas dataframe with ReadTime as the index, as follows:
ReadTime A B
2/4/18 0:00 6008.6 6013.55
2/4/18 0:01 6008.65 6013.6
2/4/18 0:02 6009.15 6014.05
2/4/18 0:03 6014.00 6014.1
2/4/18 0:04 6009.1 6013.7
2/4/18 0:05 6008.75 6013.65
2/4/18 0:06 6008.7 6013.25
2/4/18 0:07 6008.3 6013.25
2/4/18 0:08 6015.00 6013
2/4/18 0:09 6008.3 6003.55
2/4/18 0:10 6008.65 6013.65
2/4/18 0:11 6008.75 6013.6
2/4/18 0:12 6008.7 6013.7
2/4/18 0:13 6008.65 6013.55
2/4/18 0:14 6014.00 6013.3
2/4/18 0:15 6008.6 6013.5
2/4/18 0:16 6008.55 6013.4
2/4/18 0:17 6008.55 6013.55
2/4/18 0:18 6008.65 6013.55
2/4/18 0:19 6018 6013.6
I would like to check iteratively whether values in A are greater than or equal to values in B, and create a new dataframe with the timestamps at which that happened. The analysis should then be repeated starting from the timestamp at which the previous condition was satisfied.
Sample results are as follows:
ReadTime C
2/4/18 0:00 2/4/18 0:03
2/4/18 0:03 2/4/18 0:08
2/4/18 0:08 2/4/18 0:14
2/4/18 0:14 2/4/18 0:19
Thanks in advance for the help.
Edit: Column C holds the timestamp at which the condition was satisfied, i.e. the first later timestamp at which the value in A is greater than or equal to the value in B at the starting timestamp. For example, at 2/4/18 0:00 the value of B was 6013.55. Going through the values in A after that timestamp, we can see that at 2/4/18 0:03 the value of A was 6014, which exceeded the value of B (6013.55). So 2/4/18 0:03 was brought into C for the row 2/4/18 0:00.
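The procedure described in the edit can be sketched as an explicit loop. This is only a minimal illustration, not code from the original post: the function name next_crossings is hypothetical, and the dates are assumed to be month/day/year with one row per minute.

```python
import pandas as pd

def next_crossings(df):
    """From each start row, find the first later row whose A is >= the
    start row's B, then restart the search from the row that was found."""
    a = df["A"].to_numpy()
    b = df["B"].to_numpy()
    starts, hits = [], []
    i = 0
    while i < len(df):
        # Offsets (relative to i + 1) of later rows where A reaches B at row i
        later = (a[i + 1:] >= b[i]).nonzero()[0]
        if later.size == 0:
            break  # no later row satisfies the condition, so the chain ends
        j = i + 1 + later[0]
        starts.append(df.index[i])
        hits.append(df.index[j])
        i = j  # repeat the analysis from the timestamp just found
    return pd.DataFrame({"C": hits}, index=pd.Index(starts, name=df.index.name))

# Sample data from the question (date format assumed to be month/day/year)
idx = pd.date_range("2018-02-04 00:00", periods=20, freq="min", name="ReadTime")
df = pd.DataFrame(
    {
        "A": [6008.6, 6008.65, 6009.15, 6014.00, 6009.1, 6008.75, 6008.7,
              6008.3, 6015.00, 6008.3, 6008.65, 6008.75, 6008.7, 6008.65,
              6014.00, 6008.6, 6008.55, 6008.55, 6008.65, 6018],
        "B": [6013.55, 6013.6, 6014.05, 6014.1, 6013.7, 6013.65, 6013.25,
              6013.25, 6013, 6003.55, 6013.65, 6013.6, 6013.7, 6013.55,
              6013.3, 6013.5, 6013.4, 6013.55, 6013.55, 6013.6],
    },
    index=idx,
)

result = next_crossings(df)
print(result)
```

On the sample data this pairs 0:00 with 0:03, 0:03 with 0:08, 0:08 with 0:14, and 0:14 with 0:19. The loop rescans the remaining rows on every step, so it favours readability over speed on large frames.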
Here is a solution, if I understand correctly:
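# Timestamp of each row where A > B, missing elsewhere; back-fill, then shift up one row
# (Series.where is used because np.where returns a plain array, which has no .bfill())
df['C'] = df.index.to_series().where(df.A > df.B).bfill().shift(-1)
# Running count of rows where the condition held; rows sharing a count form one group
df['X'] = (df.A > df.B).cumsum()
# Keep only the first row of each group, then only the C column
df = df.drop_duplicates(subset=['X'], keep='first')
df = df[['C']]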
First, we fill a column named C with the timestamp of the rows where the condition is met, and put a missing value (NaN) elsewhere. We backfill it so that all previous rows have the same timestamp (back to the row where the condition was last met). We then shift the column up by one row with shift(-1), so that each row points to the next occurrence strictly after it (to prepare for the next step).
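The three steps can be seen in isolation on a small frame. This is a sketch on hypothetical toy data (the names toy, hit, marked, filled and shifted are made up for illustration), and it uses Series.where rather than np.where so that the result stays a pandas Series that supports bfill().

```python
import pandas as pd

# Hypothetical toy data: the condition A > B holds only at labels 12 and 14
toy = pd.DataFrame(
    {"A": [1, 1, 5, 1, 5, 1], "B": [3, 3, 3, 3, 3, 3]},
    index=pd.Index([10, 11, 12, 13, 14, 15], name="t"),
)
hit = toy["A"] > toy["B"]

marked = toy.index.to_series().where(hit)  # index label where hit, NaN elsewhere
filled = marked.bfill()                    # next hit at or after each row
shifted = filled.shift(-1)                 # next hit strictly after each row

steps = pd.DataFrame({"marked": marked, "filled": filled, "shifted": shifted})
print(steps)
```

After the shift, the rows at and after the last hit have no later hit to point to, so they are left as missing values.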
In order to make the indices align the way you want, we need to group the rows. We can do this by combining your condition and cumsum(), which treats True as 1 and False as 0. Now we can drop all the rows in a group (which all have the same timestamp in C) except for the first. This should give you the output you want.
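The grouping step can be illustrated on a small frame. Again this is a sketch on hypothetical toy data, with made-up names (toy, hit, first_rows), rather than the question's actual dataframe.

```python
import pandas as pd

# Hypothetical toy data: the condition A > B holds only at labels 12 and 14
toy = pd.DataFrame(
    {"A": [1, 1, 5, 1, 5, 1], "B": [3, 3, 3, 3, 3, 3]},
    index=pd.Index([10, 11, 12, 13, 14, 15], name="t"),
)
hit = toy["A"] > toy["B"]

# Next label strictly after each row at which the condition holds
toy["C"] = toy.index.to_series().where(hit).bfill().shift(-1)

# The running total increases by one at every hit, so each hit opens a new group
toy["X"] = hit.cumsum()

# Keep the first row of each group and only the C column
first_rows = toy.drop_duplicates(subset=["X"], keep="first")[["C"]]
print(first_rows)
```

The rows kept are 10, 12 and 14. The last one has a missing C because no later hit exists; calling dropna() on the result would remove it if that row is not wanted.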
Note: your desired output doesn't match your input (at 2/4/18 0:03, B is greater than, not less than, A), so the answer doesn't match your example perfectly. But I think I've gotten the spirit of what you're asking. If I'm right, please correct the question, and if I'm wrong, comment and I'll change my answer.