Is there a way to iteratively find the index of a dataframe when a column satisfies a condition based on another column?

Question

I have a pandas dataframe with ReadTime as the index, as follows:

  ReadTime    A       B
2/4/18 0:00 6008.6  6013.55
2/4/18 0:01 6008.65 6013.6
2/4/18 0:02 6009.15 6014.05
2/4/18 0:03 6014.00 6014.1
2/4/18 0:04 6009.1  6013.7
2/4/18 0:05 6008.75 6013.65
2/4/18 0:06 6008.7  6013.25
2/4/18 0:07 6008.3  6013.25
2/4/18 0:08 6015.00 6013
2/4/18 0:09 6008.3  6003.55
2/4/18 0:10 6008.65 6013.65
2/4/18 0:11 6008.75 6013.6
2/4/18 0:12 6008.7  6013.7
2/4/18 0:13 6008.65 6013.55
2/4/18 0:14 6014.00 6013.3
2/4/18 0:15 6008.6  6013.5
2/4/18 0:16 6008.55 6013.4
2/4/18 0:17 6008.55 6013.55
2/4/18 0:18 6008.65 6013.55
2/4/18 0:19 6018    6013.6

I would like to iteratively check whether values in A are greater than or equal to values in B, and create a new dataframe with the timestamp at which this happens. The analysis should then be repeated starting from the timestamp at which the previous condition was satisfied.

Sample results are as follows:

  ReadTime      C
2/4/18 0:00 2/4/18 0:03
2/4/18 0:03 2/4/18 0:08
2/4/18 0:08 2/4/18 0:14
2/4/18 0:14 2/4/18 0:19

Thanks in advance for the help.

Edit: Column C refers to the timestamp at which the condition was satisfied, i.e. the first later timestamp at which the value in A was greater than or equal to the value of B at the starting timestamp. For example: at 2/4/18 0:00, the value of B was 6013.55. Going through the values in A after that timestamp, we can see that at 2/4/18 0:03 the value of A was 6014, which exceeded that value of B (6013.55). So 2/4/18 0:03 was entered in C for the row 2/4/18 0:00.

Answer 1

Here is (if I understand correctly) a solution:

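# Timestamp of each row where A > B (NaN elsewhere), backfilled, then shifted up one row.
df['C'] = df.index.to_series().where(df.A > df.B).bfill().shift(-1)
# Running count of rows where the condition holds; rows sharing a count form one group.
df['X'] = (df.A > df.B).cumsum()
# Keep only the first row of each group, then only the C column.
df = df.drop_duplicates(subset=['X'], keep='first')
df = df[['C']]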

First, we fill a column named C with the timestamp of the rows where the condition is met, and put NaN elsewhere. We backfill it so that every earlier row carries the timestamp of the next row where the condition is met. We then shift the column back by one row (to prepare for the next step), so each row points to the next match strictly after it.

In order to make the indices align the way you want, we need to group the rows. We can do this by combining your condition with cumsum(), which treats True as 1 and False as 0. Now we can drop all the rows in a group (which all have the same timestamp in C) except for the first. This should give you the output you want.
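To see what these steps produce, here is a small self-contained run on the data from the question. It is only a sketch: the index is kept as plain strings rather than parsed datetimes, the names hit, steps and result are mine, and the condition is the row-wise A > B used in this answer.

```python
import pandas as pd

# Data copied from the question; ReadTime is kept as plain strings here.
times = [f"2/4/18 0:{m:02d}" for m in range(20)]
a = [6008.6, 6008.65, 6009.15, 6014.00, 6009.1, 6008.75, 6008.7, 6008.3,
     6015.00, 6008.3, 6008.65, 6008.75, 6008.7, 6008.65, 6014.00, 6008.6,
     6008.55, 6008.55, 6008.65, 6018]
b = [6013.55, 6013.6, 6014.05, 6014.1, 6013.7, 6013.65, 6013.25, 6013.25,
     6013, 6003.55, 6013.65, 6013.6, 6013.7, 6013.55, 6013.3, 6013.5,
     6013.4, 6013.55, 6013.55, 6013.6]
df = pd.DataFrame({"A": a, "B": b}, index=pd.Index(times, name="ReadTime"))

hit = df["A"] > df["B"]  # row-wise condition: True at 0:08, 0:09, 0:14, 0:19

steps = pd.DataFrame({
    "hit": hit,
    # timestamp of the next hit strictly after each row (NaN after the last hit)
    "C": df.index.to_series().where(hit).bfill().shift(-1),
    # group label: increases by one at every hit
    "X": hit.cumsum(),
})

# Keep the first row of each group, then only the C column.
result = steps.drop_duplicates(subset="X", keep="first")[["C"]]
print(result)
```

On this data the rows kept are 0:00, 0:08, 0:09, 0:14 and 0:19, with C equal to 0:08, 0:09, 0:14, 0:19 and NaN respectively. Printing steps instead of result shows the intermediate C and X columns for every row.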

Note: your desired output doesn't match your input (at 2/4/18 0:03, B is greater than, not less than, A), so the answer doesn't match your example perfectly. But I think I've gotten the spirit of what you're asking - if I'm right, please correct the question, and if I'm wrong, comment and I'll change my answer.
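If the intended reading is instead the one in the question's edit (take B at the current timestamp, find the first later row where A >= that value, then repeat from that row), a plain loop may be the simplest way to express it. This is a different technique from the vectorized code above, and only a sketch: chain_crossings is a hypothetical helper name, and the index is again kept as strings.

```python
import pandas as pd

def chain_crossings(df):
    """From the first row, repeatedly jump to the next row whose A >= B of the current row."""
    a = df["A"].to_numpy()
    b = df["B"].to_numpy()
    name = df.index.name or "index"
    pairs = []
    pos = 0
    while pos < len(df):
        later = a[pos + 1:] >= b[pos]       # compare later A values with the current B
        if not later.any():                 # no later row satisfies the condition: stop
            break
        nxt = pos + 1 + int(later.argmax()) # position of the first later match
        pairs.append((df.index[pos], df.index[nxt]))
        pos = nxt                           # repeat the search from the matching row
    return pd.DataFrame(pairs, columns=[name, "C"]).set_index(name)

# Data copied from the question; ReadTime is kept as plain strings here.
times = [f"2/4/18 0:{m:02d}" for m in range(20)]
a = [6008.6, 6008.65, 6009.15, 6014.00, 6009.1, 6008.75, 6008.7, 6008.3,
     6015.00, 6008.3, 6008.65, 6008.75, 6008.7, 6008.65, 6014.00, 6008.6,
     6008.55, 6008.55, 6008.65, 6018]
b = [6013.55, 6013.6, 6014.05, 6014.1, 6013.7, 6013.65, 6013.25, 6013.25,
     6013, 6003.55, 6013.65, 6013.6, 6013.7, 6013.55, 6013.3, 6013.5,
     6013.4, 6013.55, 6013.55, 6013.6]
df = pd.DataFrame({"A": a, "B": b}, index=pd.Index(times, name="ReadTime"))

result = chain_crossings(df)
print(result)
```

On the question's data this gives the four pairs from the sample results: 0:00 to 0:03, 0:03 to 0:08, 0:08 to 0:14, and 0:14 to 0:19. Each search scans the remaining rows, so the worst case is quadratic in the number of rows, which is usually fine for minute-level data of modest size.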

Answer 1
0 2018-10-23 00:57:15
