Is there a way to iteratively find the index of a DataFrame when a column satisfies a certain condition based on another column?

I have a pandas DataFrame with ReadTime as the index, as follows:

  ReadTime    A       B
2/4/18 0:00 6008.6  6013.55
2/4/18 0:01 6008.65 6013.6
2/4/18 0:02 6009.15 6014.05
2/4/18 0:03 6014.00 6014.1
2/4/18 0:04 6009.1  6013.7
2/4/18 0:05 6008.75 6013.65
2/4/18 0:06 6008.7  6013.25
2/4/18 0:07 6008.3  6013.25
2/4/18 0:08 6015.00 6013
2/4/18 0:09 6008.3  6003.55
2/4/18 0:10 6008.65 6013.65
2/4/18 0:11 6008.75 6013.6
2/4/18 0:12 6008.7  6013.7
2/4/18 0:13 6008.65 6013.55
2/4/18 0:14 6014.00 6013.3
2/4/18 0:15 6008.6  6013.5
2/4/18 0:16 6008.55 6013.4
2/4/18 0:17 6008.55 6013.55
2/4/18 0:18 6008.65 6013.55
2/4/18 0:19 6018    6013.6

I would like to iteratively check whether values in A are greater than or equal to values in B, and create a new DataFrame with the timestamp at which this happened. Then repeat the analysis starting from the timestamp at which the previous condition was satisfied.

Sample results are as follows:

  ReadTime      C
2/4/18 0:00 2/4/18 0:03
2/4/18 0:03 2/4/18 0:08
2/4/18 0:08 2/4/18 0:14
2/4/18 0:14 2/4/18 0:19

Thanks in advance for the help.

Edit: column C is the timestamp at which the condition was satisfied, i.e. the first later timestamp at which the value in A was greater than or equal to the value of B at the reference timestamp. For example, at 2/4/18 0:00 the value of B was 6013.55. Going through the values in A after that timestamp, we can see that at 2/4/18 0:03 the value of A was 6014, which exceeded the value of B (6013.55). So 2/4/18 0:03 goes into C for the row 2/4/18 0:00.
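The rule in the edit is sequential: each search starts at the previous hit and compares later values of A with B at that reference row, so it is not a row-wise comparison of A with B on the same row. A minimal sketch of that loop, assuming the index is sorted by time; the function name `chained_hits` and the 2018-02-04 dates (reading 2/4/18 as month/day/year) are assumptions for illustration, not from the question:

```python
import pandas as pd


def chained_hits(df: pd.DataFrame, a: str = "A", b: str = "B") -> pd.DataFrame:
    """Hypothetical helper: from a reference row, find the first later row whose
    value in column `a` is >= the value of column `b` at the reference row, then
    restart the search from that hit. Returns reference time -> hit time (C)."""
    a_vals = df[a].to_numpy()
    b_vals = df[b].to_numpy()
    times = df.index
    n = len(df)
    starts, hits = [], []
    i = 0  # position of the current reference row
    while i < n:
        threshold = b_vals[i]
        # scan forward for the first later row whose A reaches the threshold
        j = i + 1
        while j < n and not (a_vals[j] >= threshold):
            j += 1
        if j == n:  # no later row satisfies the condition, so stop
            break
        starts.append(times[i])
        hits.append(times[j])
        i = j  # repeat the analysis from the timestamp just found
    return pd.DataFrame({"C": hits}, index=pd.Index(starts, name=times.name))


# Sample data from the question (dates assumed to be 2018-02-04, one row per minute)
idx = pd.date_range("2018-02-04 00:00", periods=20, freq="min", name="ReadTime")
df = pd.DataFrame(
    {
        "A": [6008.6, 6008.65, 6009.15, 6014.00, 6009.1, 6008.75, 6008.7,
              6008.3, 6015.00, 6008.3, 6008.65, 6008.75, 6008.7, 6008.65,
              6014.00, 6008.6, 6008.55, 6008.55, 6008.65, 6018],
        "B": [6013.55, 6013.6, 6014.05, 6014.1, 6013.7, 6013.65, 6013.25,
              6013.25, 6013, 6003.55, 6013.65, 6013.6, 6013.7, 6013.55,
              6013.3, 6013.5, 6013.4, 6013.55, 6013.55, 6013.6],
    },
    index=idx,
)

result = chained_hits(df)
print(result)
```

The loop makes a single forward pass, because each scan resumes where the previous one stopped. On the sample data it reproduces the four rows of the sample results; the last reference row (0:19) has no later row to compare against, so it produces no output row.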

Here is (if I understand correctly) a solution:

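# Condition from the question: A greater than or equal to B
cond = df.A >= df.B
# Timestamp of each row where the condition holds (NaN/NaT elsewhere), backfilled, then shifted up one row
df['C'] = df.index.to_series().where(cond).bfill().shift(-1)
# Group counter: increases by 1 at every row where the condition holds
df['X'] = cond.cumsum()
df = df.drop_duplicates(subset=['X'], keep='first')
df = df[['C']]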

First, we fill a column named C with the timestamp of each row where the condition is met, and NaN (NaT for a datetime index) elsewhere. We backfill it, so every row takes the timestamp of the next row, at or after it, where the condition is met. We then shift the column up by one row (to prepare for the next step), so each row holds the first timestamp strictly after it at which the condition is met; a row that satisfies the condition therefore points to the next match rather than to itself.
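The three steps above (mark, backfill, shift) can be traced on a toy series. A minimal sketch; the six-row index and the positions of the matches are hypothetical values chosen only to make the effect of each step visible:

```python
import pandas as pd

# Toy data (hypothetical): the condition holds at minutes 2 and 4 only
idx = pd.date_range("2018-02-04 00:00", periods=6, freq="min", name="ReadTime")
cond = pd.Series([False, False, True, False, True, False], index=idx)

# 1. timestamp where the condition holds, NaT elsewhere
marked = idx.to_series().where(cond)
# 2. backfill: each row takes the next timestamp (at or after it) where the condition holds
filled = marked.bfill()
# 3. shift up one row: each row now holds the first match strictly after it
shifted = filled.shift(-1)

print(pd.DataFrame({"cond": cond, "marked": marked, "filled": filled, "shifted": shifted}))
```

After the shift, the matching row at minute 2 points to minute 4 instead of to itself, and the last two rows are NaT because no later match exists.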

In order to make the indices align the way you want, we need to group the rows. We can do this by taking the cumsum() of your condition, which treats True as 1 and False as 0, so the counter goes up by one at every row where the condition is met and a new group starts there. Now we can drop every row in a group except the first (all rows in a group have the same timestamp in C). This should give you the output you want.
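The grouping step can be checked in isolation. A minimal sketch on hypothetical toy data (six rows, matches at minutes 2 and 4), showing the counter produced by `cumsum()` and which rows `drop_duplicates` keeps:

```python
import pandas as pd

# Toy data (hypothetical): the condition holds at minutes 2 and 4 only
idx = pd.date_range("2018-02-04 00:00", periods=6, freq="min", name="ReadTime")
cond = pd.Series([False, False, True, False, True, False], index=idx)
df = pd.DataFrame({"C": idx.to_series().where(cond).bfill().shift(-1)})

# cumsum counts True as 1 and False as 0, so a new group starts at each match
df["X"] = cond.cumsum()
# keep only the first row of every group; the other rows repeat the same C
out = df.drop_duplicates(subset=["X"], keep="first")[["C"]]
print(out)
```

The kept rows are minutes 0, 2 and 4, mapped to minute 2, minute 4 and NaT respectively.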

Note: your desired output doesn't match your input (at 2/4/18 0:03, B is greater than A, not less than it), so the answer doesn't match your example perfectly. But I think I've captured the spirit of what you're asking. If I'm right, please correct the question; if I'm wrong, comment and I'll change my answer.

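Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.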

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM