Find consecutive values in rows in pandas Dataframe based on condition
I was looking at this question: How can I find 5 consecutive rows in pandas Dataframe where a value of a certain column is at least 0.5, which is similar to what I have in mind. I would like to find, say, at least 3 consecutive rows where a value is no greater than 0.5 (but neither negative nor NaN), while considering the entire dataframe and not just one column as in the question linked above.
Here is a sample dataframe:
import numpy as np
import pandas as pd

# Month-end dates; pandas >= 2.2 spells this alias "ME"
idx = pd.date_range("2018-01-01", periods=10, freq="M")
df = pd.DataFrame(
    {
        'A': [0, 0.4, 0.5, 0.3, 0, 0, 0, 0, 0, 0],
        'B': [0, 0.6, 0.8, 0, 0.3, 0.3, 0.9, 0.7, 0, 0],
        'C': [0, 0, 0.5, 0.4, 0.4, 0.2, 0, 0, 0, 0],
        'D': [0.4, 0, 0.6, 0.5, 0.7, 0.2, 0, 0.9, 0.8, 0],
        'E': [0.4, 0.3, 0.2, 0.7, 0.7, 0.8, 0, 0, 0, 0],
        'F': [0, 0, 0.6, 0.7, 0.8, 0.3, 0.4, 0.1, 0, 0]
    },
    index=idx
)
# Treat zeros as missing values
df = df.replace({0: np.nan})
df
Hence, columns B and D, which don't satisfy the criteria, should be removed from the output.
I'd prefer not to use a for loop or the like, since the real df has 2000 columns, so I tried the following:
def consecutive_values_in_range(s, min, max):
    return s.between(left=min, right=max)

min, max = 0, 0.5
df.apply(lambda col: consecutive_values_in_range(col, min, max), axis=0)
print(df)
But I didn't get what I was looking for, which would be something like this:
              A    C    E    F
2018-01-31  NaN  NaN  0.4  NaN
2018-02-28  0.4  NaN  0.3  NaN
2018-03-31  0.5  0.5  0.2  0.6
2018-04-30  0.3  0.4  0.7  0.7
2018-05-31  NaN  0.4  0.7  0.8
2018-06-30  NaN  0.2  0.8  0.3
2018-07-31  NaN  NaN  NaN  0.4
2018-08-31  NaN  NaN  NaN  0.1
2018-09-30  NaN  NaN  NaN  NaN
2018-10-31  NaN  NaN  NaN  NaN
Any suggestions? Thanks in advance.
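A note on why the attempt above prints the original frame: Series.between returns a boolean mask rather than filtered values, and DataFrame.apply returns a new object that the snippet never assigns. A minimal sketch of that behaviour, using a small hypothetical frame named demo (not the question's data):

```python
import pandas as pd

# Hypothetical two-column frame, used only to illustrate the behaviour.
demo = pd.DataFrame({"x": [0.1, 0.2, 0.9], "y": [0.7, 0.3, 0.4]})

# Series.between is inclusive on both ends by default and returns booleans.
mask = demo.apply(lambda col: col.between(0, 0.5))

print(mask)  # True/False per cell, same shape as demo
print(demo)  # unchanged, because apply returned a new object
```

So the mask is a useful first step, but it still has to be checked for runs of consecutive True values and then used to select columns.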
lower, upper = 0, 0.5
n = 3
df.loc[:, ((df <= upper) & (df >= lower)).rolling(n).sum().eq(n).any()]
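This builds an is_between mask over df, sums it over rolling windows of n rows, and keeps the columns where any window sums to n, to get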
              A    C    E    F
2018-01-31  NaN  NaN  0.4  NaN
2018-02-28  0.4  NaN  0.3  NaN
2018-03-31  0.5  0.5  0.2  0.6
2018-04-30  0.3  0.4  0.7  0.7
2018-05-31  NaN  0.4  0.7  0.8
2018-06-30  NaN  0.2  0.8  0.3
2018-07-31  NaN  NaN  NaN  0.4
2018-08-31  NaN  NaN  NaN  0.1
2018-09-30  NaN  NaN  NaN  NaN
2018-10-31  NaN  NaN  NaN  NaN
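The one-liner can be easier to follow when split into named steps. The sketch below is one possible decomposition, not a different method: the intermediate names (in_range, run_len, has_run, result) are made up for illustration, the date index is omitted for brevity, and the explicit astype(int) is an added precaution so the rolling sum operates on numbers rather than booleans.

```python
import numpy as np
import pandas as pd

# Same values as the question's frame; the date index is left out here.
df = pd.DataFrame(
    {
        "A": [0, 0.4, 0.5, 0.3, 0, 0, 0, 0, 0, 0],
        "B": [0, 0.6, 0.8, 0, 0.3, 0.3, 0.9, 0.7, 0, 0],
        "C": [0, 0, 0.5, 0.4, 0.4, 0.2, 0, 0, 0, 0],
        "D": [0.4, 0, 0.6, 0.5, 0.7, 0.2, 0, 0.9, 0.8, 0],
        "E": [0.4, 0.3, 0.2, 0.7, 0.7, 0.8, 0, 0, 0, 0],
        "F": [0, 0, 0.6, 0.7, 0.8, 0.3, 0.4, 0.1, 0, 0],
    }
).replace(0, np.nan)

lower, upper, n = 0, 0.5, 3

# Step 1: True where a cell lies in [lower, upper]; NaN compares as False.
in_range = (df >= lower) & (df <= upper)

# Step 2: count in-range cells in each window of n consecutive rows.
run_len = in_range.astype(int).rolling(n).sum()

# Step 3: a column qualifies if any window is entirely in range.
has_run = run_len.eq(n).any()

# Step 4: keep only the qualifying columns.
result = df.loc[:, has_run]
print(list(result.columns))  # ['A', 'C', 'E', 'F']
```

Both bounds are inclusive here, which is why column A qualifies through the run 0.4, 0.5, 0.3. Changing `<=` to `<` in step 1 would make the upper bound strict and drop A.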