How to find the highest count of sequential (numbers|increasing|decreasing) in pandas DataFrame column of values
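How do I find the highest count of consecutive occurrences within a single column, for example the same number repeating, increasing values, or decreasing values?
So, given something like: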
            h_diff  l_diff  monotonic
timestamp
2000-01-18     NaN     NaN        NaN
2000-01-19    2.75    2.93        1.0
2000-01-20   12.75   10.13        1.0
2000-01-21   -7.25   -3.31        0.0
2000-01-24   -1.50   -5.07        0.0
2000-01-25    0.37   -2.75        1.0
2000-01-26    1.07    7.38        1.0
2000-01-27   -1.19   -2.75        0.0
2000-01-28   -2.13   -6.38        0.0
2000-01-31   -7.00   -6.12        0.0
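In h_diff, the highest monotonic count of positive values is 2 and the highest count of negative values is 3. The same holds for l_diff. So, with a rolling window of 10 (or n), how would I find the highest monotonic count while still being able to change the window size dynamically?
This gives me the 1.0 values of the monotonic column: lambda x: np.all(np.diff(x) > 0). And lambda x: np.count_nonzero(np.diff(x) > 0) counts the total number of 1.0 values in a window, but what I'm trying to find is the longest run within a given window of the series.
What I'm hoping for is something like this: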
            h_diff  l_diff  monotonic
timestamp
2000-01-18     NaN     NaN        NaN
2000-01-19    2.75    2.93        1.0
2000-01-20   12.75   10.13        2.0
2000-01-21   -7.25   -3.31        0.0
2000-01-24   -1.50   -5.07        0.0
2000-01-25    0.37   -2.75        1.0
2000-01-26    1.07    7.38        2.0
2000-01-27    1.19   -2.75        3.0
2000-01-28    2.13   -6.38        4.0
2000-01-31   -7.00   -6.12        0.0
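For the rolling part of the question, one option is Series.rolling(...).apply with a small run-length helper, so the window size stays an ordinary argument. This is a minimal sketch, assuming a positional window (a number of rows) and using the h_diff values from the first sample table; longest_true_run and rolling_longest_run are hypothetical helper names, not pandas API.

```python
import numpy as np
import pandas as pd


def longest_true_run(mask: np.ndarray) -> float:
    """Length of the longest run of consecutive non-zero (True) values in a 1-D array."""
    best = current = 0
    for flag in mask:
        current = current + 1 if flag else 0  # extend the current run or reset it
        best = max(best, current)
    return float(best)


def rolling_longest_run(s: pd.Series, window: int, positive: bool = True) -> pd.Series:
    """Longest run of positive (or negative) values inside each trailing window of `window` rows."""
    mask = s.gt(0) if positive else s.lt(0)  # NaN compares False, so it breaks a run
    return mask.astype(float).rolling(window, min_periods=1).apply(longest_true_run, raw=True)


idx = pd.to_datetime(["2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-24",
                      "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-31"])
h_diff = pd.Series([np.nan, 2.75, 12.75, -7.25, -1.50, 0.37, 1.07, -1.19, -2.13, -7.00],
                   index=idx, name="h_diff")

print(rolling_longest_run(h_diff, window=10, positive=True).iloc[-1])   # 2.0
print(rolling_longest_run(h_diff, window=10, positive=False).iloc[-1])  # 3.0
print(rolling_longest_run(h_diff, window=4, positive=False).tolist())
```

Changing window (for example from 10 to 4) only changes the call, and raw=True hands the helper a plain NumPy array instead of building a Series for every window.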
The code below should do the trick for finding consecutive occurrences of positive or negative numbers. It is written for column h_diff:
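pos = df1.reset_index(drop=True)  # number rows 0..n-1 so an index diff of 1 means "adjacent rows"; with the DatetimeIndex, diff() gives Timedeltas and .ne(1) never finds neighbours
pos[pos.h_diff.gt(0)].index.to_series().diff().ne(1).cumsum().value_counts().max() #sequential occurrences greater than 0
pos[pos.h_diff.lt(0)].index.to_series().diff().ne(1).cumsum().value_counts().max() #sequential occurrences less than 0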
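The one-liners above treat index labels that differ by 1 as neighbours, which only works with a default integer index. A minimal sketch of an alternative that labels runs by row order instead, so the timestamp index can stay in place; longest_run is a hypothetical helper name, and the data is the first sample table from the question.

```python
import numpy as np
import pandas as pd


def longest_run(mask: pd.Series) -> int:
    """Longest stretch of consecutive True values, following row order rather than index labels."""
    run_id = mask.ne(mask.shift()).cumsum()                 # new id every time the mask flips
    return int(mask.astype(int).groupby(run_id).sum().max())  # a run's sum of 1s is its length


dates = ["2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-24",
         "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-31"]
df1 = pd.DataFrame(
    {"h_diff": [np.nan, 2.75, 12.75, -7.25, -1.50, 0.37, 1.07, -1.19, -2.13, -7.00],
     "l_diff": [np.nan, 2.93, 10.13, -3.31, -5.07, -2.75, 7.38, -2.75, -6.38, -6.12]},
    index=pd.DatetimeIndex(dates, name="timestamp"),
)

summary = {col: {"positive": longest_run(df1[col].gt(0)),  # NaN compares False
                 "negative": longest_run(df1[col].lt(0))}
           for col in ["h_diff", "l_diff"]}
print(summary)
# {'h_diff': {'positive': 2, 'negative': 3}, 'l_diff': {'positive': 2, 'negative': 3}}
```

This reproduces the counts stated in the question: 2 for positive runs and 3 for negative runs in both columns.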
Use GroupBy.cumcount + Series.where.
Initial DataFrame:
            h_diff  l_diff
timestamp
2000-01-18     NaN     NaN
2000-01-19    2.75    2.93
2000-01-20   12.75   10.13
2000-01-21   -7.25   -3.31
2000-01-24   -1.50   -5.07
2000-01-25    0.37   -2.75
2000-01-26    1.07    7.38
2000-01-27    1.19   -2.75
2000-01-28    2.13   -6.38
2000-01-31   -7.00   -6.12
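h = df['h_diff'].gt(0)  # True where h_diff is positive (NaN counts as False)
#h = np.sign(df['h_diff'])  # alternative key: the sign of h_diff (the .where mask must then still be boolean)
# label each run of equal values, number its rows 1, 2, 3, ..., then set rows that are not positive to 0
df['monotonic_h'] = h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1).where(h, 0)
print(df)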
            h_diff  l_diff  monotonic_h
timestamp
2000-01-18     NaN     NaN            0
2000-01-19    2.75    2.93            1
2000-01-20   12.75   10.13            2
2000-01-21   -7.25   -3.31            0
2000-01-24   -1.50   -5.07            0
2000-01-25    0.37   -2.75            1
2000-01-26    1.07    7.38            2
2000-01-27    1.19   -2.75            3
2000-01-28    2.13   -6.38            4
2000-01-31   -7.00   -6.12            0
df['monotonic_h'].max()
#4
Details:
h.ne(h.shift()).cumsum()
timestamp
2000-01-18    1
2000-01-19    2
2000-01-20    2
2000-01-21    3
2000-01-24    3
2000-01-25    4
2000-01-26    4
2000-01-27    4
2000-01-28    4
2000-01-31    5
Name: h_diff, dtype: int64
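The commented-out np.sign line suggests tracking negative runs as well. Because .where(h, 0) needs a boolean mask, one way to use the sign is to multiply the position within each run by it, which gives a single signed counter. A minimal sketch on the answer's initial h_diff values; the column name run_h is hypothetical.

```python
import numpy as np
import pandas as pd

dates = ["2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-24",
         "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-31"]
df = pd.DataFrame(
    {"h_diff": [np.nan, 2.75, 12.75, -7.25, -1.50, 0.37, 1.07, 1.19, 2.13, -7.00]},
    index=pd.DatetimeIndex(dates, name="timestamp"),
)

sign = np.sign(df["h_diff"])                       # 1.0, -1.0, 0.0, or NaN for each row
run_id = sign.ne(sign.shift()).cumsum()            # new id whenever the sign changes (NaN != NaN also starts one)
position = sign.groupby(run_id).cumcount().add(1)  # 1, 2, 3, ... within each run
df["run_h"] = position.mul(sign).fillna(0).astype(int)  # +n inside positive runs, -n inside negative runs

print(df["run_h"].tolist())                   # [0, 1, 2, -1, -2, 1, 2, 3, 4, -1]
print(df["run_h"].max(), -df["run_h"].min())  # 4 2
```

The maximum of the signed counter is the longest positive run and the negated minimum is the longest negative run; a zero in h_diff gets 0, like the NaN row.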
Update:
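import numpy as np
# position (1, 2, 3, ...) of each row within its run of h_diff > 0 or h_diff <= 0
df = df.join( h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1)
              .to_frame('values')
              # tag each row with the kind of run it belongs to
              .assign(monotic = np.where(h,'monotic_h_greater_0',
                                            'monotic_h_not_greater_0'),
                      index = lambda x: x.index)
              # blank out rows where h_diff is NaN; pivot_table drops them, so they come back as NaN after the join
              .where(df['h_diff'].notna())
              # one column per kind of run; the other kind gets fill_value=0
              .pivot_table(columns = 'monotic',
                           index = 'index',
                           values = 'values',
                           fill_value=0) )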
print(df)
            h_diff  l_diff  monotic_h_greater_0  monotic_h_not_greater_0
timestamp
2000-01-18     NaN     NaN                  NaN                      NaN
2000-01-19    2.75    2.93                  1.0                      0.0
2000-01-20   12.75   10.13                  2.0                      0.0
2000-01-21   -7.25   -3.31                  0.0                      1.0
2000-01-24   -1.50   -5.07                  0.0                      2.0
2000-01-25    0.37   -2.75                  1.0                      0.0
2000-01-26    1.07    7.38                  2.0                      0.0
2000-01-27    1.19   -2.75                  3.0                      0.0
2000-01-28    2.13   -6.38                  4.0                      0.0
2000-01-31   -7.00   -6.12                  0.0                      1.0
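The pivot_table step can also be replaced by two Series.where calls on the same run counter. This is a swapped-in alternative, not the answer's code: a minimal sketch that rebuilds the answer's initial DataFrame and fills the two columns directly.

```python
import numpy as np
import pandas as pd

dates = ["2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-24",
         "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-31"]
df = pd.DataFrame(
    {"h_diff": [np.nan, 2.75, 12.75, -7.25, -1.50, 0.37, 1.07, 1.19, 2.13, -7.00],
     "l_diff": [np.nan, 2.93, 10.13, -3.31, -5.07, -2.75, 7.38, -2.75, -6.38, -6.12]},
    index=pd.DatetimeIndex(dates, name="timestamp"),
)

h = df["h_diff"].gt(0)
run = h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1)  # position within each run of equal h
valid = df["h_diff"].notna()                                   # the NaN row stays NaN in both columns

# keep the counter where the row belongs to that kind of run, 0 elsewhere, NaN where h_diff is NaN
df["monotic_h_greater_0"] = run.where(h, 0).where(valid).astype(float)
df["monotic_h_not_greater_0"] = run.where(~h, 0).where(valid).astype(float)
print(df)
```

This avoids the helper 'index' and 'monotic' columns and the join, at the cost of writing one line per output column.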