如何在pandas DataFrame值列中找到顺序（数字|增加|减少）的最高计数

Question

如何在同一列中找到连续出现的最高计数，例如相同的数字、递增的值或递减的值。

所以给出类似的东西：

            h_diff  l_diff  monotonic
timestamp                            
2000-01-18     NaN     NaN        NaN
2000-01-19    2.75    2.93        1.0
2000-01-20   12.75   10.13        1.0
2000-01-21   -7.25   -3.31        0.0
2000-01-24   -1.50   -5.07        0.0
2000-01-25    0.37   -2.75        1.0
2000-01-26    1.07    7.38        1.0
2000-01-27   -1.19   -2.75        0.0
2000-01-28   -2.13   -6.38        0.0
2000-01-31   -7.00   -6.12        0.0

h_diff 中正值的单调性最高值为 2，负值的单调性最高值为 3。 l_diff 相同。 因此，如果滚动为 10 或 n，我将如何找到最高单调计数，同时仍然能够动态更改窗口大小？

这给了我单调列的 1.0 值： lambda x: np.all(np.diff(x) > 0) 和 lambda x: np.count_nonzero(np.diff(x) > 0) 将计算总计数1.0 用于窗口，但我试图找到的是一系列给定窗口中运行时间最长的。

我希望的是这样的：

           h_diff  l_diff  monotonic
timestamp                            
2000-01-18     NaN     NaN        NaN
2000-01-19    2.75    2.93        1.0
2000-01-20   12.75   10.13        2.0
2000-01-21   -7.25   -3.31        0.0
2000-01-24   -1.50   -5.07        0.0
2000-01-25    0.37   -2.75        1.0
2000-01-26    1.07    7.38        2.0
2000-01-27    1.19   -2.75        3.0
2000-01-28    2.13   -6.38        4.0
2000-01-31   -7.00   -6.12        0.0

Answer 1

下面的代码应该可以找到连续出现的正数或负数的技巧。 下面的代码用于列h_diff

df1[df1.h_diff.gt(0)].index.to_series().diff().ne(1).cumsum().value_counts().max() #sequential occurrences greater than 0

df1[df1.h_diff.lt(0)].index.to_series().diff().ne(1).cumsum().value_counts().max() #sequential occurrences less than 0

Answer 2

使用GroupBy.cumcount + Series.where 。

初始数据帧

            h_diff  l_diff
timestamp                 
2000-01-18     NaN     NaN
2000-01-19    2.75    2.93
2000-01-20   12.75   10.13
2000-01-21   -7.25   -3.31
2000-01-24   -1.50   -5.07
2000-01-25    0.37   -2.75
2000-01-26    1.07    7.38
2000-01-27    1.19   -2.75
2000-01-28    2.13   -6.38
2000-01-31   -7.00   -6.12

h = df['h_diff'].gt(0)
#h = np.sign(df['h_diff'])
df['monotonic_h']=h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1).where(h,0)
print(df)
            h_diff  l_diff  monotonic_h
timestamp                             
2000-01-18     NaN     NaN            0
2000-01-19    2.75    2.93            1
2000-01-20   12.75   10.13            2
2000-01-21   -7.25   -3.31            0
2000-01-24   -1.50   -5.07            0
2000-01-25    0.37   -2.75            1
2000-01-26    1.07    7.38            2
2000-01-27    1.19   -2.75            3
2000-01-28    2.13   -6.38            4
2000-01-31   -7.00   -6.12            0

df['monotonic_h'].max()
#4

细节

h.ne(h.shift()).cumsum()

timestamp
2000-01-18    1
2000-01-19    2
2000-01-20    2
2000-01-21    3
2000-01-24    3
2000-01-25    4
2000-01-26    4
2000-01-27    4
2000-01-28    4
2000-01-31    5
Name: h_diff, dtype: int64

更新

df = df.join( h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1)
               .to_frame('values')
               .assign(monotic = np.where(h,'monotic_h_greater_0',
                                          'monotic_h_not_greater_0'),
                       index = lambda x: x.index)
               .where(df['h_diff'].notna())
               .pivot_table(columns = 'monotic',
                            index = 'index',
                            values = 'values',
                            fill_value=0) )

print(df)
            h_diff  l_diff  monotic_h_greater_0  monotic_h_not_greater_0
timestamp                                                               
2000-01-18     NaN     NaN                  NaN                      NaN
2000-01-19    2.75    2.93                  1.0                      0.0
2000-01-20   12.75   10.13                  2.0                      0.0
2000-01-21   -7.25   -3.31                  0.0                      1.0
2000-01-24   -1.50   -5.07                  0.0                      2.0
2000-01-25    0.37   -2.75                  1.0                      0.0
2000-01-26    1.07    7.38                  2.0                      0.0
2000-01-27    1.19   -2.75                  3.0                      0.0
2000-01-28    2.13   -6.38                  4.0                      0.0
2000-01-31   -7.00   -6.12                  0.0                      1.0

如何在pandas DataFrame值列中找到顺序（数字|增加|减少）的最高计数

问题描述

2 个解决方案

解决方案1
1 2020-01-17 09:25:36

解决方案2
1 已采纳 2020-01-17 10:33:41

如何在pandas DataFrame值列中找到顺序（数字|增加|减少）的最高计数

问题描述

2 个解决方案

解决方案1 1 2020-01-17 09:25:36

解决方案2 1 已采纳 2020-01-17 10:33:41

解决方案1
1 2020-01-17 09:25:36

解决方案2
1 已采纳 2020-01-17 10:33:41