Counting the number of contiguous column values in a pandas df

Question

I have a df with a column like so:

I want to count, for each value in col1, the number of contiguous runs whose length is above some threshold. So, if the threshold is 0, the output should resemble:

1: 2
2: 1

If the threshold is 3, the output should resemble:

1: 1
2: 1

I know that looping over the column values and just tracking contiguous chains will work, but I'm wondering if there is a pandas way to do this that might be faster?

Answer 1

Here is one way: use diff with cumsum to create an additional grouping key, so that each contiguous run gets its own group.

s=df.groupby([df.col1,df.col1.diff().ne(0).cumsum()]).size()
s
Out[198]: 
col1  col1
1     1       3
      3       4
2     2       4
dtype: int64

thresh=3
s[s>thresh].groupby(level=0).count()  # Series.count(level=...) was removed in pandas 2.0
Out[201]: 
col1
1    1
2    1
dtype: int64
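For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample data is an assumption reconstructed from the run lengths in the output above (the question's original column is not shown), and `count_runs` and the `run` key name are hypothetical names used only for illustration. It uses `shift` instead of `diff` to detect changes, which should behave the same on this data.

```python
import pandas as pd

# Assumed sample data, reconstructed from the run lengths shown in the answer.
df = pd.DataFrame({"col1": [1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1]})


def count_runs(col: pd.Series, thresh: int) -> pd.Series:
    """Count, per value, the contiguous runs longer than `thresh`."""
    # A new run starts wherever the value differs from the previous row.
    run_id = col.ne(col.shift()).cumsum().rename("run")
    # Length of each run, indexed by (value, run id).
    run_len = col.groupby([col, run_id]).size()
    # Keep only runs above the threshold, then count them per value.
    return run_len[run_len > thresh].groupby(level=0).count()


# With thresh=0, value 1 has two runs and value 2 has one.
print(count_runs(df["col1"], 0))
# With thresh=3, only the runs of length 4 survive: one per value.
print(count_runs(df["col1"], 3))
```

Renaming the run key avoids having two index levels that are both called col1, which makes the intermediate result a little easier to read.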

Here is where the key comes from:

df.col1.diff().ne(0).cumsum() # each run of contiguous equal values gets a single group id
Out[202]: 
0     1
1     1
2     1
3     2
4     2
5     2
6     2
7     3
8     3
9     3
10    3
Name: col1, dtype: int32
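To see why this expression yields one id per run, it may help to look at each step side by side. The sketch below is illustrative only: the sample column is again an assumption reconstructed from the output above, and the `steps` frame and its column names are hypothetical.

```python
import pandas as pd

# Assumed sample column, reconstructed from the output shown above.
col = pd.Series([1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1], name="col1")

steps = pd.DataFrame({
    "col1": col,
    # Difference from the previous row; the first row has no predecessor, so it is NaN.
    "diff": col.diff(),
    # True wherever the value changed (NaN != 0 is also True, so row 0 starts a run).
    "changed": col.diff().ne(0),
    # Running total of the change flags: it increases by one at the start of each run.
    "run_id": col.diff().ne(0).cumsum(),
})
print(steps)
```

Note that diff needs a numeric column. For strings or other non-numeric values, comparing against `col.shift()` with `ne` is one alternative that should give the same run ids.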

Answer 1
3, accepted 2020-07-30 03:22:40
