
Counting number of contiguous column values in pandas df

I have a df with a column like so:

col1
1
1
1
2
2
2
2
1
1
1
1

I want to count, for each value in col1, the number of contiguous runs whose length is above some threshold. So, if the threshold is 0, the output should resemble:

1: 2
2: 1

If the threshold is 3, the output should resemble:

1: 1
2: 1

I know that looping over the column values and just tracking contiguous chains will work, but I'm wondering if there is a pandas way to do this that might be faster?

Here is one way: use diff with cumsum to create an additional grouping key that labels each contiguous run.

# group by (value, run id); size() gives the length of each contiguous run
s = df.groupby([df.col1, df.col1.diff().ne(0).cumsum()]).size()
s
Out[198]: 
col1  col1
1     1       3
      3       4
2     2       4
dtype: int64

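thresh = 3
# Series.count(level=...) was removed in pandas 2.0; group on the index level instead
s[s > thresh].groupby(level=0).count()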
Out[201]: 
col1
1    1
2    1
dtype: int64
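
For reference, the same idea can be wrapped up as a self-contained script. This is only a sketch: the helper name count_runs and the "run" label are made up for illustration, and it assumes a numeric column, since diff() relies on subtraction.

```python
import pandas as pd


def count_runs(col: pd.Series, thresh: int = 0) -> pd.Series:
    """Count, per value, the contiguous runs that are longer than `thresh`."""
    # A new run starts wherever the value differs from the previous row;
    # the cumulative sum of those starts gives every run its own id.
    run_id = col.diff().ne(0).cumsum().rename("run")
    # Length of each run, indexed by (value, run id).
    run_len = col.groupby([col, run_id]).size()
    # Keep only the runs above the threshold, then count them per value.
    return run_len[run_len > thresh].groupby(level=0).size()


# The column from the question.
df = pd.DataFrame({"col1": [1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1]})

print(count_runs(df["col1"], thresh=0))  # value 1 has 2 runs, value 2 has 1
print(count_runs(df["col1"], thresh=3))  # one run longer than 3 for each value
```

Note that the comparison is strict (s > thresh), so with thresh=3 the first run of 1s, which has length 3, is not counted.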

The extra grouping key comes from this step:

df.col1.diff().ne(0).cumsum() # every row in the same contiguous run gets the same id
Out[202]: 
0     1
1     1
2     1
3     2
4     2
5     2
6     2
7     3
8     3
9     3
10    3
Name: col1, dtype: int32
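
If the column is not numeric, diff() is not a good fit because it subtracts neighbouring values. A possible variant, sketched here with a hypothetical string column that is not part of the original question, compares each row with the previous one via shift() instead:

```python
import pandas as pd

# Hypothetical string-valued column, used only for illustration.
labels = pd.Series(["a", "a", "b", "b", "b", "a"], name="col1")

# True wherever the value differs from the previous row (the first row
# is compared with NaN, so it always starts a run); cumsum numbers the runs.
run_id = labels.ne(labels.shift()).cumsum().rename("run")

# Length of each run, indexed by (value, run id).
run_len = labels.groupby([labels, run_id]).size()

print(run_id.tolist())
print(run_len)
```

From run_len onward, the thresholding and per-value counting work the same way as in the numeric case.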
