
Counting number of contiguous column values in pandas df

I have a df with a column like so:

col1
1
1
1
2
2
2
2
1
1
1
1

I want to count the number of contiguous occurrences of each value in col1 above some threshold. So, if the threshold is 0, the output should resemble:

1: 2
2: 1

If the threshold is 3, the output should resemble:

1: 1
2: 1

I know that looping over the column values and just tracking contiguous chains will work, but I'm wondering if there is a pandas way to do this that might be faster?

Here is one way: use diff with cumsum to create an additional grouping key that labels each contiguous run, then group by both the value and that key.

s=df.groupby([df.col1,df.col1.diff().ne(0).cumsum()]).size()
s
Out[198]: 
col1  col1
1     1       3
      3       4
2     2       4
dtype: int64

thresh=3
s[s>thresh].groupby(level=0).count()  # s.count(level=0) was removed in pandas 2.0
Out[201]: 
col1
1    1
2    1
dtype: int64
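
The steps above can be collected into one self-contained script. This is only a sketch: the helper name count_long_runs and the "run_id" label are made up for illustration, and it assumes a numeric column so that diff() applies.

```python
import pandas as pd

# Sample column from the question.
df = pd.DataFrame({"col1": [1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1]})

def count_long_runs(col: pd.Series, thresh: int) -> pd.Series:
    """Per value, count the contiguous runs strictly longer than thresh.

    Hypothetical helper, not part of pandas.
    """
    # A new run starts wherever the value differs from the previous row.
    run_id = col.diff().ne(0).cumsum().rename("run_id")
    # Length of every run, indexed by (value, run_id).
    run_len = col.groupby([col, run_id]).size()
    # Keep the long runs, then count how many remain for each value.
    return run_len[run_len > thresh].groupby(level=0).size()

result_0 = count_long_runs(df["col1"], 0)
result_3 = count_long_runs(df["col1"], 3)
print(result_0)
print(result_3)
```

With thresh=0 every run qualifies, so value 1 is counted twice and value 2 once; with thresh=3 only the runs of length 4 survive, giving one run for each value.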

To see where the extra key comes from:

df.col1.diff().ne(0).cumsum() # each run of contiguous equal values gets its own ID
Out[202]: 
0     1
1     1
2     1
3     2
4     2
5     2
6     2
7     3
8     3
9     3
10    3
Name: col1, dtype: int32
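
Because diff() relies on subtraction, the key above is limited to numeric-like columns. A possible variant for other dtypes compares each row with its predecessor via shift() instead. The string data below is invented for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical string column; diff() cannot subtract strings.
labels = pd.Series(["a", "a", "b", "b", "b", "a"], name="col1", dtype=object)

# True wherever a row differs from the previous row, i.e. a new run starts.
# The first row is compared with a missing value, so it always starts a run.
starts = labels.ne(labels.shift())

# The running total of run starts gives every run its own ID.
run_id = starts.cumsum()
run_ids = run_id.tolist()
print(run_ids)
```

One caveat of this variant: missing values never compare equal to each other, so consecutive NaN rows would each be treated as a separate run.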
