Counting number of contiguous column values in pandas df
I have a df with a column like so:
col1
1
1
1
2
2
2
2
1
1
1
1
I want to count the number of contiguous runs of each value in col1 whose length is above some threshold. So, if the threshold is 0, the output should resemble:
1: 2
2: 1
If the threshold is 3, the output should resemble:
1: 1
2: 1
I know that looping over the column values and just tracking contiguous chains will work, but I'm wondering if there is a pandas way to do this that might be faster?
Here is one way: use diff with cumsum to create an additional grouping key, then group by both the value and that key.
s=df.groupby([df.col1,df.col1.diff().ne(0).cumsum()]).size()
s
Out[198]:
col1  col1
1     1       3
      3       4
2     2       4
dtype: int64
thresh=3
# Series.count(level=...) was removed in pandas 2.0; group by the index level instead
s[s>thresh].groupby(level=0).count()
Out[201]:
col1
1 1
2 1
dtype: int64
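For convenience, the steps above can be wrapped in a small helper. This is only a sketch: the function name count_runs and the "run" label on the key are made up for illustration, and it assumes a numeric column (diff relies on subtraction) and a strict "longer than thresh" comparison, as in the answer.

```python
import pandas as pd


def count_runs(col: pd.Series, thresh: int = 0) -> pd.Series:
    """Count, per value, the contiguous runs that are longer than `thresh`."""
    # A new run starts wherever the value differs from the previous row;
    # the cumulative sum of those start flags gives every run its own id.
    run_id = col.diff().ne(0).cumsum().rename("run")
    # Length of each (value, run id) pair.
    run_len = col.groupby([col, run_id]).size()
    # Keep only the long runs, then count them per value (first index level).
    return run_len[run_len > thresh].groupby(level=0).count()


df = pd.DataFrame({"col1": [1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1]})
print(count_runs(df["col1"], thresh=0))  # value 1 -> 2 runs, value 2 -> 1 run
print(count_runs(df["col1"], thresh=3))  # value 1 -> 1 run,  value 2 -> 1 run
```

Renaming the key to "run" just avoids having two index levels both called col1; the counts are the same as in the output shown above.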
To see how the key works, look at the intermediate result:
df.col1.diff().ne(0).cumsum() # each run of consecutive equal values gets its own group id
Out[202]:
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 3
8 3
9 3
10 3
Name: col1, dtype: int32
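One caveat: diff relies on subtraction, so it suits numeric columns. If col1 held strings or other labels, a commonly used variant compares each row with its predecessor via shift instead. This is a swapped-in technique, not part of the answer above, and the sample data is made up for illustration:

```python
import pandas as pd

# Hypothetical non-numeric column, for illustration only.
labels = pd.Series(["a", "a", "b", "b", "b", "a"], name="col1")

# Flag rows that differ from the previous row, then cumulatively sum the
# flags so that every contiguous run shares one id.
run_id = labels.ne(labels.shift()).cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 2, 3]
```

The resulting key can be passed to groupby in exactly the same way as the diff-based one.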