How to count the number of identical, sequential values in a column with python/pandas?

Say I have a DataFrame such as:

import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 3, 1, 1]})

I'd like to count how many times in a row the current column value has been seen so far, i.e. the position of each row within its run of identical consecutive values. For the above example, the output would be:

[1, 2, 1, 1, 2, 3, 1, 2]

I know how to group by the value and take a cumulative count over all repeating values, but I don't know how to make the count restart at each new run of values.

i.e.

df['A'].groupby(df['A']).cumcount() 
# returns [0, 1, 0, 0, 1, 2, 2, 3] which is not what I want.

Try this method:

df.groupby((df['A'] != df['A'].shift()).cumsum()).cumcount() + 1

Output:

0    1
1    2
2    1
3    1
4    2
5    3
6    1
7    2
dtype: int64

Details

Use an inequality check between the current row and the previous row (df['A'].shift()) to flag every row where 'A' changes, then cumsum over those flags to create a new group id for each change in 'A', then groupby that id and cumcount, adding 1 so the count starts at 1 instead of 0.
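As a minimal sketch, the same chain can be wrapped in a small helper. The function name consecutive_count is hypothetical (it is not part of pandas); only shift, cumsum, groupby and cumcount are pandas calls.

```python
import pandas as pd

def consecutive_count(s: pd.Series) -> pd.Series:
    """Running count of identical consecutive values, starting at 1."""
    # True wherever the value differs from the previous row. The first row
    # is always True because shift() leaves a missing value there.
    starts = s != s.shift()
    # The cumulative sum of the flags gives every run its own integer id.
    run_id = starts.cumsum()
    # Position of each row within its run, shifted to start at 1.
    return s.groupby(run_id).cumcount() + 1

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 3, 1, 1]})
result = consecutive_count(df['A'])
print(result.tolist())  # [1, 2, 1, 1, 2, 3, 1, 2]
```

One caveat with this approach: NaN never compares equal to NaN, so consecutive missing values would each start a new run of length 1.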

Break down into steps

The same logic, split into steps so you can see how each intermediate column of the DataFrame builds on the previous one.

df['grp'] = df['A'] != df['A'].shift()
# for numeric data you can use df['A'].diff().ne(0) instead;
# however, the inequality check is more versatile because it also works for strings
df['cumgroup'] = df['grp'].cumsum()
df['count'] = df.groupby('cumgroup').cumcount() + 1
df

Output:

   A    grp  cumgroup  count
0  1   True         1      1
1  1  False         1      2
2  2   True         2      1
3  3   True         3      1
4  3  False         3      2
5  3  False         3      3
6  1   True         4      1
7  1  False         4      2
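To illustrate the remark about strings, here is a short sketch on a made-up string Series (the data is an assumption for illustration, not from the question). diff() relies on subtraction, which is not defined for strings, whereas the inequality check only needs comparison.

```python
import pandas as pd

# Hypothetical string data, not taken from the original question.
s = pd.Series(['a', 'a', 'b', 'b', 'b', 'a'])

# Flag each row that differs from the previous one, then number the runs.
run_id = (s != s.shift()).cumsum()

# Count within each run, starting at 1.
counts = s.groupby(run_id).cumcount() + 1
print(counts.tolist())  # [1, 2, 1, 2, 3, 1]
```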

