Count the number of unique rows in pandas
I want to count the number of unique rows in a pandas DataFrame and add a new column, count_index, as in the example below. In other words, I want duplicate rows to share the same index number.
import pandas as pd

data = {'A': [8, 8, 9, 9, 9, 12, 12, 13, 15, 15, 15],
        'B': [1, 1, 2, 2, 2, 11, 11, 3, 4, 4, 4],
        'C': [10, 10, 20, 20, 20, 101, 101, 30, 40, 40, 40],
        'D': [81, 81, 92, 92, 92, 121, 121, 134, 150, 150, 150]}
df = pd.DataFrame(data)

# My attempt: this gives the size of each group, not a per-row group number
print(df.groupby(['A', 'B', 'C', 'D']).size())
#####################################################
#input
A B C D
8 1 10 81
8 1 10 81
9 2 20 92
9 2 20 92
9 2 20 92
12 11 101 121
12 11 101 121
13 3 30 134
15 4 40 150
15 4 40 150
15 4 40 150
####################################################
#expected output
A B C D count_index
8 1 10 81 1
8 1 10 81 1
9 2 20 92 2
9 2 20 92 2
9 2 20 92 2
12 11 101 121 3
12 11 101 121 3
13 3 30 134 4
15 4 40 150 5
15 4 40 150 5
15 4 40 150 5
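If the goal is literally the number of unique rows (the title's question), the groupby(...).size() call in the question already holds it: the result has one entry per distinct row, so its length is the count, and drop_duplicates() gives the same number. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'A': [8, 8, 9, 9, 9, 12, 12, 13, 15, 15, 15],
                   'B': [1, 1, 2, 2, 2, 11, 11, 3, 4, 4, 4],
                   'C': [10, 10, 20, 20, 20, 101, 101, 30, 40, 40, 40],
                   'D': [81, 81, 92, 92, 92, 121, 121, 134, 150, 150, 150]})

# One entry per distinct row, holding how many times that row occurs
counts = df.groupby(['A', 'B', 'C', 'D']).size()

# Two equivalent ways to get the number of distinct rows
n_unique_groupby = len(counts)
n_unique_dedup = len(df.drop_duplicates())

print(counts.tolist())                   # [2, 3, 2, 1, 3]
print(n_unique_groupby, n_unique_dedup)  # 5 5
```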
You can do this by inverting .duplicated(), so that the first occurrence of each row is True and every repeat is False. A cumulative sum then keeps a running count of the unique rows encountered so far, which gives each row its group number.
df['count_index'] = (~df.duplicated(keep="first")).cumsum()
print(df)
A B C D count_index
0 8 1 10 81 1
1 8 1 10 81 1
2 9 2 20 92 2
3 9 2 20 92 2
4 9 2 20 92 2
5 12 11 101 121 3
6 12 11 101 121 3
7 13 3 30 134 4
8 15 4 40 150 5
9 15 4 40 150 5
10 15 4 40 150 5
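One caveat worth checking: the cumulative-sum trick assumes identical rows sit next to each other, as they do in the example. If repeats are scattered through the frame, a repeat receives the number of the most recent new row rather than that of its own first occurrence. The sketch below uses hypothetical three-row data to compare it with groupby(...).ngroup(), which numbers each distinct row directly. Passing sort=False numbers the groups in order of first appearance.

```python
import pandas as pd

# Hypothetical data: the first and third rows are identical but NOT adjacent
df = pd.DataFrame({'A': [8, 9, 8], 'B': [1, 2, 1]})

# Running count of first occurrences: the repeat gets the latest group number
cumsum_label = (~df.duplicated(keep="first")).cumsum()

# ngroup() labels each row with its group's number (0-based);
# sort=False keeps first-appearance order, +1 makes the labels start at 1
ngroup_label = df.groupby(list(df.columns), sort=False).ngroup() + 1

print(cumsum_label.tolist())  # [1, 2, 2]
print(ngroup_label.tolist())  # [1, 2, 1]
```

On the question's data both approaches give 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 5. Rows containing NaN are excluded from the groups by default and get a NaN label. Passing dropna=False to groupby keeps them.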
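You can use a combination of diff().ne(0) or df.ne(df.shift()) with .any(axis=1) and .cumsum(). Each row is compared with the row above it. A row that differs in at least one column starts a new group, and the cumulative sum numbers the groups. Because only neighboring rows are compared, this assumes identical rows are adjacent, as in the example. diff() also needs columns that support subtraction, whereas df.ne(df.shift()) works for any dtype.
df.diff().ne(0).any(axis=1).cumsum()
or
df.ne(df.shift()).any(axis=1).cumsum()
Output: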
0 1
1 1
2 2
3 2
4 2
5 3
6 3
7 4
8 5
9 5
10 5
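The choice of row-wise reduction, .all(axis=1) or .any(axis=1), matters when only some columns change between neighboring rows. On the question's data every column changes at each boundary, so both give the same result. The sketch below uses hypothetical data in which only column B changes and shows that .all misses the boundary while .any catches it.

```python
import pandas as pd

# Hypothetical data: the last row differs from the one above only in column B
df = pd.DataFrame({'A': [1, 1, 1], 'B': [5, 5, 6]})

# True wherever a value differs from the row above (the first row compares against NaN)
changed = df.ne(df.shift())

all_label = changed.all(axis=1).cumsum()  # new group only if EVERY column changed
any_label = changed.any(axis=1).cumsum()  # new group if ANY column changed

print(all_label.tolist())  # [1, 1, 1]
print(any_label.tolist())  # [1, 1, 2]
```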