I have a very long table like below:
A B C D .......
0 au br gt uy
1 cd gq gt uy
2 fg br gt ml
3 kl br gt wx
..............
I would like to count and print the duplicates per column, like this:
A 0
B 2
C 3
D 1
I have only found how to count duplicates for one column:
df.duplicated(['B']).sum()
Do I have to write all columns (about 30) or is it possible to use something from pandas? I have tried this but it doesn't work:
df.duplicated(df.loc[:,:]).sum()
Subtract the number of unique values per column (nunique) from the length of the DataFrame:
counts = len(df) - df.nunique()
print(counts)
A 0
B 2
C 3
D 1
dtype: int64
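One caveat worth checking if your columns can contain missing values: nunique skips NaN by default, so every NaN row ends up counted as a duplicate, whereas duplicated treats the first NaN as an original. The sketch below uses a made-up one-column frame (the names `frame` and `X` are only for illustration) to show the difference and how `dropna=False` brings the two in line.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not from the question: two repeated NaN and two repeated "a".
frame = pd.DataFrame({"X": ["a", np.nan, np.nan, "a"]})

# Default nunique ignores NaN, so only "a" counts as a distinct value.
default_count = len(frame) - frame.nunique()

# dropna=False counts NaN as one distinct value.
with_nan_count = len(frame) - frame.nunique(dropna=False)

# duplicated (as used in the question) marks the second NaN and the second "a".
dup_count = frame.duplicated(["X"]).sum()

print(default_count["X"], with_nan_count["X"], dup_count)
```

If the table has no missing values, both variants give the same numbers.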
Or use apply with duplicated to get a boolean mask for each column separately, then sum to count the True values:
counts = df.apply(lambda x: x.duplicated()).sum()
print(counts)
A 0
B 2
C 3
D 1
dtype: int64
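For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the table consists of just the four rows and four columns shown in the question (the real table is longer), and the names `by_nunique` and `by_duplicated` are only illustrative.

```python
import pandas as pd

# Sample data copied from the four rows shown in the question.
df = pd.DataFrame({
    "A": ["au", "cd", "fg", "kl"],
    "B": ["br", "gq", "br", "br"],
    "C": ["gt", "gt", "gt", "gt"],
    "D": ["uy", "uy", "ml", "wx"],
})

# Approach 1: rows minus distinct values per column.
by_nunique = len(df) - df.nunique()

# Approach 2: per-column boolean mask of repeats, then count the True values.
by_duplicated = df.apply(lambda x: x.duplicated()).sum()

print(by_nunique)
print(by_duplicated)
```

Both approaches cover every column at once, so there is no need to list the 30 column names by hand.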