
Count duplicates in rows per column in pandas DataFrame

I have a very long table like the one below:

    A    B    C    D    .......
0   au   br   gt   uy
1   cd   gq   gt   uy
2   fg   br   gt   ml
3   kl   br   gt   wx

..............

I would like to count and print the duplicates per column, like this:

A   0    
B   2     
C   3     
D   1    

I have only found how to count duplicates for a single column:

df.duplicated(['B']).sum()

Do I have to write this out for all columns (about 30), or is there something built into pandas? I have tried this, but it doesn't work:

df.duplicated(df.loc[:,:]).sum()
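(For reference, a minimal, runnable setup of the sample data above, limited to columns A-D with the remaining columns omitted, so the snippets below can be tried as-is:)

import pandas as pd

# Sample data from the table above (the other ~30 columns are omitted)
df = pd.DataFrame({
    'A': ['au', 'cd', 'fg', 'kl'],
    'B': ['br', 'gq', 'br', 'br'],
    'C': ['gt', 'gt', 'gt', 'gt'],
    'D': ['uy', 'uy', 'ml', 'wx'],
})

# Counting duplicates in a single column already works:
print(df.duplicated(['B']).sum())   # 2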

Subtract nunique from the length of the DataFrame:

df = len(df) - df.nunique()
print (df)
A    0
B    2
C    3
D    1
dtype: int64
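Note that the snippet above rebinds df to the resulting Series; if the original DataFrame is still needed afterwards, assign the counts to a separate variable instead (the name dup_counts here is arbitrary):

dup_counts = len(df) - df.nunique()   # rows minus distinct values, per column
print(dup_counts['B'])                # 2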

Or use apply with duplicated to get a boolean mask for each column separately, then sum to count the True values:

df = df.apply(lambda x: x.duplicated()).sum()
print (df)
A    0
B    2
C    3
D    1
dtype: int64
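One caveat, in case some of your 30 columns contain NaN: nunique() ignores NaN by default, so the first approach counts every missing value as a duplicate, whereas duplicated() only flags the second and later NaN in a column. Passing dropna=False to nunique brings the two approaches back in line (a small sketch with made-up values):

import numpy as np
import pandas as pd

s = pd.Series(['br', np.nan, np.nan])
print(len(s) - s.nunique())               # 2 -- both NaN are dropped by nunique
print(len(s) - s.nunique(dropna=False))   # 1
print(s.duplicated().sum())               # 1 -- only the second NaN counts as a duplicate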
