Replace duplicated values across different columns in pandas with '0'

I have a pandas DataFrame like this (10 columns and thousands of rows):

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     a    d    b    f    a    c    g     b
     b    g     i    k    l    r    k    b    m     l

It has duplicates. I want to replace them with 0 and keep only the first occurrence of each value, counting repeats in later rows as well as within a row. So the result would look like this:

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     0    d    b    f    0    0    g     0
     0    0     i    k    l    r    0    0    m     0

The values in the DataFrame have dtype "object". I am using Python 3.5.

I found a solution in "Replace duplicate values across columns in Pandas", but it didn't work for me: no replacements happened.

Any other ideas to solve this issue?

IIUC, you can stack the frame into a Series, flag repeated values with duplicated, unstack the flags back into the frame's shape, and use them as a mask:

df.mask(df.stack().duplicated().unstack(), 0)

output:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    0    0    i    k    l    r    0    0    m     0
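
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds only the two sample rows from the question; the names cols, seen_before and result are mine, not from the original post.

```python
import pandas as pd

# Rebuild the two sample rows shown in the question.
cols = ["col" + str(i) for i in range(1, 11)]
df = pd.DataFrame(
    [list("acadbfacgb"), list("bgiklrkbml")],
    columns=cols,
)

# stack() flattens the frame row by row into a single Series, so
# duplicated() flags every value already seen earlier in the frame,
# whether in the same row or in a previous one.
seen_before = df.stack().duplicated().unstack()

# mask() replaces the cells where the condition is True.
result = df.mask(seen_before, 0)
print(result)
```

Because the check runs on the flattened frame, b and g in the second row count as repeats of the first row's values, which is what the expected output in the question asks for.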

Why didn't this work?

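import pandas as pd
# Flag values that repeat within the same row, then keep only the unflagged cells.
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
print(df.where(~is_duplicate, 0))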

It runs fine for me, but it only flags duplicates within each row, so b and g in the second row are kept:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    b    g    i    k    l    r    0    0    m     0
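
To make the difference between the two checks concrete, the sketch below runs both on the question's sample rows. The whole-frame variant flattens the values with NumPy instead of stack/unstack; that is my own substitution rather than anything from the answers, and the variable names are hypothetical.

```python
import pandas as pd

cols = ["col" + str(i) for i in range(1, 11)]
df = pd.DataFrame(
    [list("acadbfacgb"), list("bgiklrkbml")],
    columns=cols,
)

# Row-wise check: each row is compared only against itself.
within_row = df.apply(pd.Series.duplicated, axis=1)
row_result = df.where(~within_row, 0)

# Whole-frame check: flatten the values in row-major order, flag repeats,
# then reshape the flags back to the frame's shape.
flat = pd.Series(df.to_numpy().ravel())
across_frame = flat.duplicated().to_numpy().reshape(df.shape)
frame_result = df.mask(across_frame, 0)

print(row_result)
print(frame_result)
```

Only frame_result zeroes out b and g in the second row, since those values first appear in the first row.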
