Replace duplicated values across different columns in pandas with '0'

Question

I have a pandas DataFrame like this (10 columns and thousands of rows):

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     a    d    b    f    a    c    g     b
     b    g     i    k    l    r    k    b    m     l

It contains duplicates. I want to replace the duplicates with 0 across the rows and keep only the first occurrence, so the result would look like this:

    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
     a    c     0    d    b    f    0    0    g     0
     0    0     i    k    l    r    0    0    m     0

The values in the DataFrame have dtype "object". I am using Python 3.5.

I found a solution in "Replace duplicate values across columns in Pandas", but it didn't work for me: no replacements happened.

Any other ideas to solve this issue?

Answer 1

IIUC, you can stack the frame to get a Series, flag the repeats in that Series with duplicated, unstack the result back to the original shape, and use it as a mask:

df.mask(df.stack().duplicated().unstack(), 0)

output:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    0    0    i    k    l    r    0    0    m     0
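
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same stack / duplicated / unstack / mask chain, split into steps. The sample frame is rebuilt from the question's data; the intermediate names (flat, is_dup, dup_mask, result) are mine and purely illustrative, and dtype=object is set explicitly only to match the dtype the question mentions.

```python
import pandas as pd

# Sample data copied from the question; dtype=object matches what the asker reports.
df = pd.DataFrame(
    [list("acadbfacgb"), list("bgiklrkbml")],
    columns=["col{}".format(i) for i in range(1, 11)],
    dtype=object,
)

# stack() flattens the frame row by row into a single Series with a
# (row, column) MultiIndex, so duplicated() sees every cell in reading order.
flat = df.stack()
is_dup = flat.duplicated()     # True for every repeat after the first occurrence
dup_mask = is_dup.unstack()    # boolean frame with the original row/column labels

# mask() replaces the cells where the condition is True.
result = df.mask(dup_mask, 0)
print(result)
```

Because the flattened Series runs across row 0 first and then row 1, a value in a later row counts as a duplicate if it already appeared in an earlier row, which is what the requested output shows for b and g.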

Answer 2

Why didn't this work?

import pandas as pd
# Flag repeated values within each row, then keep only the unflagged cells.
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
print(df.where(~is_duplicate, 0))

Works fine for me:

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
0    a    c    0    d    b    f    0    0    g     0
1    b    g    i    k    l    r    0    0    m     0
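
Note that the output above keeps b and g in row 1, whereas the output requested in the question replaces them: with axis=1, duplicated is applied to each row separately, so values already seen in an earlier row are not flagged. A small sketch, assuming the question's sample data, that puts the per-row reading next to the whole-frame reading (the names per_row and whole_frame are illustrative only):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [list("acadbfacgb"), list("bgiklrkbml")],
    columns=["col{}".format(i) for i in range(1, 11)],
    dtype=object,
)

# Per-row reading: each row is checked on its own.
per_row = df.where(~df.apply(pd.Series.duplicated, axis=1), 0)

# Whole-frame reading: cells are checked in row-major order across all rows.
whole_frame = df.mask(df.stack().duplicated().unstack(), 0)

print(per_row.iloc[1].tolist())      # b and g survive in row 1
print(whole_frame.iloc[1].tolist())  # b and g are replaced in row 1
```

Which of the two is right depends on whether "duplicate" means repeated within a row or repeated anywhere earlier in the frame; the expected output in the question corresponds to the second.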

2 answers

solution1
3 2020-04-27 19:57:59

solution2
0 2020-04-27 20:01:07