How to drop duplicates based on two or more subset criteria in a Pandas data-frame

Question

Let's say this is my data-frame:

import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

It looks like this...

  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f

I want to drop row 1 because it has the same bio and center as row 0. I want to keep row 2 because it has the same bio as row 0 but a different center.

Something like this won't work given how drop_duplicates takes its arguments, but it's what I am trying to do:

df.drop_duplicates(subset = 'bio' & subset = 'center' )

Any suggestions?

Edit: changed df a bit to fit the example given by the correct answer.

Answer 1

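Your syntax is wrong. subset takes a single list of column labels, not repeated keyword arguments. To treat rows as duplicates when both bio and center match:

df.drop_duplicates(subset=['bio', 'center'])

Calling df.drop_duplicates() with no subset compares every column, so with the data above it would keep row 1, because its outcome differs from row 0. With the subset, the call returns the following: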

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f

Take a look at the df.drop_duplicates documentation for syntax details. subset should be a sequence of column labels.
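To check which rows a given subset flags before dropping anything, a minimal sketch can rebuild the question's frame and compare duplicated with drop_duplicates on the bio/center pair. The names mask, deduped, keep_last and no_repeats are illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

# Boolean mask: True marks a row whose (bio, center) pair already appeared above it.
mask = df.duplicated(subset=['bio', 'center'])

# Default keep='first' keeps the first row of each (bio, center) group.
deduped = df.drop_duplicates(subset=['bio', 'center'])

# keep='last' keeps the last row of each group instead.
keep_last = df.drop_duplicates(subset=['bio', 'center'], keep='last')

# keep=False drops every row that belongs to a duplicated group.
no_repeats = df.drop_duplicates(subset=['bio', 'center'], keep=False)

print(mask.tolist())               # [False, True, False, False]
print(deduped.index.tolist())      # [0, 2, 3]
print(keep_last.index.tolist())    # [1, 2, 3]
print(no_repeats.index.tolist())   # [2, 3]
```

Which keep value fits depends on whether the first row, the last row, or neither row of a repeated pair should survive.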

Answer 2

The previous answer was very helpful, but I needed one more thing in the code to get what I wanted, so I am adding it here.

The data-frame:

  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f

After applying drop_duplicates:

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f

Notice the index: it is now 0, 2, 3, with a gap where row 1 was dropped. To get a normal 0, 1, 2 index back, pass ignore_index=True:

df.drop_duplicates(subset=['bio', 'center'], ignore_index=True)

Output:

  bio center outcome
0   1    one       f
1   1    two       f
2   4  three       f
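
The ignore_index parameter was added to drop_duplicates in pandas 1.0. On older versions, a rough equivalent is to chain reset_index(drop=True). The sketch below assumes the old index labels are not needed afterwards; the name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

# Drop repeated (bio, center) pairs, then renumber the surviving rows from 0.
# drop=True discards the old index instead of adding it back as a column.
result = df.drop_duplicates(subset=['bio', 'center']).reset_index(drop=True)

print(result.index.tolist())  # [0, 1, 2]
```

Either way the call returns a new frame and leaves df unchanged, so assign the result back to a variable to keep it.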

2 answers

solution1
12 ACCEPTED 2017-08-04 03:40:16

solution2
0 2022-08-11 10:44:26
