How to only keep rows which have more than one value in a pandas DataFrame?

Question

I often need to do the following operation, but I'm not sure whether there's a built-in solution that is the most efficient way to do it in pandas.

I have the following example pandas DataFrame with two columns, Name and Age:

import pandas as pd

data = [['Alex',10],['Bob',12],['Barbara',25], ['Bob',72], ['Clarke',13], ['Clarke',13], ['Destiny', 45]]

df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})

print(df)
      Name   Age
0     Alex  10.0
1      Bob  12.0
2  Barbara  25.0
3      Bob  72.0
4   Clarke  13.0
5   Clarke  13.0
6  Destiny  45.0

I would like to remove all rows which do not have a matching value in Name, i.e. keep only the rows whose Name occurs more than once. In the example df, there are two Bob values and two Clarke values. The intended output would therefore be:

      Name   Age
0      Bob  12.0
1      Bob  72.0
2   Clarke  13.0
3   Clarke  13.0

Here I'm assuming that the index has been reset.

One option would be to keep all unique values for Name in a list, and then iterate through the dataframe to check for duplicate rows. That would be very inefficient.

Is there a built-in function for this task?

Answer 1

Use drop_duplicates with keep=False to get the names that occur only once, then keep the rows whose Name is not among them:

print(df[~df['Name'].isin(df['Name'].drop_duplicates(keep=False))])

Output:

     Name   Age
1     Bob  12.0
3     Bob  72.0
4  Clarke  13.0
5  Clarke  13.0

If you care about the index, do:

print(df[~df['Name'].isin(df['Name'].drop_duplicates(keep=False))].reset_index(drop=True))

Output:

     Name   Age
0     Bob  12.0
1     Bob  72.0
2  Clarke  13.0
3  Clarke  13.0
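
To see why this works, it may help to look at the intermediate values. The following is a minimal sketch, assuming the df from the question; the names singletons, mask and result are introduced here only for illustration, and Age is cast to float separately because recent pandas versions may reject dtype=float for a frame that contains a string column.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72],
        ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]
# Assumption: cast only Age to float instead of passing dtype=float for the whole frame
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})

# keep=False drops every member of a duplicated group,
# so only the names that occur exactly once remain
singletons = df['Name'].drop_duplicates(keep=False)

# True for rows whose Name is NOT one of the single-occurrence names
mask = ~df['Name'].isin(singletons)

# Select those rows and renumber the index from 0
result = df[mask].reset_index(drop=True)
print(result)
```

Splitting the one-liner this way makes it easier to inspect singletons and mask separately when the result is not what you expect.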

Answer 2

Using duplicated with keep=False, which marks every occurrence of a repeated Name as True:

df[df.Name.duplicated(keep=False)]
     Name   Age
1     Bob  12.0
3     Bob  72.0
4  Clarke  13.0
5  Clarke  13.0
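
For completeness, here is a self-contained sketch of this approach that also resets the index to match the output requested in the question. The variable names dup_mask and out are illustrative only, and the frame is rebuilt with Age cast to float on its own, on the assumption that a frame-wide dtype=float may fail on recent pandas versions.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72],
        ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]
# Assumption: cast only Age to float instead of passing dtype=float for the whole frame
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})

# Boolean Series: True for every row whose Name appears more than once
dup_mask = df['Name'].duplicated(keep=False)

# Keep the flagged rows and renumber the index from 0
out = df[dup_mask].reset_index(drop=True)
print(out)
```

With the default keep='first', the first Bob and the first Clarke would be marked False and dropped from the result, which is why keep=False is needed here.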

2 answers

solution1
3 2018-12-12 01:22:54

solution2
3 ACCEPTED 2018-12-12 02:05:12
