I have a dataframe thousands of rows long that looks like this:
ID Email Address
1 ... ...
2 ... ...
3 ... ...
4 ... ...
1 ... ...
2 ... ...
5 ... ...
5 ... ...
6 ... ...
What I want to do is drop duplicates of ID so there is only one row per ID. I can't use drop_duplicates() because most people don't have IDs, and it drops those rows too (not good!)
Is there a way to remove specific rows so that only one instance of each ID is kept?
I have a dataframe of all the duplicate IDs I want to remove, if that helps. E.g. for the example I gave above:
ID Email Address
1 ... ...
2 ... ...
5 ... ...
Maybe there's a way to turn this into a series/array of IDs and remove them from the df that way?
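The series/array idea can work. A minimal sketch using toy data mirroring the example above (the column names and values are just placeholders): turn the helper frame into an array of IDs, then drop only the repeat occurrences of those IDs, so the first instance of each ID survives.

```python
import pandas as pd

# Toy frame mirroring the example above.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 1, 2, 5, 5, 6],
    "Email": list("abcdefghi"),
})
# Helper frame of the IDs that appear more than once (as in the question).
dupes = pd.DataFrame({"ID": [1, 2, 5]})

# Turn the helper frame into an array of IDs ...
dup_ids = dupes["ID"].to_numpy()

# ... and remove a row only if its ID is in that array AND an earlier row
# already carried the same ID (duplicated() flags second-and-later hits).
cleaned = df[~(df["ID"].isin(dup_ids) & df.duplicated(subset="ID"))]
```

This keeps the first row for IDs 1, 2 and 5 and leaves every other row untouched.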
I believe you need to chain 2 conditions: duplicated
with keep=False
for all dupes, and with no parameter for all but the first occurrence. The selected rows are the duplicates to remove:
df = df[df.duplicated(subset='ID', keep=False) & df.duplicated(subset='ID')]
print(df)
ID Email Address
4 1 ... ...
5 2 ... ...
7 5 ... ...
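If you want the cleaned frame directly rather than the rows to remove, you can invert the mask. A sketch, assuming people without an ID show up as NaN (pandas also flags repeated NaNs as duplicates of each other, so notna() keeps those rows out of the drop mask):

```python
import pandas as pd

# Toy data: two rows with no ID at the end.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 1, 2, 5, 5, 6, None, None],
    "Email": [f"e{i}" for i in range(11)],
})

# duplicated() with the default keep='first' flags second-and-later
# occurrences of each ID; notna() excludes the no-ID rows from the mask.
mask = df["ID"].notna() & df.duplicated(subset="ID")
cleaned = df[~mask]
```

Every NaN-ID row survives, and each real ID keeps exactly one row.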
Is this what you want?
df[df.duplicated(subset='ID')]
ID Email Address
4 1 ... ...
5 2 ... ...
7 5 ... ...
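To actually remove those rows rather than just display them, one option (a sketch on toy data matching the example) is to drop them by index:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 1, 2, 5, 5, 6],
    "Email": list("abcdefghi"),
})

# The selection above is the set of rows to remove; drop them by index.
to_remove = df[df.duplicated(subset="ID")]
cleaned = df.drop(to_remove.index)
```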