
Drop specific rows in pandas from a numpy array

I have a dataframe thousands of rows long that looks like this:

ID  Email  Address
1   ...    ... 
2   ...    ... 
3   ...    ... 
4   ...    ... 
1   ...    ... 
2   ...    ... 
5   ...    ... 
5   ...    ... 
6   ...    ... 

What I want to do is drop duplicates of ID so there is only one ID per person. I can't use drop_duplicates() because most people don't have IDs, and it drops them too (not good!).

Is there a way to remove specific rows and keep only one instance of each ID?

I have a dataframe of all the duplicate IDs I want to remove, if that helps. E.g., for the example I gave above:

ID  Email  Address
1   ...    ...
2   ...    ...
5   ...    ...

Maybe there's a way to turn this into a series/array of IDs and remove them from the df that way?

I believe you need to chain two conditions: duplicated with keep=False, which flags every row whose ID appears more than once, and duplicated with no parameters, which flags every occurrence after the first:

df = df[df.duplicated(subset='ID', keep=False) & df.duplicated(subset='ID')]
print(df)
   ID Email Address
4   1   ...     ...
5   2   ...     ...
7   5   ...     ...
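
The mask above selects the repeated rows. To actually drop them while keeping people who have no ID, one option might be to invert the mask and exempt missing IDs. This is only a sketch: the sample data is made up, and it assumes a missing ID is stored as NaN (duplicated treats NaN values as equal to each other, which is why they need the exemption).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: some people have no ID (NaN).
df = pd.DataFrame({
    "ID": [1, 2, np.nan, 3, 1, np.nan, 2, 5, 5],
    "Email": list("abcdefghi"),
})

# Flag every repeat of an ID after its first occurrence...
is_repeat = df.duplicated(subset="ID")
# ...but never treat a missing ID as a repeat.
has_id = df["ID"].notna()

# Keep everything except repeats of a real ID.
deduped = df[~(is_repeat & has_id)]
print(deduped)
```

Rows with a missing ID all survive, and each real ID keeps only its first row.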

Is this what you want?

df[df.duplicated(subset='ID')]

    ID Email Address
4   1   ...     ...
5   2   ...     ...
7   5   ...     ...
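
If you would rather go through the array of IDs mentioned in the question, something like the following might work. It is a sketch under assumptions: df and dupes are made-up stand-ins for your two frames, and the name dupe_ids is just illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames described in the question.
df = pd.DataFrame({"ID": [1, 2, 3, 4, 1, 2, 5, 5, 6]})
dupes = pd.DataFrame({"ID": [1, 2, 5]})

# Turn the ID column of the duplicates frame into a NumPy array.
dupe_ids = dupes["ID"].to_numpy()

# Drop a row only if its ID is in the array AND it is not the first occurrence.
to_drop = df["ID"].isin(dupe_ids) & df.duplicated(subset="ID")
result = df[~to_drop]
print(result)
```

Because the isin check only matches IDs listed in the array, rows whose ID is not in it (including missing ones) are left alone.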

