Drop specific rows from a pandas DataFrame using a numpy array

Question

I have a DataFrame thousands of rows long that looks like this:

ID  Email  Address
1   ...    ... 
2   ...    ... 
3   ...    ... 
4   ...    ... 
1   ...    ... 
2   ...    ... 
5   ...    ... 
5   ...    ... 
6   ...    ...

What I want to do is drop duplicate IDs so there is only one ID per person. I can't use drop_duplicates() because most people don't have IDs, and it drops those rows too (not good!).
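A minimal sketch of why drop_duplicates() loses the rows without an ID, assuming missing IDs are stored as NaN (the question does not say how they are represented). duplicated() treats all NaN values as equal, so every ID-less row after the first is flagged; restricting the flag to non-null IDs keeps them. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: missing IDs are assumed to be NaN
df = pd.DataFrame({
    "ID": [1, 2, np.nan, 1, np.nan, 5, 5],
    "Email": ["a", "b", "c", "d", "e", "f", "g"],
})

# drop_duplicates treats every NaN ID as the same value,
# so only the first ID-less row survives
naive = df.drop_duplicates(subset="ID")

# Flag a row only if its ID is present AND has already been seen
is_repeat_id = df["ID"].duplicated() & df["ID"].notna()
result = df[~is_repeat_id]

print(naive)
print(result)
```

With this mask, each real ID appears once and every row without an ID is kept.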

Is there a way to remove specific rows and keep only one instance of each ID?

If it helps, I have a DataFrame of all the duplicate IDs I want to remove. For the example above:

ID  Email  Address
1   ...    ...
2   ...    ...
5   ...    ...

Maybe there's a way to turn this into a Series/array of IDs and remove them from the df that way?
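If the frame of duplicates was built from df itself (for example with df[df.duplicated(subset='ID')]), it keeps the original row labels, so one option is to drop by index rather than by ID value. Filtering with isin on the ID values would also remove the first occurrence of each ID. A sketch under that assumption, using the IDs from the example above; the name dupes is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 1, 2, 5, 5, 6],
    "Email": list("abcdefghi"),
})

# Assumed: the duplicates frame is a subset of df, so it shares df's index labels
dupes = df[df.duplicated(subset="ID")]

# Drop exactly those rows by label; the first occurrence of each ID stays
kept = df.drop(index=dupes.index)

# By contrast, filtering on the ID values removes every row carrying those IDs
too_many_removed = df[~df["ID"].isin(dupes["ID"])]

print(kept)
print(too_many_removed)
```

Dropping by label only works if dupes was not re-indexed (e.g. with reset_index) after it was created.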

Answer 1

I believe you need to chain two conditions: duplicated with keep=False to flag all duplicates, and duplicated with no keep parameter (the default, keep='first') to flag every occurrence after the first:

df = df[df.duplicated(subset='ID', keep=False) & df.duplicated(subset='ID')]
print(df)
   ID Email Address
4   1   ...     ...
5   2   ...     ...
7   5   ...     ...
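One observation about the chained condition: every row flagged by duplicated(keep='first') is also flagged by duplicated(keep=False), so the & reduces to the default mask alone, which is why Answer 2 below gives the same rows. Either mask selects the rows to remove; negating it with ~ gives the rows to keep. A quick check on the question's sample IDs (the other columns are omitted here):

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4, 1, 2, 5, 5, 6]})

chained = df.duplicated(subset="ID", keep=False) & df.duplicated(subset="ID")
single = df.duplicated(subset="ID")

# keep='first' flags are a subset of keep=False flags, so the AND changes nothing
same_mask = chained.equals(single)

to_remove = df[single]  # rows 4, 5 and 7, as in the output above
to_keep = df[~single]   # one row per ID

print(same_mask)
print(to_keep)
```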

Answer 2

Is this what you want?

df[df.duplicated(subset='ID')]

    ID Email Address
4   1   ...     ...
5   2   ...     ...
7   5   ...     ...

2 answers

Answer 1: score 1, accepted, posted 2018-12-21 11:34:48

Answer 2: score 1, posted 2018-12-21 11:38:55
