How to keep only rows which have more than one occurrence of a value in a pandas DataFrame?
I often try to do the following operation, but I am not sure whether there is an immediate, efficient solution in pandas:
I have the following example pandas DataFrame, with two columns, Name and Age:
import pandas as pd
data = [['Alex',10],['Bob',12],['Barbara',25], ['Bob',72], ['Clarke',13], ['Clarke',13], ['Destiny', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})
print(df)
Name Age
0 Alex 10.0
1 Bob 12.0
2 Barbara 25.0
3 Bob 72.0
4 Clarke 13.0
5 Clarke 13.0
6 Destiny 45.0
I would like to keep only the rows that have a repeated value in Name (and remove all others). In the example df, there are two Bob values and two Clarke values. The intended output would therefore be:
Name Age
0 Bob 12.0
1 Bob 72.0
2 Clarke 13.0
3 Clarke 13.0
where I'm assuming that the index has been reset.
One option would be to keep all unique values for Name in a list, and then iterate through the DataFrame to check for duplicate rows. That would be very inefficient.
Is there a built-in function for this task?
Use drop_duplicates with keep=False, and then select only the rows that it would drop:
print(df[~df['Name'].isin(df['Name'].drop_duplicates(keep=False))])
Output:
Name Age
1 Bob 12.0
3 Bob 72.0
4 Clarke 13.0
5 Clarke 13.0
If you care about the index, do:
print(df[~df['Name'].isin(df['Name'].drop_duplicates(keep=False))].reset_index(drop=True))
Output:
Name Age
0 Bob 12.0
1 Bob 72.0
2 Clarke 13.0
3 Clarke 13.0
Using duplicated with keep=False:
df[df.Name.duplicated(keep=False)]
Name Age
1 Bob 12.0
3 Bob 72.0
4 Clarke 13.0
5 Clarke 13.0
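Putting this together with the reset index the question asks for, a minimal end-to-end version might look like:

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72],
        ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# keep=False marks every occurrence of a repeated name as True,
# so boolean indexing keeps all rows whose name appears more than once.
out = df[df['Name'].duplicated(keep=False)].reset_index(drop=True)
print(out)
```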