
How to only keep rows which have more than one value in a pandas DataFrame?

I often need to do the following operation, but I'm not sure whether pandas has a direct, efficient solution for it:

I have the following example pandas DataFrame with two columns, Name and Age:

import pandas as pd

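data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72], ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]

# Cast only Age to float; passing dtype=float to the constructor would also try to cast the string column Name
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})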

print(df)
      Name   Age
0     Alex  10.0
1      Bob  12.0
2  Barbara  25.0
3      Bob  72.0
4   Clarke  13.0
5   Clarke  13.0
6  Destiny  45.0

I would like to remove all rows which do not have a matching value in Name, i.e. keep only the rows whose Name appears more than once. In the example df, there are two Bob values and two Clarke values. The intended output would therefore be:

      Name   Age
0      Bob  12.0
1      Bob  72.0
2   Clarke  13.0
3   Clarke  13.0

Here I'm assuming that the index has been reset.

One option would be to keep all unique values for Name in a list, and then iterate through the DataFrame to check for duplicate rows. That would be very inefficient.

Is there a built-in function for this task?
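For reference, the manual approach described in the question could be sketched roughly as follows. This is only one possible reading of it: the `counts` dictionary and the `manual` name are illustrative, not taken from the question, and Age is left as integers here for brevity.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72],
        ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# First pass: count how often each name occurs with a plain Python loop.
counts = {}
for name in df['Name']:
    counts[name] = counts.get(name, 0) + 1

# Second pass: keep only the rows whose name was seen more than once.
kept_rows = [row for row in df.itertuples(index=False) if counts[row.Name] > 1]
manual = pd.DataFrame(kept_rows, columns=df.columns)
print(manual)
```

Both passes run in the Python interpreter rather than in vectorized pandas code, which is why this tends to be slow on large frames.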

Use drop_duplicates with keep=False to find the names that occur only once, and keep the rows whose Name is not among them:

print(df[~df['Name'].isin(df['Name'].drop_duplicates(keep=False))])

Output:

     Name   Age
1     Bob  12.0
3     Bob  72.0
4  Clarke  13.0
5  Clarke  13.0

If you care about the index, reset it:

print(df[~df['Name'].isin(df['Name'].drop_duplicates(keep=False))].reset_index(drop=True))

Output:

     Name   Age
0     Bob  12.0
1     Bob  72.0
2  Clarke  13.0
3  Clarke  13.0
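To make the one-liner easier to follow, here is a minimal sketch that splits it into two steps. The variable names `singletons` and `repeated` are illustrative, and Age is left as integers for brevity.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72],
        ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# keep=False drops every member of a duplicated group, so only the
# names that occur exactly once survive this step.
singletons = df['Name'].drop_duplicates(keep=False)
print(singletons.tolist())  # ['Alex', 'Barbara', 'Destiny']

# Rows whose Name is NOT among the singletons are the repeated ones.
repeated = df[~df['Name'].isin(singletons)].reset_index(drop=True)
print(repeated)
```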

Using duplicated:

df[df.Name.duplicated(keep=False)]
     Name   Age
1     Bob  12.0
3     Bob  72.0
4  Clarke  13.0
5  Clarke  13.0
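The sketch below shows the boolean mask that duplicated(keep=False) produces and resets the index to match the intended output. It also includes a groupby-based variant, which is not part of the answers above but may be handy if the threshold should be something other than "more than once". The names `mask`, `group_sizes`, and `at_least_two` are illustrative.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Barbara', 25], ['Bob', 72],
        ['Clarke', 13], ['Clarke', 13], ['Destiny', 45]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# keep=False marks every row of a duplicated group as True,
# including the first occurrence.
mask = df['Name'].duplicated(keep=False)
print(mask.tolist())  # [False, True, False, True, True, True, False]

result = df[mask].reset_index(drop=True)
print(result)

# Variant: compute each row's group size and filter on a threshold.
group_sizes = df.groupby('Name')['Name'].transform('size')
at_least_two = df[group_sizes >= 2].reset_index(drop=True)
```

Changing `>= 2` to another number keeps only names that occur at least that many times.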


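Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.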
