How to drop duplicates based on two or more subset criteria in a Pandas DataFrame

Question

Let's say this is my DataFrame:

import pandas as pd

df = pd.DataFrame({ 'bio' : ['1', '1', '1', '4'],
                'center' : ['one', 'one', 'two', 'three'],
                'outcome' : ['f','t','f','f'] })

which looks like this:

  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f

I want to drop row 1 because it has the same bio and center as row 0. I want to keep row 2 because it has the same bio as row 0 but a different center.

Something like this won't work given the input structure of drop_duplicates, but it is what I'm trying to do:

df.drop_duplicates(subset = 'bio' & subset = 'center' )

Any suggestions?

Edit: changed df to fit the example in the correct answer.

Answer 1

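Your syntax is wrong. Here is the correct way:

df.drop_duplicates(subset=['bio', 'center'])

Note that adding 'outcome' to subset, or simply calling df.drop_duplicates() with no arguments, compares all three columns. For the df shown in the question, rows 0 and 1 differ in outcome ('f' vs 't'), so neither of those calls would drop anything.

With subset=['bio', 'center'], the call returns the following: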

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f

See the df.drop_duplicates documentation for syntax details. subset should be a sequence of column labels.
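A minimal sketch of how subset behaves, using the df from the question and assuming the goal is to treat rows that share both bio and center as duplicates. The variable names deduped, deduped_last and mask are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

# Rows count as duplicates only when every column listed in subset matches.
# By default (keep='first') the first row of each duplicate group is kept.
deduped = df.drop_duplicates(subset=['bio', 'center'])

# keep='last' retains the last row of each duplicate group instead.
deduped_last = df.drop_duplicates(subset=['bio', 'center'], keep='last')

# duplicated() returns the boolean mask of rows that would be dropped.
mask = df.duplicated(subset=['bio', 'center'])

print(deduped)
print(deduped_last)
print(mask)
```

Checking mask first is a cheap way to see which rows a given subset will remove before actually dropping them.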

Answer 2

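The previous answer was very helpful and it worked for me. I also needed to add something to my code to get exactly what I wanted, so I'd like to add that here.

The DataFrame: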

  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f

After applying drop_duplicates:

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f

Notice the index: it is no longer sequential. If you want to go from 0, 2, 3 back to a normal index, i.e. 0, 1, 2:

df.drop_duplicates(subset=['bio', 'center'], ignore_index=True)

Output:

  bio center outcome
0   1    one       f
1   1    two       f
2   4  three       f
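As an alternative sketch, chaining reset_index(drop=True) after drop_duplicates should give the same renumbered index. This may be useful on pandas versions that predate the ignore_index parameter (added, as far as I know, in pandas 1.0). The name result is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

# Drop rows that repeat an earlier (bio, center) pair, then renumber.
# drop=True discards the old index instead of keeping it as a new column.
result = (
    df.drop_duplicates(subset=['bio', 'center'])
      .reset_index(drop=True)
)

print(result)
```

Both calls return a new DataFrame and leave df itself unchanged, so assign the result if you need to keep it.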

Answer 1
12, accepted, 2017-08-04 03:40:16

Answer 2
0, 2022-08-11 10:44:26
