Selecting unique observations in a pandas data frame

Question

I have a pandas data frame with a column uniqueid. I would like to remove all duplicates from the data frame based on this column, so that all remaining observations are unique.

Answer 1

There is also the drop_duplicates() method, available on any data frame (see the pandas documentation). You can pass the column or columns to check for duplicates as the subset argument.

df.drop_duplicates(subset='uniqueid', inplace=True)
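As a small illustration of the keep parameter (not part of the original answer), here is a sketch on a made-up frame with a hypothetical value column. keep='first' is the default; keep='last' keeps the last row per uniqueid instead, and keep=False removes every row whose uniqueid occurs more than once. Without inplace=True, drop_duplicates returns a new frame and leaves df unchanged.

```python
import pandas as pd

# Toy data (made up for illustration): uniqueid 2 appears twice
df = pd.DataFrame({'uniqueid': [1, 2, 2, 3], 'value': ['a', 'b', 'c', 'd']})

# keep='first' (the default) retains the first row for each uniqueid
first = df.drop_duplicates(subset='uniqueid', keep='first')

# keep='last' retains the last row for each uniqueid instead
last = df.drop_duplicates(subset='uniqueid', keep='last')

# keep=False drops every row whose uniqueid appears more than once
singletons = df.drop_duplicates(subset='uniqueid', keep=False)

print(first)
print(last)
print(singletons)
```

Whether keep='first' or keep='last' is appropriate depends on which of the duplicated rows you consider authoritative; keep=False answers a different question (which ids were never duplicated).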

Answer 2

Use the duplicated method

Since we only care whether uniqueid (A in my example) is duplicated, select that column and call duplicated on the resulting series. Then use ~ to invert the booleans, so that only the first occurrence of each value is selected.

In [90]: df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

In [91]: df
Out[91]: 
   A  B
0  a  1
1  b  2
2  b  3
3  c  4

In [92]: df['A'].duplicated()
Out[92]: 
0    False
1    False
2     True
3    False
Name: A, dtype: bool

In [93]: df.loc[~df['A'].duplicated()]
Out[93]: 
   A  B
0  a  1
1  b  2
3  c  4
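To check that the boolean-mask approach selects the same rows as drop_duplicates, here is a sketch reusing the example frame from the session above. It also shows that Series.duplicated accepts the same keep argument, so keep='last' makes row 2 survive instead of row 1.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

# Boolean mask: True for the first occurrence of each value in column A
mask = ~df['A'].duplicated()
via_mask = df.loc[mask]

# The same selection expressed with drop_duplicates on column A
via_drop = df.drop_duplicates(subset='A')

# Both approaches keep rows 0, 1 and 3 (same index, same values)
assert via_mask.equals(via_drop)

# keep='last' flips which duplicate survives: row 2 instead of row 1
via_mask_last = df.loc[~df['A'].duplicated(keep='last')]
print(via_mask_last)
```

The mask form is handy when you want to combine the uniqueness condition with other boolean filters in a single df.loc call.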


2 answers

Answer 1
12 votes, accepted, 2013-11-01 04:13:27

Answer 2
9 votes, 2013-11-01 01:35:05
