Pandas doesn't raise a KeyError for a missing column in .drop_duplicates()

Question

Something just happened with Pandas that makes me trust it a bit less. Does anyone know why it behaves like this? It is easy to spot in this small example, but with a larger dataframe one would need to take care. I almost made a mistake because of it.

import pandas as pd

df = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,81,87], "C":[56,78,0,14,13], "D":[0,87,72,87,14], "E":[78,12,31,0,34]})
>> df

Then, if you look up a column that isn't there:

df['b']
KeyError: 'b'

But:

df.drop_duplicates(['b', 'D'])

...runs without error, and finds the duplicate in column D.

Actually, df.drop_duplicates(['D']) produces exactly the same result.

It drops the duplicate in column D, but misses the one in column B because the column name was misspelled. It doesn't warn you or raise an error.
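One way to protect against a misspelled column name regardless of the installed Pandas version is to check the names before dropping. The following is only a minimal sketch: `drop_duplicates_checked` is a hypothetical helper written for this example, not part of Pandas.

```python
import pandas as pd


def drop_duplicates_checked(df, subset):
    """Hypothetical helper: fail loudly if any subset column is missing."""
    missing = [col for col in subset if col not in df.columns]
    if missing:
        raise KeyError(f"columns not found in DataFrame: {missing}")
    return df.drop_duplicates(subset)


df = pd.DataFrame({"A": [34, 12, 78, 84, 26],
                   "B": [54, 87, 35, 81, 87],
                   "C": [56, 78, 0, 14, 13],
                   "D": [0, 87, 72, 87, 14],
                   "E": [78, 12, 31, 0, 34]})

# The misspelled column 'b' is reported instead of being silently ignored.
try:
    drop_duplicates_checked(df, ["b", "D"])
except KeyError as exc:
    print("KeyError:", exc)

# With a valid column name the call behaves like df.drop_duplicates(['D']).
deduped = drop_duplicates_checked(df, ["D"])
print(deduped)
```

The check is a plain membership test on `df.columns`, so it does not rely on how a particular Pandas release handles unknown names in `subset`.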

Using Pandas 0.22.0 and Python 3.6.4.

df.drop_duplicates(['B','D']) just returns the original dataframe without dropping anything. Am I missing something or is Pandas broken?
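A likely explanation for that last result is that `subset` treats the listed columns as a combined key: a row is dropped only if its values match an earlier row in every listed column. In this data no (B, D) pair repeats, so nothing is removed. The sketch below, using the dataframe from the question, compares the variants. The variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [34, 12, 78, 84, 26],
                   "B": [54, 87, 35, 81, 87],
                   "C": [56, 78, 0, 14, 13],
                   "D": [0, 87, 72, 87, 14],
                   "E": [78, 12, 31, 0, 34]})

# Each single-column subset finds one repeated value (87) and drops one row.
by_d = df.drop_duplicates(["D"])
by_b = df.drop_duplicates(["B"])

# Both columns together: rows are compared on the (B, D) pair, and all
# five pairs are distinct, so every row is kept.
by_pair = df.drop_duplicates(["B", "D"])

# To drop a row when EITHER column repeats an earlier value, combine
# the per-column duplicate masks explicitly.
either = df[~(df.duplicated(["B"]) | df.duplicated(["D"]))]

print(len(by_d), len(by_b), len(by_pair), len(either))
```

If the intent was to remove rows that repeat in B or in D independently, the combined mask in the last step is one way to express it.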

Answer 1

Pandas version 0.20.3, Python 3.6.

When I run this line of code:

df.drop_duplicates(['b', 'D'])

I get:

KeyError: 'b'

Your example has a strange situation in row 4.

First:

df.loc[4,'B'] = 87

After dropping duplicates:

df.loc[4,'B'] = 82

It looks like you have some extra operation between these steps.

Answer 1: accepted, 2018-02-16 10:49:51
