
Pandas does not raise KeyError for missing column with .drop_duplicates()

Something just happened with Pandas that makes me trust it a bit less. Does anyone know why it behaves like this? It is easy to spot in this small example, but with a larger dataframe one would need to take care. I almost made a mistake because of it.

import pandas as pd

df = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,81,87], "C":[56,78,0,14,13], "D":[0,87,72,87,14], "E":[78,12,31,0,34]})
>>> df


Then, if you look for a column which isn't there:

df['b']
KeyError: 'b'

But -

df.drop_duplicates(['b', 'D'])

...runs without error, and finds the duplicate in column D.


Actually, df.drop_duplicates(['D']) produces exactly the same result.

It has dropped one duplicate row based on column D, but it has missed the duplicate in column B because the column name was misspelled. It doesn't warn you or raise an error.

Using Pandas 0.22.0 and Python 3.6.4.

df.drop_duplicates(['B','D']) just returns the original dataframe without dropping anything. Am I missing something or is Pandas broken?
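
A possible explanation for that last result, sketched with the same data: when several columns are passed, drop_duplicates compares the combination of their values, so a row is dropped only if its whole (B, D) pair repeats. The variable names below are only illustrative, and the final line is one assumed way to express "duplicated in B or in D" rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [34, 12, 78, 84, 26],
    "B": [54, 87, 35, 81, 87],
    "C": [56, 78, 0, 14, 13],
    "D": [0, 87, 72, 87, 14],
    "E": [78, 12, 31, 0, 34],
})

# Column D alone: 87 appears in rows 1 and 3, so row 3 is dropped.
by_d = df.drop_duplicates(["D"])

# Columns B and D together: the (B, D) pairs are
# (54, 0), (87, 87), (35, 72), (81, 87), (87, 14), which are all distinct,
# so no row is dropped.
by_b_and_d = df.drop_duplicates(["B", "D"])

# To drop rows whose value repeats in B or in D, combine the two masks.
by_b_or_d = df[~(df.duplicated(["B"]) | df.duplicated(["D"]))]
```

So the unchanged result for ['B','D'] is consistent with how subset works, and is separate from the question of whether a misspelled name raises an error.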

I am using Pandas 0.20.3 with Python 3.6.

When I run this line of code:

df.drop_duplicates(['b', 'D'])

I get:

KeyError: 'b'

There is something strange about row 4 in your example.

At first:

df.loc[4,'B'] = 87

After dropping duplicates:

df.loc[4,'B'] = 82

It looks like you ran some extra operation between these steps.
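
Since the two reports above disagree on whether a missing column raises, one defensive option is to check the names yourself before dropping. This is only a sketch: drop_duplicates_checked is a hypothetical helper name, not part of Pandas.

```python
import pandas as pd


def drop_duplicates_checked(frame, subset):
    """Hypothetical helper: fail loudly if any subset column is missing."""
    # Collect every requested column name that the frame does not have.
    missing = [col for col in subset if col not in frame.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return frame.drop_duplicates(subset)
```

With this, drop_duplicates_checked(df, ['b', 'D']) raises a KeyError naming 'b' regardless of what the installed version does on its own.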
