drop_duplicates - ValueError: keep must be either "first", "last" or False
I have installed pandas 0.17.0, and I'm now getting a strange error:

```
ValueError: keep must be either "first", "last" or False
```

when I try to do this:

```python
ids = ids.drop_duplicates('ID')
```
This always worked in earlier pandas versions, and the code has not changed. BTW, `ids` is a DataFrame with a single column of integers...

Here is the traceback:
```
Traceback (most recent call last):
  File "<ipython-input-34-6e98a890591b>", line 1, in <module>
    ids=ids.drop_duplicates('ID')
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 1164, in drop_duplicates
    return super(Series, self).drop_duplicates(keep=keep, inplace=inplace)
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\base.py", line 576, in drop_duplicates
    duplicated = self.duplicated(keep=keep)
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 1169, in duplicated
    return super(Series, self).duplicated(keep=keep)
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\base.py", line 603, in duplicated
    duplicated = lib.duplicated(keys, keep=keep)
  File "pandas\lib.pyx", line 1383, in pandas.lib.duplicated (pandas\lib.c:24490)
ValueError: keep must be either "first", "last" or False
```
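Notice the `keep=keep`? In pandas 0.17.0 the default for `drop_duplicates` is `keep='first'`, so if I don't specify it, shouldn't it fall back to that? Why am I getting an error here? Is this a bug in pandas 0.17.0?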
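The error indicates that `ids` is actually a `Series`, not a DataFrame (the traceback goes through `pandas\core\series.py`). `Series.drop_duplicates` has no `subset` parameter; its first parameter is `keep`. So `ids.drop_duplicates('ID')` passes `'ID'` as `keep`, which is neither `"first"`, `"last"` nor `False`, and that value is what reaches `keep=keep` in the traceback. If `ids` really were a DataFrame, this error would not occur, because the first parameter of `DataFrame.drop_duplicates` is `subset`.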
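A minimal sketch of the diagnosis above, assuming `ids` was produced by selecting a single column with single brackets (the post does not show how `ids` was built, so the `df` and its data here are hypothetical). Single brackets give a `Series`, double brackets give a one-column DataFrame, and only the latter accepts `'ID'` as its first positional argument.

```python
import pandas as pd

# Hypothetical data: the original post does not show how `ids` was created.
df = pd.DataFrame({"ID": [3, 1, 3, 2, 1]})

ids_series = df["ID"]   # single brackets -> Series
ids_frame = df[["ID"]]  # double brackets -> one-column DataFrame

# DataFrame.drop_duplicates: the first positional argument is `subset`.
dedup_frame = ids_frame.drop_duplicates("ID")

# Series.drop_duplicates has no `subset`; call it without "ID".
dedup_series = ids_series.drop_duplicates(keep="first")

# Passing "ID" positionally to the Series method fails. Depending on the
# pandas version this is the ValueError from the traceback (the string
# lands in `keep`) or a TypeError (newer releases appear to make `keep`
# keyword-only).
error_type = None
try:
    ids_series.drop_duplicates("ID")
except (ValueError, TypeError) as exc:
    error_type = type(exc).__name__

print(type(ids_series).__name__, type(ids_frame).__name__)
print(dedup_frame["ID"].tolist(), dedup_series.tolist(), error_type)
```

Either fix works: drop the `'ID'` argument when `ids` is a `Series`, or select the column with `df[['ID']]` so that `ids` stays a DataFrame.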
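I tried out the new syntax (`keep`, which replaces the former `take_last` parameter; `take_last=True` corresponds to `keep='last'`):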
```python
import pandas as pd

df = pd.DataFrame({'c1': ['cat'] * 3 + ['dog'] * 4,
                   'c2': [1, 1, 2, 3, 3, 4, 4]})
print(df)
print(df.drop_duplicates())                             # all columns, keep='first'
print(df.drop_duplicates(['c1', 'c2'], keep='first'))  # keep first row of each duplicate group
print(df.drop_duplicates(['c1', 'c2'], keep='last'))   # keep last row of each duplicate group
print(df.drop_duplicates(['c1', 'c2'], keep=False))    # drop every duplicated row; only ('cat', 2) stays
```
By default, `drop_duplicates()` uses `keep='first'` and compares all columns.
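To make those defaults concrete, here is a small check reusing the answer's `c1`/`c2` frame: calling `drop_duplicates()` with no arguments should match an explicit `subset=['c1', 'c2'], keep='first'`, and the loop records which row labels survive under each `keep` option. The `surviving` dict is just an illustrative name.

```python
import pandas as pd

# Same data as the answer's example.
df = pd.DataFrame({"c1": ["cat"] * 3 + ["dog"] * 4,
                   "c2": [1, 1, 2, 3, 3, 4, 4]})

# Defaults: every column is compared and the first occurrence is kept.
default = df.drop_duplicates()
explicit = df.drop_duplicates(subset=["c1", "c2"], keep="first")
print(default.equals(explicit))  # True

# Row labels that survive under each `keep` option.
surviving = {}
for keep in ("first", "last", False):
    kept = df.drop_duplicates(subset=["c1", "c2"], keep=keep)
    surviving[keep] = kept.index.tolist()
    print(repr(keep), surviving[keep])
# 'first' [0, 2, 3, 5]
# 'last' [1, 2, 4, 6]
# False [2]

# Restricting `subset` to one column changes what counts as a duplicate.
by_c1 = df.drop_duplicates(subset="c1").index.tolist()
print(by_c1)  # [0, 3]
```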
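Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.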