KeyError: Int64Index([1], dtype='int64') when using drop_duplicates

I wrote a simple script that is supposed to merge (union) some dataframes and remove the duplicates.

For example, for the input:

df_A:
a  1
b  2

df_B:
b  2
c  3

The expected output would be:

df_out:
a  1
b  2
c  3
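A minimal sketch of the intended union, assuming the two example frames are built in memory (df_A and df_B here are stand-ins, not read from the asker's files) and use the integer column labels 0 and 1 that header=None produces:

```python
import pandas as pd

# In-memory stand-ins for df_A and df_B from the example above.
# Files read with header=None get integer column labels 0, 1, ...,
# so these frames use the same labels.
df_A = pd.DataFrame([["a", 1], ["b", 2]])
df_B = pd.DataFrame([["b", 2], ["c", 3]])

# Union: stack the frames, drop rows that repeat on columns 0 and 1,
# then renumber the row index from 0.
df_out = pd.concat([df_A, df_B]).drop_duplicates(subset=[0, 1]).reset_index(drop=True)
print(df_out)
```

Because concat keeps each input's original row labels, reset_index(drop=True) is what renumbers the result 0, 1, 2.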

I wrote the following code:

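import csv

import pandas as pd


def read_dataframes(filenames, basedir):
    # Read each tab-separated file without a header row; columns get integer labels 0, 1, 2, ...
    return [pd.read_csv(basedir + file, sep='\t', header=None, quoting=csv.QUOTE_NONE) for file in filenames]


def merge_dataframes(dfs, out):
    # Stack all frames, drop rows duplicated on columns 0 and 1, renumber the index
    merged = pd.concat(dfs).drop_duplicates(subset=[0, 1]).reset_index(drop=True)
    # Keep only the columns at positions 0, 1, 2, 7, 8 and 9
    merged = merged.iloc[:, [0, 1, 2, 7, 8, 9]]
    merged.to_csv(out, header=False, index=False, sep='\t')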

I call these functions like this:

merge_dataframes(read_dataframes(filenames, basedir), output)

I get a KeyError exception:

Traceback (most recent call last):
  File "analysis_and_visualization.py", line 70, in <module>
    merge_dataframes(read_dataframes(wild_emb, wild_basedir), 'wild_emb_merged')
  File "analysis_and_visualization.py", line 17, in merge_dataframes
    merged = pd.concat(dfs).drop_duplicates(subset=[0, 1]).reset_index(drop=True)
  File "/Data/user/eliran/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 5112, in drop_duplicates
    duplicated = self.duplicated(subset, keep=keep)
  File "/Data/user/eliran/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 5248, in duplicated
    raise KeyError(diff)
KeyError: Int64Index([1], dtype='int64')

What am I doing wrong?

Going over the source code of the duplicated method in frame.py, it seems that not all of the columns listed in subset exist in your DataFrame.

Snippet from DataFrame.duplicated in pandas/core/frame.py:

# Verify all columns in subset exist in the queried dataframe
# Otherwise, raise a KeyError, same as if you try to __getitem__ with a
# key that doesn't exist.
diff = Index(subset).difference(self.columns)
if not diff.empty:
    raise KeyError(diff)

import pandas as pd

df = pd.DataFrame({'col1': [0, 1, 2], 'col3': [1, 2, 3]})

print(df)

  col1  col3
0     0     1
1     1     2
2     2     3


df.drop_duplicates(subset=['col1','col2'])

   5246         diff = Index(subset).difference(self.columns)
   5247         if not diff.empty:
-> 5248             raise KeyError(diff)
   5249 
   5250         vals = (col.values for name, col in self.items() if name in subset)

KeyError: Index(['col2'], dtype='object')
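The same check can be run by hand before calling drop_duplicates, to see exactly which labels in subset are missing. A minimal sketch, assuming a hypothetical frame that ended up with only column 0 (like a file parsed with fewer columns than expected):

```python
import pandas as pd

# Hypothetical frame with a single column labelled 0.
df = pd.DataFrame({0: ["a", "b", "b"]})

# Same idea as the pandas check: labels in subset that are not columns of df.
subset = [0, 1]
diff = pd.Index(subset).difference(df.columns)
print(diff)  # contains the missing label 1

# drop_duplicates refuses a subset with unknown labels.
raised_key_error = False
try:
    df.drop_duplicates(subset=subset)
except KeyError as exc:
    raised_key_error = True
    print("KeyError:", exc)
```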

I think the problem here is not column 1 itself: the first column is being converted to the index, so some or all of the DataFrames end up with only one column, called 0.

To prevent this, pass index_col=False to read_csv:

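import csv

import pandas as pd


def read_dataframes(filenames, basedir):
    # index_col=False stops pandas from turning the first column into the index
    return [pd.read_csv(basedir + file, sep='\t', header=None, quoting=csv.QUOTE_NONE, index_col=False) for file in filenames]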
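To see how read_csv can silently move the first column into the index, here is a minimal sketch of the case the pandas documentation describes: data rows with one more field than the header row, for example because every line ends with a trailing delimiter. The in-memory data is hypothetical, not the asker's files, and uses a header row rather than header=None:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: each data row ends with a trailing delimiter,
# so it has one more field than the header row.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default parsing: the extra leading field becomes the row index.
implicit = pd.read_csv(StringIO(data))
print(implicit)

# index_col=False keeps the first field as a regular column.
explicit = pd.read_csv(StringIO(data), index_col=False)
print(explicit)
```

In the default result the values 4 and 8 are row labels and every column is shifted one place to the left, which is the kind of shift that can leave fewer data columns than the merge code expects.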

Another possible cause is that, for some reason, a file contains only one column of data, so the second column, called 1, does not exist.
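To find which input file is too short, the frames can be checked before merging. A minimal sketch; check_columns and REQUIRED_COLUMNS are hypothetical names, and the sample frames stand in for the output of read_dataframes:

```python
import pandas as pd

# Hypothetical: labels used by drop_duplicates(subset=[0, 1]) plus the
# positions selected by iloc in the question's code. With header=None,
# column labels and positions coincide.
REQUIRED_COLUMNS = [0, 1, 2, 7, 8, 9]


def check_columns(dfs, names):
    """Hypothetical helper: map each input name to the required columns it lacks."""
    problems = {}
    for name, df in zip(names, dfs):
        missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
        if missing:
            problems[name] = missing
    return problems


ok = pd.DataFrame([list(range(10))])  # columns 0..9
short = pd.DataFrame([["a"]])         # only column 0
problems = check_columns([ok, short], ["ok.tsv", "short.tsv"])
print(problems)  # {'short.tsv': [1, 2, 7, 8, 9]}
```

With the real files this could be called as check_columns(read_dataframes(filenames, basedir), filenames) before merge_dataframes.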
