KeyError: "[Int64Index([ 12313,\\n, 34534],\\n dtype='int64', leng

Question

Official guide

I want to use the latest official scikit-learn example code for StratifiedKFold:

>>> import numpy as np
>>> from sklearn.model_selection import StratifiedKFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> skf = StratifiedKFold(n_splits=2)
>>> skf.get_n_splits(X, y)
2
>>> print(skf)
StratifiedKFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in skf.split(X, y):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 3] TEST: [0 2]
TRAIN: [0 2] TEST: [1 3]

My code

I have all my data stored in two pandas DataFrames, X and y, containing integer and float values.

skf = StratifiedKFold(n_splits=4) # shuffle=True, random_state=1

for train_index, test_index in skf.split(X, y):
    X_train = X[train_index]
    X_test = X[test_index]
    y_train = y[train_index]
    y_test = y[test_index]
    print("TRAIN:", train_index, "TEST:", test_index)

Error

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-2776afce57e9> in <module>
      2 
      3 for train_index, test_index in skf.split(X, y):
----> 4     X_train = X[train_index]
      5     X_test = X[test_index]
      6     y_train = y[train_index]

~/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2906             if is_iterator(key):
   2907                 key = list(key)
-> 2908             indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
   2909 
   2910         # take() does not accept boolean indexers

~/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

~/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Int64Index([ 785015,  785016,  785017,  785018,  785019,  785020,  785021,\n             785022,  785023,  785024,\n            ...\n            3140252, 3140253, 3140254, 3140255, 3140256, 3140257, 3140258,\n            3140259, 3140260, 3140261],\n           dtype='int64', length=2355196)] are in the [columns]"

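Solutions I tried

The error occurs in a different place there - KeyError: None of [Int64Index...] dtype='int64] are in the columns
No answer, and not the same error message - KeyError: "None of [Int64Index([2, 3], dtype='int64')] are in the [columns]"
Different code, different network, data stored at the end - Using sklearn's KFold to split a pandas DataFrame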

Answer 1

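In the post below, the question itself was answered in a different way, but one of the comments answered my question.

Receiving KeyError: "None of [Int64Index([ ... dtype='int64', length=1323)] are in the [columns]" @bubble
When you load the data, it has to be a NumPy array, not a DataFrame. skf.split returns positional row indices, and X[train_index] on a DataFrame looks those integers up as column labels, which is why the KeyError says the keys are not in the [columns].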

X = mydataframe.drop(['acol','bcol'], axis=1).values 
y = mydataframe['targetvalue'].values
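
For anyone who would rather keep the DataFrames, here is a minimal sketch of both options: converting to NumPy arrays as above, or indexing by position with .iloc. The toy X and y and the column names "a" and "b" are made up for illustration and are not the asker's data.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data standing in for the asker's DataFrames.
X = pd.DataFrame({"a": [1, 3, 1, 3, 5, 7, 5, 7],
                  "b": [2, 4, 2, 4, 6, 8, 6, 8]})
y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

skf = StratifiedKFold(n_splits=4)

# Option 1: convert to NumPy arrays, as in the answer above.
X_arr, y_arr = X.values, y.values

folds = []
for train_index, test_index in skf.split(X, y):
    # NumPy arrays accept the positional indices directly.
    X_train_arr, X_test_arr = X_arr[train_index], X_arr[test_index]
    y_train_arr, y_test_arr = y_arr[train_index], y_arr[test_index]

    # Option 2: keep the pandas objects and select rows by position.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    folds.append((X_train, X_test, y_train, y_test))
    print("TRAIN:", train_index, "TEST:", test_index)
```

Plain X[train_index] fails because square-bracket indexing on a DataFrame with a list of integers selects columns by label. .iloc keeps the column names and dtypes, which can be convenient if later steps expect a DataFrame.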
