KeyError: "None of [Int64Index dtype='int64', length=9313)] are in the [columns]"

Question

I have a DataFrame with 323 columns and 10348 rows. I want to split it with stratified k-fold using the following code:

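import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv("path")
x = df.loc[:, ~df.columns.isin(['flag'])]
y = df['flag']
skf = StratifiedKFold(n_splits=5, random_state=None, shuffle=False)
for train_index, test_index in skf.split(x, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    # x[...] with an integer array is a column lookup; this is the line that raises the KeyError
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]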

But I get the following error:

KeyError: "None of [Int64Index([    0,     1,     2,     3,     4,     5,     6,     7,     8,\n               10,\n            ...\n            10338, 10339, 10340, 10341, 10342, 10343, 10344, 10345, 10346,\n            10347],\n           dtype='int64', length=9313)] are in the [columns]"

Can anyone tell me why I am getting this error and how to fix it?
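The error can be reproduced without StratifiedKFold at all. For a DataFrame, `x[...]` with a list or array is a column selection, so the integer positions that `split()` returns are looked up as column labels, and none of them match. A minimal sketch using a small hypothetical frame (columns `a`, `b`, `flag` and the index array are made up for illustration):

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for the 10348 x 323 data
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "flag": [0, 1, 0, 1]})
x = df.loc[:, ~df.columns.isin(["flag"])]

# Integer positions, like the train_index arrays produced by StratifiedKFold.split
train_index = np.array([0, 2])

raised = False
try:
    x[train_index]  # [] with an array looks up *column labels* 0 and 2, which don't exist
except KeyError as err:
    raised = True
    print("KeyError:", err)

rows = x.iloc[train_index]  # .iloc selects *rows* by position
print(rows)
```

The exact wording of the message (`Int64Index` vs `Index`) depends on the pandas version, but the cause is the same.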

Answer 1

It looks like the problem is DataFrame slicing, not StratifiedKFold itself. I built a test df for this and fixed it by using iloc to slice with the index arrays:

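from sklearn import model_selection

# df is my own test DataFrame; these are the columns to exclude from x
flag = ["raw_sentence", "score"]
x = df.loc[:, ~df.columns.isin(flag)].copy()
y = df["score"].copy()  # StratifiedKFold needs a 1-D label, so use a single column
skf = model_selection.StratifiedKFold(n_splits=2, random_state=None, shuffle=False)
for train_index, test_index in skf.split(x, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    # split() returns row positions, so slice with .iloc instead of []
    x_train, x_test = x.iloc[list(train_index)], x.iloc[list(test_index)]
    y_train, y_test = y.iloc[list(train_index)], y.iloc[list(test_index)]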

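Also, train_index and test_index are NumPy ndarrays, which was a bit confusing to work with here, so I converted them to lists. (.iloc also accepts integer ndarrays directly, so the list() call is optional.)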

See: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
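An end-to-end sketch of the same fix applied to the question's loop, assuming scikit-learn is installed. The data here is a hypothetical stand-in for the CSV (100 rows, feature columns `f1`..`f3`, a binary `flag` with a 3:1 class ratio); both `x` and `y` are sliced with `.iloc`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for pd.read_csv("path"): 100 rows, 3 features, binary "flag"
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
df["flag"] = np.tile([0, 0, 0, 1], 25)  # 75 zeros, 25 ones

x = df.loc[:, ~df.columns.isin(["flag"])]
y = df["flag"]

skf = StratifiedKFold(n_splits=5, shuffle=False)
folds = []
for train_index, test_index in skf.split(x, y):
    # split() yields positional indices, so slice rows with .iloc
    x_train, x_test = x.iloc[train_index], x.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # record train size, test size, and the share of flag == 1 in the test fold
    folds.append((len(x_train), len(x_test), y_test.mean()))

for n_train, n_test, pos_rate in folds:
    print(n_train, n_test, pos_rate)
```

Because the class counts divide evenly into 5 folds here, every test fold keeps the same 25% share of `flag == 1`, which is what stratification is meant to preserve.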

Answer 2

You can also use df.take(indices_list, axis=0):

x_train, x_test = x.take(list(train_index),axis=0), x.take(list(test_index),axis=0)

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.take.html
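A small check that `take()` is positional and returns the same rows as `.iloc`, even when the index labels are not 0..n-1. The frame, its index labels 100..103, and the `train_index` values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index, to show that take() uses positions
x = pd.DataFrame({"a": [10, 20, 30, 40]}, index=[100, 101, 102, 103])
y = pd.Series([0, 1, 0, 1], index=x.index, name="flag")

train_index = np.array([1, 3])  # positions, as returned by StratifiedKFold.split

x_train = x.take(train_index, axis=0)
y_train = y.take(train_index)

# take() along axis=0 selects the same rows as .iloc
same_as_iloc = x_train.equals(x.iloc[train_index])
print(x_train)
print(y_train)
```

As with `.iloc`, the ndarray from `split()` can be passed directly; wrapping it in `list()` is not required.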

Answer 3

Try converting the pandas DataFrame to a NumPy array, for which x[train_index] selects rows, as shown below:

pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()

array([[1, 3],
       [2, 4]])
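Applied to the question's setup, a minimal sketch assuming a hypothetical frame with columns `A`, `B` and label `flag`: after `.to_numpy()`, plain `[]` indexing with the fold arrays selects rows, so the original loop body works unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two feature columns and a binary label
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "flag": [0, 1, 0, 1]})
x = df.loc[:, ~df.columns.isin(["flag"])].to_numpy()
y = df["flag"].to_numpy()

train_index = np.array([0, 2])  # positions, as returned by StratifiedKFold.split

# On ndarrays, an integer array in [] indexes along axis 0, i.e. rows
x_train, y_train = x[train_index], y[train_index]
print(x_train)
print(y_train)
```

The trade-off is that the column names and the DataFrame index are dropped, so `.iloc` or `take()` is preferable if you still need them after the split.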


3 solutions

Solution 1: score 3, accepted, 2020-02-07 06:15:16

Solution 2: score 1, 2021-01-26 08:53:10

Solution 3: score 0, 2020-02-07 03:06:43
