Sklearn error: None of [Int64Index([2, 3], dtype='int64')] are in the [columns]

Could someone explain why this code:

from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
import numpy as np

#df = pd.read_csv('missing_data.csv',sep=',')

df = pd.DataFrame(np.array([[1, 2, 3,4,5,6,7,8,9,1],
                            [4, 5, 6,3,4,5,7,5,4,1],
                            [7, 8, 9,6,2,3,6,5,4,1],
                            [7, 8, 9,6,1,3,2,2,4,0],
                            [7, 8, 9,6,5,6,6,5,4,0]]),
                            columns=['a', 'b', 'c','d','e','f','g','h','i','j'])

X_train = df.iloc[:,:-1]
y_train = df.iloc[:,-1]


clf=SVC(kernel='linear')
kfold = StratifiedKFold(n_splits=2,random_state=42,shuffle=True)
for train_index,test_index in kfold.split(X_train,y_train):
    x_train_fold,x_test_fold = X_train[train_index],X_train[test_index]
    y_train_fold,y_test_fold = y_train[train_index],y_train[test_index]
    clf.fit(x_train_fold,y_train_fold)

throws this error:

Traceback (most recent call last):
  File "test_traintest.py", line 23, in <module>
    x_train_fold,x_test_fold = X_train[train_index],X_train[test_index]
  File "/Users/slowat/anaconda/envs/nlp_course/lib/python3.7/site-packages/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/Users/slowat/anaconda/envs/nlp_course/lib/python3.7/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/Users/slowat/anaconda/envs/nlp_course/lib/python3.7/site-packages/pandas/core/indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([2, 3], dtype='int64')] are in the [columns]"

I saw this answer, but the length of my columns is equal.

KFold.split() (and likewise StratifiedKFold.split()) returns the train and test indices as arrays of integer row positions, which should be used with a DataFrame like this:

X_train.iloc[train_index]

With your syntax, X_train[train_index], pandas looks the indices up as column names, which is why the KeyError mentions [columns]. Change your code to:

from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
import numpy as np

#df = pd.read_csv('missing_data.csv',sep=',')

df = pd.DataFrame(np.array([[1, 2, 3,4,5,6,7,8,9,1],
                            [4, 5, 6,3,4,5,7,5,4,1],
                            [7, 8, 9,6,2,3,6,5,4,1],
                            [7, 8, 9,6,1,3,2,2,4,0],
                            [7, 8, 9,6,5,6,6,5,4,0]]),
                            columns=['a', 'b', 'c','d','e','f','g','h','i','j'])

X_train = df.iloc[:,:-1]
y_train = df.iloc[:,-1]


clf=SVC(kernel='linear')
kfold = StratifiedKFold(n_splits=2,random_state=42,shuffle=True)
for train_index,test_index in kfold.split(X_train,y_train):
    x_train_fold,x_test_fold = X_train.iloc[train_index],X_train.iloc[test_index]
    y_train_fold,y_test_fold = y_train.iloc[train_index],y_train.iloc[test_index]
    clf.fit(x_train_fold,y_train_fold)
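To see the difference in isolation, here is a minimal sketch on a small made-up DataFrame (the column names and values are assumptions for illustration, not taken from the question). Bracket indexing with an integer array searches the column labels and raises the KeyError, while .iloc treats the same array as row positions:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only the indexing behaviour matters here.
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]})
idx = np.array([2, 3])  # positional indices, like those yielded by split()

# df[idx] looks for columns labelled 2 and 3, which do not exist.
try:
    df[idx]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# .iloc interprets the same array as row positions.
rows = df.iloc[idx]
print(raised_key_error)
print(rows)
```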

Note that we use .iloc and not .loc. This is because .iloc works with integer positions, like the ones we get from split(), while .loc works on index labels. In your case it doesn't matter, since the default pandas index matches the integer positions, but that may not be the case in other projects (for example after filtering or shuffling rows), so stick with .iloc.
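As a rough illustration of when the two diverge, the sketch below builds a frame whose index labels are deliberately reversed (a made-up example, not data from the question). The same integer array then selects different rows through .iloc and .loc, without raising any error:

```python
import pandas as pd

# Hypothetical frame whose index labels do not match row positions.
df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[3, 2, 1, 0])
positions = [0, 1]

by_position = df.iloc[positions]  # first two rows in storage order
by_label = df.loc[positions]      # rows whose index labels are 0 and 1

print(by_position["x"].tolist())
print(by_label["x"].tolist())
```

With split() output you want the positional behaviour, which is what .iloc gives regardless of how the index is labelled.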

Alternatively, when you extract X_train and y_train you can convert them to numpy arrays:

X_train = df.iloc[:,:-1].to_numpy()
y_train = df.iloc[:,-1].to_numpy()

and then your original code will work as written, because a numpy array can be indexed directly with arrays of integer positions.
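Put together, the numpy variant might look like the sketch below. It reuses the data from the question; creating a fresh SVC per fold, calling predict on the held-out rows, and the fold_sizes list are additions for illustration, not part of the original code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

df = pd.DataFrame(np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 1],
                            [4, 5, 6, 3, 4, 5, 7, 5, 4, 1],
                            [7, 8, 9, 6, 2, 3, 6, 5, 4, 1],
                            [7, 8, 9, 6, 1, 3, 2, 2, 4, 0],
                            [7, 8, 9, 6, 5, 6, 6, 5, 4, 0]]),
                  columns=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

# Convert once; plain arrays accept integer index arrays directly.
X = df.iloc[:, :-1].to_numpy()
y = df.iloc[:, -1].to_numpy()

kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
fold_sizes = []  # illustrative bookkeeping only
for train_index, test_index in kfold.split(X, y):
    clf = SVC(kernel='linear')  # fresh model per fold (an assumption, not in the original)
    clf.fit(X[train_index], y[train_index])
    predictions = clf.predict(X[test_index])
    fold_sizes.append((len(train_index), len(test_index)))

print(fold_sizes)
```

The trade-off is that the column names are lost after to_numpy(), so keep the DataFrame and .iloc if you need the labels later.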
