
KeyError: None of [Int64Index…] dtype='int64'] are in the [columns]

I'm trying to run k-fold cross-validation on a pipeline (StandardScaler, DecisionTreeClassifier).

First, I import the data.

data = pd.read_csv('train_strokes.csv')

Then I preprocess the DataFrame.

# Preprocessing data 
data.drop('id',axis=1,inplace=True)
data['age'] =data['age'].apply(lambda x : x if round(x) else np.nan) 
data['bmi'] = data['bmi'].apply(lambda bmi : bmi if 12< bmi <45 else np.nan)
data['gender'] = data['gender'].apply(lambda gender : gender if gender =='Female' or gender =='Male' else np.nan)
data.sort_values(['gender', 'age','bmi'], inplace=True) 
data['bmi'].ffill(inplace=True)
data.dropna(axis=0,inplace=True)
data.reset_index(drop=True, inplace=True)

# Convert categorical columns to numeric values
enc = LabelEncoder()
data['gender'] = enc.fit_transform(data['gender'])
data['work_type'] = enc.fit_transform(data['work_type'])
data['Residence_type'] = enc.fit_transform(data['Residence_type'])
data['smoking_status'] = enc.fit_transform(data['smoking_status'])
data['ever_married'] = enc.fit_transform(data['ever_married'])

Then I separate the features and the target.

target = data['stroke']
feat = data.drop('stroke',axis=1)

Then I use SMOTE to balance the data and split it into training and test sets.

sm = SMOTE(random_state = 1) 
feat, target = sm.fit_resample(feat, target) 
feat['age'] = feat['age'].apply(lambda x : round(x))
feat['hypertension'] = feat['hypertension'].apply(lambda x : round(x))
feat['heart_disease'] = feat['heart_disease'].apply(lambda x : round(x))
feat['ever_married'] = feat['ever_married'].apply(lambda x : round(x))
#split training and test
X_train, X_test, y_train, y_test = train_test_split(feat, target, test_size=0.3, random_state= 2)

This is the part that causes the problem.

Kfold =KFold(n_splits=10)
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())
n_iter = 0
for train_idx, test_idx in Kfold.split(feat):
    pipeline.fit(X_train[train_idx], y_train[train_idx])
    score = pipeline.score(X_train[test_idx],y_train[test_idx])
    print('Fold #{} accuracy{}'.format(1,score))

Error message

Traceback (most recent call last):
  File "/Users/merb/Documents/Dev/DataScience/TP.py", line 84, in <module>
    pipeline.fit(X_train[train_idx], y_train[train_idx])
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([ 5893,  5894,  5895,  5896,  5897,  5898,  5899,  5900,  5901,\n             5902,\n            ...\n            58912, 58913, 58914, 58915, 58916, 58917, 58918, 58919, 58920,\n            58921],\n           dtype='int64', length=53029)] are in the [columns]"

You should use df.loc[indexes] to select rows by their index labels. If you want to select rows by their integer position, use df.iloc[indexes] instead. Plain df[indexes] with a list or array of integers looks the values up as column labels, which is why the error mentions [columns].
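
To illustrate the difference, here is a minimal sketch on a small made-up DataFrame (the column names, values, and index labels are hypothetical, not taken from the question's data). The index labels are chosen to differ from the integer positions so the two selection styles are easy to tell apart:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; index labels (10..13) deliberately differ from positions (0..3).
df = pd.DataFrame(
    {"age": [30, 45, 60, 75], "bmi": [22.0, 27.5, 31.0, 24.3]},
    index=[10, 11, 12, 13],
)
positions = np.array([0, 2])

# df[positions] looks for COLUMNS named 0 and 2, so it raises KeyError.
try:
    df[positions]
    raised = False
except KeyError:
    raised = True

by_position = df.iloc[positions]  # rows at integer positions 0 and 2
by_label = df.loc[[10, 12]]       # rows whose index labels are 10 and 12
```

Here both selections return the same two rows, because positions 0 and 2 happen to carry the labels 10 and 12.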

In addition, you can read the pandas documentation page on indexing and selecting data.
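
Applied to the loop in the question, one possible fix looks like the sketch below. It runs on synthetic stand-in data (the `feat` and `target` built here are assumptions, not the stroke dataset), and it makes two further changes beyond `.iloc`: it calls `split` on `X_train` rather than `feat`, since positions generated from the full data can fall outside the smaller `X_train`, and it uses `enumerate` so the fold number in the printout actually advances.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's `feat` / `target` (assumption).
rng = np.random.default_rng(0)
feat = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
target = (feat["a"] + feat["b"] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    feat, target, test_size=0.3, random_state=2
)

kfold = KFold(n_splits=10)
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
scores = []

# Split X_train (not feat) so the positions are valid for X_train and y_train.
for fold, (train_idx, test_idx) in enumerate(kfold.split(X_train), start=1):
    # KFold yields integer positions, so rows are selected with .iloc.
    pipeline.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])
    score = pipeline.score(X_train.iloc[test_idx], y_train.iloc[test_idx])
    scores.append(score)
    print('Fold #{} accuracy {:.3f}'.format(fold, score))
```

If you only need the per-fold scores, `sklearn.model_selection.cross_val_score(pipeline, X_train, y_train, cv=kfold)` should give the same kind of result without manual indexing.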

