
Why do I get KeyError: 'Target_Variable' when I perform Mean Squared Error for XGBoost?

I'm running XGBoost on my flight delay dataset. I trained the model, but when I tried to compute the mean squared error on the test set, I got the error above.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_late.drop(['DELAY_YN','ARR_DELAY'],axis=1), 
                                                    df_late['ARR_DELAY'], test_size=0.30, random_state=101)

print('Training...')
xg_reg = xgb.XGBRegressor(n_estimators= 2000, max_depth= 5,learning_rate =0.1)
xg_reg.fit(X_train,y_train)

print('Predicting on test set...')
predictions = xg_reg.predict(X_test)

y_test['predicted']=[np.exp(p) for p in predictions]

from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test['ARR_DELAY'],y_test['predicted']))
print('MSE:', metrics.mean_squared_error(y_test['ARR_DELAY'],y_test['predicted']))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test['ARR_DELAY'],y_test['predicted'])))

I got the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-141-b9b5e43dd55b> in <module>
      2 
      3 from sklearn import metrics
----> 4 print('MAE:', metrics.mean_absolute_error(y_test['ARR_DELAY'],y_test['predicted']))
      5 print('MSE:', metrics.mean_squared_error(y_test['ARR_DELAY'],y_test['predicted']))
      6 print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test['ARR_DELAY'],y_test['predicted'])))

~/anaconda3/envs/myenv/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    866         key = com.apply_if_callable(key, self)
    867         try:
--> 868             result = self.index.get_value(self, key)
    869 
    870             if not is_scalar(result):

~/anaconda3/envs/myenv/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4318         try:
   4319             return self._engine.get_value(s, k,
-> 4320                                           tz=getattr(series.dtype, 'tz', None))
   4321         except KeyError as e1:
   4322             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'ARR_DELAY'

What might be the problem? I'm very new to data science, so any help would be appreciated.

After the train/test split here:

X_train, X_test, y_train, y_test = train_test_split(df_late.drop(['DELAY_YN','ARR_DELAY'],axis=1), 
                                                    df_late['ARR_DELAY'], test_size=0.30, random_state=101)

y_test is in fact not a DataFrame but a Series, i.e. a single column of values. A Series has no columns to select by name, so y_test['ARR_DELAY'] is looked up as an index label, and because no row carries that label, pandas raises KeyError: 'ARR_DELAY'. For the same reason, y_test['predicted']=[np.exp(p) for p in predictions] is not really what you want: it does not add a 'predicted' column. Instead, I would suggest keeping the predictions in a separate array or Series:

predictions = xg_reg.predict(X_test)

predictions = np.exp(predictions)

from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
...
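
To make the Series-versus-DataFrame point concrete, here is a minimal sketch on made-up data. The toy frame `df`, its `DISTANCE` column, and the hard-coded `predictions` values are assumptions for illustration only; they stand in for `df_late` and the output of `xg_reg.predict(X_test)`.

```python
import numpy as np
import pandas as pd
from sklearn import metrics

# Hypothetical toy data standing in for df_late; the values are made up.
df = pd.DataFrame({
    "DISTANCE": [100, 250, 400, 800],
    "ARR_DELAY": [5.0, 12.0, 20.0, 45.0],
})

# Selecting a single column yields a Series. This is the kind of object
# train_test_split returns for y_train / y_test in the question.
y_test = df["ARR_DELAY"]
print(type(y_test).__name__)

# A string key on a Series is matched against the index labels (0..3 here),
# not against a column name, so the lookup fails with KeyError.
try:
    y_test["ARR_DELAY"]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Made-up predictions, kept in their own array instead of inside y_test.
predictions = np.array([6.0, 10.0, 23.0, 40.0])

mae = metrics.mean_absolute_error(y_test, predictions)
mse = metrics.mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)

# If a side-by-side table is wanted, build a DataFrame explicitly.
# The array is assigned positionally, row by row.
results = y_test.to_frame().assign(predicted=predictions)
print(results)
```

The sketch leaves out `np.exp`: exponentiating the predictions only makes sense if the model was trained on the logarithm of the target, and the code shown in the question fits on `ARR_DELAY` directly.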
