Why does my XGBoost model have good accuracy on the training and testing datasets, but poor accuracy on a held-out dataset?
I'm currently working on an XGBoost regression model to predict ticket bookings. My issue is that the model has good accuracy on the training set (around 96%) and on the testing set (around 94%), but when I use it to predict bookings on a separate held-out dataset, the accuracy drops to 82%.
I tried moving some data from my testing set into this held-out set, and the accuracy is still pretty bad, even though the model predicts this same data well when it is part of my testing set.
I assume I'm doing something wrong, but I can't figure out what. Any help would be appreciated, thanks.
Here's the XGBoost model part of my code:
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Features are every column except the last; the target is the last column.
X_conso, y_conso = data_conso2.iloc[:, :-1], data_conso2.iloc[:, -1]

# 70/30 train/test split with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(X_conso, y_conso, test_size=0.3, random_state=20)

d_train = xgb.DMatrix(X_train, label=y_train)
d_test = xgb.DMatrix(X_test, label=y_test)
# Held-out features (X_fcst_held_out is defined outside this snippet; no label is passed).
d_fcst_held_out = xgb.DMatrix(X_fcst_held_out)

# Hyperparameters passed to xgb.train.
params = {'p_colsample_bytree_conso': 0.9,
          'p_colsample_bylevel_conso': 0.9,
          'p_colsample_bynode_conso': 0.9,
          'p_learning_rate_conso': 0.3,
          'p_max_depth_conso': 10,
          'p_alpha_conso': 3,
          'p_n_estimators_conso': 10,
          'p_gamma_conso': 0.8}
steps = 100
watchlist = [(d_train, 'train'), (d_test, 'test')]

# Train for up to `steps` rounds; early stopping monitors the last watchlist entry ('test').
model = xgb.train(params, d_train, num_boost_round=steps, evals=watchlist, early_stopping_rounds=50)

preds_train = model.predict(d_train)
preds_test = model.predict(d_test)
preds_fcst = model.predict(d_fcst_held_out)
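One thing worth checking in the snippet above: `xgb.train` only recognises the library's own parameter names (`colsample_bytree`, `learning_rate`, `max_depth`, `alpha`, `gamma`, and so on), so keys carrying a `p_` prefix and `_conso` suffix are most likely ignored and the model trains with default settings. The sketch below strips that prefix and suffix. The helper `to_native_params` is hypothetical, and it assumes the keys were meant to map one-to-one onto XGBoost's names.

```python
# Hypothetical helper, not part of XGBoost: converts the prefixed keys used
# in the post into the parameter names that xgb.train documents.
PREFIX, SUFFIX = "p_", "_conso"  # naming convention assumed from the post


def to_native_params(prefixed):
    """Return (booster_params, num_boost_round) from prefixed settings."""
    native = {}
    num_boost_round = None
    for key, value in prefixed.items():
        name = key
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        if name.endswith(SUFFIX):
            name = name[:-len(SUFFIX)]
        if name == "n_estimators":
            # In the native API the number of trees is the num_boost_round
            # argument of xgb.train, not an entry in the params dict.
            num_boost_round = value
        else:
            native[name] = value
    return native, num_boost_round


params = {'p_colsample_bytree_conso': 0.9,
          'p_colsample_bylevel_conso': 0.9,
          'p_colsample_bynode_conso': 0.9,
          'p_learning_rate_conso': 0.3,
          'p_max_depth_conso': 10,
          'p_alpha_conso': 3,
          'p_n_estimators_conso': 10,
          'p_gamma_conso': 0.8}

native_params, n_rounds = to_native_params(params)
print(native_params)
print(n_rounds)
```

The training call would then take `native_params` in place of `params`, with the number of rounds passed as `num_boost_round`. The post sets both `n_estimators` to 10 and `steps` to 100, so which of the two was intended is not clear from the code.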
And my error levels:
Error train: 4.524787%
Error test: 5.978759%
Error fcst: 18.008451%
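The code that produces these percentages is not shown. A minimal sketch of one way to compute them, assuming the metric is a mean absolute percentage error (MAPE) and that the true held-out bookings are available (called `y_fcst_held_out` below, a hypothetical name that does not appear in the code above). Running the same function on all three sets rules out a difference in the metric itself as the cause of the gap.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent; rows whose true value is 0 are skipped (an assumption)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("no non-zero targets to evaluate")
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


# Toy numbers for illustration only, not the data from the post.
toy_true = [100, 200, 400]
toy_pred = [110, 180, 400]
toy_error = mean_absolute_percentage_error(toy_true, toy_pred)
print(f"Error toy: {toy_error:.6f}%")

# With the variables from the post this would be, for example:
#   mean_absolute_percentage_error(y_train, preds_train)
#   mean_absolute_percentage_error(y_test, preds_test)
#   mean_absolute_percentage_error(y_fcst_held_out, preds_fcst)  # hypothetical name
```

Skipping zero targets is a design choice: booking counts can be zero, and a plain MAPE is undefined there. If the held-out set has many more small or zero values than the training data, a percentage-based error can grow even when the absolute errors are similar.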