
How can I modify the grid parameters of a trained ML model?

I have trained an xgboost.XGBClassifier model with GridSearchCV. When I call grid_search_xgb.best_estimator_.get_params() to obtain the best parameters of that model, I get this:

{'objective': 'binary:logistic',
 ...
 'missing': nan,
 'monotone_constraints': '()',
 'n_estimators': 1000,
 ...
 }

From a plot I made, I know that this model is overfitted. However, if n_estimators = 123, the training and test evaluation metrics are very similar (minimal overfitting). Hence, I want to train the model again, replacing n_estimators = 1000 with 123, using this piece of code:

optimal_params_grid = grid_search_xgb.best_estimator_.get_params()
optimal_params_grid['n_estimators'] = 123

This works perfectly. However, when I train the model again:

model_xgb = XGBClassifier()
grid_search_xgb = GridSearchCV(model_xgb, optimal_params_grid, cv=5, verbose=1, n_jobs=-1)
grid_search_xgb.fit(X_train, y_train, eval_set = [(X_train,y_train),(X_test,y_test)])

It raises this error:

TypeError: Parameter grid for parameter 'objective' needs to be a list or a numpy array, but got 'binary:logistic' (of type str) instead. Single values need to be wrapped in a list with one element.

This is because the dictionary is not in the format GridSearchCV expects: each value should be wrapped in a list, like:

{'objective': ['binary:logistic'],
...
}

However, I can't find a way to wrap every value in a list while being 100% sure it was done correctly. I read somewhere that when iterating over a dictionary (or over something in a dictionary), the order is not always the same, so I'm afraid of assigning the wrong value to the wrong key.
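For what it's worth, in Python 3.7+ plain dicts are guaranteed to preserve insertion order, and each key/value pair is yielded together during iteration, so values cannot get mismatched with keys. A minimal sketch (with a hypothetical two-key dict standing in for the full parameter set):

```python
# Sketch: wrap every value of a parameter dict in a one-element list,
# which is the format GridSearchCV expects for a parameter grid.
# Iterating with .items() yields each (key, value) pair together,
# so there is no risk of pairing a value with the wrong key.
params = {'objective': 'binary:logistic', 'n_estimators': 123}
param_grid = {key: [value] for key, value in params.items()}
print(param_grid)  # {'objective': ['binary:logistic'], 'n_estimators': [123]}
```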

Problems/Questions

  1. Is there any way this can be done 100% correctly?
  2. As a second question, I'm wondering whether there is a more straightforward way to take the trained model and change its number of estimators. For example, it would be nice if I could just pick the model as it was when estimator number 123 was trained. Is that possible, or is the only alternative to train it again with n_estimators=123?

For question 1, this seems to work:

optimal_params_grid = grid_search_xgb.best_estimator_.get_params()
optimal_params_grid['n_estimators'] = 123

# Wrap every value in a one-element list, as GridSearchCV expects
optimal_params_grid = {key: [value] for key, value in optimal_params_grid.items()}
