
Optuna for Catboost outputs "trials" in random order?

I'm working on hyperparameter tuning using Optuna for CatBoostRegressor, but I realised that the trials I'm getting are in random order (mine started with Trial 7, then Trial 5, then Trial 8). All of the examples I see online are in order, for example Trial 0 finished with value: xxxxx, then Trial 1, Trial 2... (Example: https://www.kaggle.com/saurabhshahane/catboost-hyperparameter-tuning-with-optuna )

Is this an issue, or is it not something to worry about? I'm not sure why mine is in a random order, though.

I'm also wondering whether I should be using cb.cv (CatBoost's cross-validation) instead of cb.CatBoostRegressor followed by .fit and .predict for hyperparameter tuning? Or does it not matter which way I use to get the best hyperparameters?


This is my code:

import catboost as cb
import optuna
from optuna.samplers import TPESampler
from sklearn.metrics import mean_absolute_error


def objective(trial):

    # Search space for the CatBoost hyperparameters
    optuna_params = {"subsample": trial.suggest_float("subsample", 0.5, 0.99),
                     'od_wait': trial.suggest_int('od_wait', 10, 50, step=1),
                     "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.5, 0.99),
                     "random_strength": trial.suggest_int("random_strength", 1, 10, step=1),
                     "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 50.0),
                     "max_depth": trial.suggest_int("max_depth", 4, 10, step=1),
                     "n_estimators": trial.suggest_int("n_estimators", 100, 2500, step=1),
                     'learning_rate': trial.suggest_loguniform("learning_rate", 0.005, 0.1)}

    cbregressor = cb.CatBoostRegressor(**optuna_params,
                                       random_state=0,
                                       loss_function='MAE',
                                       eval_metric='MAE',
                                       one_hot_max_size=0,
                                       boost_from_average=True)

    # cat_train_pool2, cat_val_pool2, X_validation2 and y_validation are defined elsewhere
    cat_optuna = cbregressor.fit(cat_train_pool2, eval_set=cat_val_pool2, verbose=False, early_stopping_rounds=10)

    y_valid_pred_cat3 = cat_optuna.predict(X_validation2)

    MAE = mean_absolute_error(y_validation, y_valid_pred_cat3)
    print('MAE score of CatBoost =', MAE)
    return MAE


study = optuna.create_study(direction="minimize", sampler=TPESampler(seed=0), study_name="Catboost Optuna")
study.optimize(objective, n_trials=100, n_jobs=-1)

Is this an issue, or is it not something to worry about? Not sure why mine is in a random order, though.

No, it's nothing to worry about. When you set n_jobs=-1 in the study.optimize method, the optimisation is run in parallel using threads, so trials can start and finish out of order; the trial numbers you see in the log are just the order in which trials complete.
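
For what it's worth, here is a minimal sketch (assuming the study object from the code above) of two ways to see the trials in order: either run single-threaded with n_jobs=1, or keep the parallel run and inspect the finished trials afterwards, since study.trials is stored by trial number regardless of completion order.

    # Option 1: run sequentially so the log prints Trial 0, 1, 2, ... in order
    # (slower, but the output matches the tutorials)
    study.optimize(objective, n_trials=100, n_jobs=1)

    # Option 2: keep n_jobs=-1 and inspect the results afterwards;
    # study.trials is ordered by trial number, not by completion time
    for t in study.trials:
        print(t.number, t.value, t.params)

    # Or as a DataFrame sorted by trial number
    df = study.trials_dataframe().sort_values("number")
    print(study.best_trial.number, study.best_value)

Either way, study.best_params and study.best_value are unaffected by the order in which trials are logged.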

Regressor vs CV

I suppose either is fine. In general, when we use CV, overfitting to the validation data is less likely than with a single train/val split (i.e. the regressor approach in this setting). However, CV is computationally much more expensive.
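
If you do want to try the CV route, here is a rough sketch of what the objective could look like with cb.cv. It assumes cat_train_pool2 is a catboost.Pool holding all of the training data, uses a trimmed-down search space for brevity, and relies on CatBoost's usual cv output column naming ('test-MAE-mean'):

    def objective_cv(trial):
        params = {"subsample": trial.suggest_float("subsample", 0.5, 0.99),
                  "learning_rate": trial.suggest_loguniform("learning_rate", 0.005, 0.1),
                  "max_depth": trial.suggest_int("max_depth", 4, 10),
                  "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 50.0),
                  "loss_function": "MAE",
                  "eval_metric": "MAE",
                  "iterations": 1000,
                  "random_seed": 0}

        # 5-fold cross-validation on the full training pool;
        # cb.cv returns a DataFrame of per-iteration train/test metrics
        cv_results = cb.cv(cat_train_pool2, params, fold_count=5,
                           early_stopping_rounds=10, verbose=False)

        # Use the best mean validation MAE across iterations as the objective value
        return cv_results["test-MAE-mean"].min()

Calling study.optimize(objective_cv, n_trials=100) would then tune against the cross-validated MAE instead of a single validation split.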
