
Optuna for Catboost outputs "trials" in random order?

I'm working on hyperparameter tuning using Optuna for CatBoostRegressor, but I realised that the trials I'm getting are in random order (mine started with Trial 7, then Trial 5, then Trial 8). All of the examples I see online are in order, for example Trial 0 finished with value: xxxxx, then Trial 1, Trial 2... (Example: https://www.kaggle.com/saurabhshahane/catboost-hyperparameter-tuning-with-optuna )

Is this an issue, or is it not something to worry about? I'm not sure why mine is in a random order, though.

I'm also wondering whether I should be using cb.cv (CatBoost's cross-validation) instead of cb.CatBoostRegressor followed by .fit and .predict for hyperparameter tuning? Or does it not matter which way I use to get the best hyperparameters?


This is my code:

import catboost as cb
import optuna
from optuna.samplers import TPESampler
from sklearn.metrics import mean_absolute_error


def objective(trial):

    # Search space for the CatBoost hyperparameters
    optuna_params = {"subsample": trial.suggest_float("subsample", 0.5, 0.99),
                     'od_wait': trial.suggest_int('od_wait', 10, 50, step=1),
                     "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.5, 0.99),
                     "random_strength": trial.suggest_int("random_strength", 1, 10, step=1),
                     "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 50.0),
                     "max_depth": trial.suggest_int("max_depth", 4, 10, step=1),
                     "n_estimators": trial.suggest_int("n_estimators", 100, 2500, step=1),
                     'learning_rate': trial.suggest_loguniform("learning_rate", 0.005, 0.1)}

    cbregressor = cb.CatBoostRegressor(**optuna_params,
                                       random_state=0,
                                       loss_function='MAE',
                                       eval_metric='MAE',
                                       one_hot_max_size=0,
                                       boost_from_average=True)

    # cat_train_pool2, cat_val_pool2, X_validation2 and y_validation are defined elsewhere
    cat_optuna = cbregressor.fit(cat_train_pool2, eval_set=cat_val_pool2, verbose=False, early_stopping_rounds=10)

    y_valid_pred_cat3 = cat_optuna.predict(X_validation2)

    MAE = mean_absolute_error(y_validation, y_valid_pred_cat3)
    print('MAE score of CatBoost =', MAE)
    return MAE


study = optuna.create_study(direction="minimize", sampler=TPESampler(seed=0), study_name="Catboost Optuna")
study.optimize(objective, n_trials=100, n_jobs=-1)

Is this an issue, or is it not something to worry about? Not sure why mine is in a random order, though.

No, it's nothing to worry about. When you set n_jobs=-1 in the study.optimize method, the optimisation is run in parallel using threads, so trials can start and finish out of order; the trial numbers you see in the log are just the order in which trials complete.
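
For what it's worth, here is a minimal sketch (assuming the study object from the code above) of two ways to see the trials in order: either run single-threaded with n_jobs=1, or keep the parallel run and inspect the finished trials afterwards, since study.trials is stored by trial number regardless of completion order.

    # Option 1: run sequentially so the log prints Trial 0, 1, 2, ... in order
    # (slower, but the output matches the tutorials)
    study.optimize(objective, n_trials=100, n_jobs=1)

    # Option 2: keep n_jobs=-1 and inspect the results afterwards;
    # study.trials is ordered by trial number, not by completion time
    for t in study.trials:
        print(t.number, t.value, t.params)

    # Or as a DataFrame sorted by trial number
    df = study.trials_dataframe().sort_values("number")
    print(study.best_trial.number, study.best_value)

Either way, study.best_params and study.best_value are unaffected by the order in which trials are logged.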

Regressor vs CV

I suppose either is fine. In general, when we use CV, overfitting to the validation data is less likely than with a single train/val split (i.e. the regressor approach in this setting). However, CV is computationally much more expensive.
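
If you do want to try the CV route, here is a rough sketch of what the objective could look like with cb.cv. It assumes cat_train_pool2 is a catboost.Pool holding all of the training data, uses a trimmed-down search space for brevity, and relies on CatBoost's usual cv output column naming ('test-MAE-mean'):

    def objective_cv(trial):
        params = {"subsample": trial.suggest_float("subsample", 0.5, 0.99),
                  "learning_rate": trial.suggest_loguniform("learning_rate", 0.005, 0.1),
                  "max_depth": trial.suggest_int("max_depth", 4, 10),
                  "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 50.0),
                  "loss_function": "MAE",
                  "eval_metric": "MAE",
                  "iterations": 1000,
                  "random_seed": 0}

        # 5-fold cross-validation on the full training pool;
        # cb.cv returns a DataFrame of per-iteration train/test metrics
        cv_results = cb.cv(cat_train_pool2, params, fold_count=5,
                           early_stopping_rounds=10, verbose=False)

        # Use the best mean validation MAE across iterations as the objective value
        return cv_results["test-MAE-mean"].min()

Calling study.optimize(objective_cv, n_trials=100) would then tune against the cross-validated MAE instead of a single validation split.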
