
Understanding Intermediate Values and Pruning in Optuna

I am just curious for more information on what an intermediate step actually is, and how to use pruning with a different ML library that isn't covered in the tutorial section, e.g. XGBoost, PyTorch, etc.

For example:

import numpy as np
import optuna
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y)
classes = np.unique(y)
n_train_iter = 100


def objective(trial):
    alpha = trial.suggest_float("alpha", 0.0, 1.0)
    clf = SGDClassifier(alpha=alpha)

    for step in range(n_train_iter):
        clf.partial_fit(X_train, y_train, classes=classes)

        # Report the validation accuracy after each incremental fit so
        # the pruner can decide whether the trial is worth continuing.
        intermediate_value = clf.score(X_valid, y_valid)
        trial.report(intermediate_value, step)

        if trial.should_prune():
            raise optuna.TrialPruned()

    return clf.score(X_valid, y_valid)


study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.HyperbandPruner(
        min_resource=1, max_resource=n_train_iter, reduction_factor=3
    ),
)
study.optimize(objective, n_trials=30)

What is the point of the for step in range() section? Doesn't doing this just make the optimisation take more time, and won't you get the same result for every step in the loop?

I'm really trying to figure out the need for for step in range(), and whether it is required every time you wish to use pruning.

The basic model creation can be done by passing the complete training dataset once. But there are models that can still be improved (an increase in accuracy) by re-training on the same training dataset.

To make sure we are not wasting resources here, we check the accuracy on the validation dataset after every step via the reported intermediate value: if accuracy keeps improving we continue, and if not we prune the whole trial, skipping the remaining steps. Then we go on to the next trial, asking for another value of alpha, the hyperparameter we are trying to tune for the greatest accuracy on the validation dataset.

For other libraries, it is just a matter of asking ourselves what we want from our model. Accuracy is certainly a good criterion to measure the model's competency, but there can be others.
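For XGBoost specifically, Optuna ships an integration callback that does the report-and-prune step for you after every boosting round, so no manual for step in range() loop is needed. Here is a minimal sketch; the parameter ranges and the "valid-merror" observation key are illustrative assumptions (the key follows XGBoost's "<eval_name>-<metric>" naming), and depending on your Optuna version the callback may live in the separate optuna-integration package:

import numpy as np
import optuna
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y)


def objective(trial):
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dvalid = xgb.DMatrix(X_valid, label=y_valid)

    params = {
        "objective": "multi:softmax",
        "num_class": 3,
        "max_depth": trial.suggest_int("max_depth", 2, 10),
        "eta": trial.suggest_float("eta", 1e-3, 0.3, log=True),
    }

    # The callback reports "valid-merror" to Optuna after every boosting
    # round and raises TrialPruned when the pruner gives up on the trial.
    pruning_callback = optuna.integration.XGBoostPruningCallback(trial, "valid-merror")

    bst = xgb.train(
        params,
        dtrain,
        num_boost_round=100,
        evals=[(dvalid, "valid")],
        callbacks=[pruning_callback],
    )

    preds = bst.predict(dvalid)
    return float(np.mean(preds != y_valid))  # validation error rate


# merror is an error rate, so the study minimizes here.
study = optuna.create_study(direction="minimize", pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=30)

Plain PyTorch works the same way as the SGDClassifier example above: report a validation metric once per epoch with trial.report and check trial.should_prune inside the training loop.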

Example of Optuna pruning: I want the model to continue re-training, but only under my specific conditions. If the intermediate value cannot beat my best_accuracy, and the steps are already more than half of my max iterations, then prune this trial.

best_accuracy = 0.0


def objective(trial):
    global best_accuracy

    alpha = trial.suggest_float("alpha", 0.0, 1.0)
    clf = SGDClassifier(alpha=alpha)

    for step in range(n_train_iter):
        clf.partial_fit(X_train, y_train, classes=classes)

        # Only start checking once more than half of the iterations are done.
        if step > n_train_iter // 2:
            intermediate_value = clf.score(X_valid, y_valid)

            # Prune the trial if it cannot beat the best accuracy seen so far.
            if intermediate_value < best_accuracy:
                raise optuna.TrialPruned()

    accuracy = clf.score(X_valid, y_valid)
    best_accuracy = max(best_accuracy, accuracy)  # keep the running best

    return accuracy

Optuna has specialized pruners at https://optuna.readthedocs.io/en/stable/reference/pruners.html
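Since the pruning policy lives in the study rather than in the objective, you can swap pruners without touching the training loop. A minimal sketch, reusing the objective from the first snippet; n_startup_trials and n_warmup_steps are real MedianPruner parameters, the values here are just illustrative:

# Prune a trial when its intermediate value falls below the median of
# previous trials at the same step; the first 5 trials and the first 10
# reported steps of each trial are never pruned.
study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=10),
)
study.optimize(objective, n_trials=30)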
