
Hyperparameter tuning with Ray doesn't stop

I recently tried to use the HyperOpt search algorithm to find the best hyperparameter configuration for a PPO algorithm.

The model is trained in a Gym environment (LunarLander-v2).

When I run my program, it never stops; I could not find which parameter to add to the config to limit the number of training episodes.

Here is the config I used:

    # Ray 2.x imports (the code below uses the tune.Tuner API)
    from ray import air, tune
    from ray.tune.search.hyperopt import HyperOptSearch

    def explore(config):
        # ensure we collect enough timesteps to do sgd
        if config["train_batch_size"] < config["sgd_minibatch_size"] * 2:
            config["train_batch_size"] = config["sgd_minibatch_size"] * 2
        # ensure we run at least one sgd iter
        if config["num_sgd_iter"] < 1:
            config["num_sgd_iter"] = 1
        return config

    config = {
                 "env": "LunarLander-v2",
                 "sgd_minibatch_size": 5000,
                 "num_sgd_iter": 2,
                 "lr": tune.uniform(5e-6, 5e-2),
                 "lambda": tune.uniform(0.6, 0.9),
                 "vf_loss_coeff": 0.7,
                 "kl_target": 0.01,
                 "kl_coeff": tune.uniform(0.5, 0.9),
                 "entropy_coeff": 0.001,
                 "clip_param": tune.uniform(0.4, 0.8),
                 "train_batch_size": 25000, # taille de l'épisode
                 # "monitor": True,
                 # "model": {"free_log_std": True},
                 "num_workers": 4,
                 "num_gpus": 0,
                 # "rollout_fragment_length":3
                 # "batch_mode": "complete_episodes"
             }

    optimizer = HyperOptSearch(metric="episode_reward_mean", mode="max", n_initial_points=1, random_state_seed=7, space=explore(config))

    # optimizer = ConcurrencyLimiter(optimizer, max_concurrent=4)

    analysis = tune.Tuner(
        "PPO",  # the trainable: RLlib's PPO algorithm
        tune_config=tune.TuneConfig(
            metric="episode_reward_mean",  # the metric we want to optimize
            mode="max",  # maximize the metric
            search_alg=optimizer,
            # num_samples repeats the entire config 'num_samples' times,
            # i.e. the number of trials shown in the 'Status' output
            num_samples=2,
        ),
    )
    results = analysis.fit()

I thought the hyperparameter controlling the number of episodes was num_sgd_iter, but apparently it is not.

tune.Tuner has a run_config argument, which you can use as follows in your case:

 analysis = tune.Tuner(
        "PPO",  # Objective function
        tune_config=tune.TuneConfig(
            search_alg=optimizer,
            num_samples=2,
        ),
        run_config=air.RunConfig(
            stop={"num_env_steps_trained": 10000}
        ),
    )

You can generally stop on anything that Tune reports back (anything you see it report in your CLI). Be aware that some of these metrics are nested, so you may have to specify them according to their nesting. Also, metric="episode_reward_mean" and mode="max" are the defaults, so you don't have to specify them.
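For example, here is a minimal sketch of a stop dict with several criteria (exact metric names depend on your Ray/RLlib version, so verify them against your own CLI output); nested result entries can be addressed with "/"-separated keys:

    run_config=air.RunConfig(
        stop={
            # stop a trial once its mean episode reward reaches 200...
            "episode_reward_mean": 200,
            # ...or after 100 training iterations, whichever comes first
            "training_iteration": 100,
            # nested entries use "/"-flattened keys, e.g. a learner stat
            # (hypothetical key; check what your run actually reports):
            # "info/learner/default_policy/learner_stats/entropy": 0.01,
        }
    )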
