How do you use OpenAI Gym 'wrappers' with a custom Gym environment in Ray Tune?

Let's say I built a Python class called CustomEnv (similar to the 'CartPoleEnv' class used to create the OpenAI Gym "CartPole-v1" environment) to create my own (custom) reinforcement learning environment, and I am using tune.run() from Ray Tune (in Ray 2.1.0 with Python 3.9.15) to train an agent in my environment using the 'PPO' algorithm:

import ray
from ray import tune
tune.run(
        "PPO",                         # 'PPO' algorithm
        config={"env": CustomEnv,      # custom class used to create an environment
            "framework": "tf2",
            "evaluation_interval": 100, 
            "evaluation_duration": 100,
            },
        checkpoint_freq=100,               # Save a checkpoint at every evaluation
        local_dir=checkpoint_dir,          # Save results to a local directory (defined elsewhere)
        stop={"episode_reward_mean": 250}, # Stopping criterion
        )

This works fine, and I can use TensorBoard to monitor training progress, etc., but as it turns out, learning is slow, so I want to try using 'wrappers' from Gym to scale observations, rewards, and/or actions, limit variance, and speed up learning. So I've got an ObservationWrapper, a RewardWrapper, and an ActionWrapper to do that--for example, something like this (the exact nature of the scaling is not central to my question):

import gym

class ObservationWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.o_min = 0.
        self.o_max = 5000.

    def observation(self, ob):
        # Normalize observations
        ob = (ob - self.o_min)/(self.o_max - self.o_min)
        return ob

class RewardWrapper(gym.RewardWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.r_min = -500
        self.r_max = 100

    def reward(self, reward):
        # Scale rewards:
        reward = reward/(self.r_max - self.r_min)
        return reward

class ActionWrapper(gym.ActionWrapper):
    def __init__(self, env):
        super().__init__(env)

    def action(self, action):
        # Scale actions
        action = action/10
        return action

Wrappers like these work fine with my custom class when I create an instance of the class on my local machine and use it in traditional training loops, like this:

from my_file import CustomEnv

env = CustomEnv()
wrapped_env = ObservationWrapper(RewardWrapper(ActionWrapper(env)))
episodes = 10

for episode in range(1,episodes+1):
    obs = wrapped_env.reset()
    done = False
    score = 0
    
    while not done:
        action = wrapped_env.action_space.sample()
        obs, reward, done, info = wrapped_env.step(action)
        score += reward

    print(f'Episode: {episode},  Score: {score:.3f}')

My question is: How can I use wrappers like these with my custom class (CustomEnv) and tune.run()? This particular method expects the value for "env" to be passed either (1) as a class (such as CustomEnv) or (2) as a string associated with a registered Gym environment (such as "CartPole-v1"), as I found out while trying various incorrect ways to pass a wrapped version of my custom class:

ValueError: >>> is an invalid env specifier. You can specify a custom env as either a class (e.g., YourEnvCls) or a registered env id (e.g., "your_env").

So I am not sure how to do it (assuming it is possible). I would prefer to solve this problem without having to register my custom Gym environment, but I am open to any solution.
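An aside on option (1): because a class is itself a valid "env" specifier, a thin class that builds the wrapped environment in its constructor might also work without registration. A minimal sketch, assuming RLlib instantiates an env class with a single env_config argument; WrappedCustomEnv is a hypothetical name, not part of Ray's API:

import gym
from my_file import CustomEnv   # same import as in the training loop above

class WrappedCustomEnv(gym.Wrapper):
    # Hypothetical helper (not part of Ray's API): builds CustomEnv with
    # all three wrappers applied. RLlib passes a single env_config argument
    # when it instantiates an env class, so accept (and ignore) it here.
    def __init__(self, env_config=None):
        env = ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv())))
        super().__init__(env)

# If this works as assumed, the class could then be passed directly:
# tune.run("PPO", config={"env": WrappedCustomEnv, ...})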

In learning about wrappers, I leveraged mostly 'Getting Started With OpenAI Gym: The Basic Building Blocks' by Ayoosh Kathuria, and 'TF 2.0 for Reinforcement Learning: Gym Wrappers'.

I was able to answer my own question about how to get Ray's tune.run() to work with a wrapped custom class for a Gym environment. The documentation for Ray Environments was helpful.

The solution was to register the custom class through Ray. Assuming you have defined your Gym wrappers (classes) as discussed above, it works like this:

from ray.tune.registry import register_env
from your_file import CustomEnv             # import your custom class

def env_creator(env_config):
    # wrap and return an instance of your custom class
    return ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv())))

# Choose a name and register your custom environment
register_env('WrappedCustomEnv-v0', env_creator)
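As an optional sanity check before training, the creator can be called directly (an empty dict standing in for env_config) and stepped once to confirm the wrappers are applied; this uses the same 4-tuple step API as the training loop above:

env = env_creator({})    # an empty env_config dict is enough for this check
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(obs, reward, done) # observation and reward should come out scaled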

Now, in tune.run(), you can pass the registered name as you would any other registered Gym environment:

import ray
from ray import tune

tune.run(
        "PPO",                          # 'PPO' algorithm (for example)
        config={"env": "WrappedCustomEnv-v0", # the registered instance
            #other options here as desired
            },
        # other options here as desired
        )

tune.run() will work with no errors--problem solved!
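As a closing aside: if CustomEnv ever needs constructor arguments, they can be supplied through the "env_config" key in config, since RLlib passes that dict to the registered env_creator. A sketch, where 'scale' is a hypothetical parameter that the CustomEnv above does not actually take:

def env_creator(env_config):
    # env_config is the dict supplied as config={"env_config": {...}} in tune.run()
    scale = env_config.get("scale", 1.0)   # 'scale' is a hypothetical parameter
    return ObservationWrapper(RewardWrapper(ActionWrapper(CustomEnv(scale=scale))))

# then in tune.run():
#     config={"env": "WrappedCustomEnv-v0", "env_config": {"scale": 2.0}, ...}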
