
OpenAI Gym: How to pass multiple actions within each step to our custom gym environment?

I am trying to develop a custom gym environment for a reinforcement learning use case. In this environment my main aim is to predict the state based on several actions that are taken in each step, i.e. my observation_space depends on multiple actions in the action_space. I tried providing several actions to the environment as a Tuple of different Box spaces, as shown below:

self.action_space = Tuple([Box(low=np.array([22]),high=np.array([25])),
                           Box(low=np.array([0]), high=np.array([230])),
                           Box(low=np.array([0]), high=np.array([33])),
                           Box(low=np.array([0]), high=np.array([3.5]))])


The environment was built successfully. However, when I tried training a PPO model on it, I got the following error:

Error: AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(22.0, 25.0, (1,), float32), Box(0.0, 230.0, (1,), float32), Box(0.0, 33.0, (1,), float32), Box(0.0, 3.5, (1,), float32)) was provided


Can anyone suggest how to deal with this issue when working with multiple actions within a single action space, and explain what exactly the error means? My understanding was that we need to pass gym spaces within a Tuple, and I did pass Box spaces, yet the error is still thrown.

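If you are using stable_baselines3, the problem probably comes from the fact that it does not support Tuple action spaces. Consider using Dict instead, though note that the error message lists only Box, Discrete, MultiDiscrete and MultiBinary as supported action spaces, so a Dict action space may be rejected as well.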
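The assertion message in the question lists Box among the supported action spaces, so one possible workaround is to merge the four one-element Box spaces into a single Box of shape (4,) and split the vector again inside step(). The sketch below uses only NumPy so it runs without gym installed; the names `ACTION_LOW`, `ACTION_HIGH` and `unpack_action` are hypothetical, and the Box construction is shown as a comment because it belongs in the environment's `__init__`.

```python
import numpy as np

# Bounds copied from the four Box spaces in the question, one entry per action.
ACTION_LOW = np.array([22.0, 0.0, 0.0, 0.0], dtype=np.float32)
ACTION_HIGH = np.array([25.0, 230.0, 33.0, 3.5], dtype=np.float32)

# In the environment's __init__ this would replace the Tuple, for example:
#     self.action_space = Box(low=ACTION_LOW, high=ACTION_HIGH, dtype=np.float32)


def unpack_action(action):
    """Clip a flat action vector to the bounds and split it into four scalars.

    Hypothetical helper, meant to be called at the top of step().
    """
    action = np.asarray(action, dtype=np.float32).reshape(-1)
    if action.shape != ACTION_LOW.shape:
        raise ValueError(
            f"expected {ACTION_LOW.shape[0]} action values, got {action.shape[0]}"
        )
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    return tuple(float(v) for v in clipped)


# 300.0 is clipped to 230.0 and -1.0 to 0.0; the other two values are in range.
a0, a1, a2, a3 = unpack_action([23.5, 300.0, -1.0, 2.0])
print(a0, a1, a2, a3)
```

Clipping inside the helper is a design choice for this sketch: it keeps out-of-range values from reaching the environment's dynamics regardless of how the training library handles the bounds.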

Maybe this can help you:

        self.action_space = spaces.Box(low=np.float32(np.tile([0, 0, 0, 0, 0, 0], (25, 1))),
                                        high=np.float32(np.tile([1, 1, 30, 3, 350, 8], (25, 1))),
                                        dtype=np.float32)

When you sample this action space, you get a NumPy array of shape (25, 6), something like this (rounded to one decimal place):

 [[7.000e-01 0.000e+00 2.700e+00 2.500e+00 1.865e+02 2.700e+00]
 [7.000e-01 4.000e-01 6.200e+00 2.600e+00 2.424e+02 5.100e+00]
 [5.000e-01 6.000e-01 2.680e+01 1.400e+00 5.270e+01 5.800e+00]
 [9.000e-01 6.000e-01 1.320e+01 1.900e+00 3.254e+02 7.400e+00]
 [1.000e-01 3.000e-01 2.800e+01 2.800e+00 1.197e+02 2.600e+00]
 [7.000e-01 2.000e-01 5.500e+00 1.500e+00 3.046e+02 6.000e+00]
 [6.000e-01 8.000e-01 5.000e-01 2.000e+00 2.810e+02 2.900e+00]
 [9.000e-01 9.000e-01 2.750e+01 2.200e+00 7.150e+01 1.300e+00]
 [2.000e-01 8.000e-01 2.410e+01 1.300e+00 1.843e+02 4.000e+00]
 [8.000e-01 6.000e-01 2.900e+01 2.000e+00 1.266e+02 7.100e+00]
 [8.000e-01 7.000e-01 3.900e+00 8.000e-01 3.105e+02 1.200e+00]
 [5.000e-01 1.000e+00 1.910e+01 2.300e+00 1.404e+02 2.700e+00]
 [3.000e-01 4.000e-01 7.100e+00 1.700e+00 2.591e+02 2.300e+00]
 [8.000e-01 9.000e-01 1.200e+01 2.600e+00 1.713e+02 7.000e+00]
 [0.000e+00 1.000e+00 1.660e+01 0.000e+00 1.912e+02 4.000e+00]
 [4.000e-01 0.000e+00 1.360e+01 2.600e+00 8.790e+01 2.000e-01]
 [6.000e-01 0.000e+00 2.750e+01 2.500e+00 2.577e+02 5.700e+00]
 [1.000e-01 9.000e-01 9.800e+00 1.000e+00 2.493e+02 1.100e+00]
 [8.000e-01 1.000e-01 1.000e+01 1.500e+00 2.122e+02 5.400e+00]
 [5.000e-01 3.000e-01 2.700e+01 6.000e-01 2.810e+01 9.000e-01]
 [4.000e-01 1.000e-01 2.330e+01 1.500e+00 1.339e+02 1.400e+00]
 [6.000e-01 9.000e-01 1.900e+00 2.400e+00 7.430e+01 6.900e+00]
 [9.000e-01 8.000e-01 2.910e+01 4.000e-01 1.926e+02 3.200e+00]
 [3.000e-01 1.000e-01 1.110e+01 1.400e+00 3.198e+02 1.600e+00]
 [6.000e-01 1.000e+00 2.580e+01 2.700e+00 6.220e+01 5.700e+00]]
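To see where the (25, 6) shape comes from, the bounds can be inspected with NumPy alone. This is a minimal sketch: the uniform draw only stands in for the space's own sample() call, and `rng` and `sample` are names chosen for illustration.

```python
import numpy as np

# Per-column bounds from the answer, repeated for 25 rows.
low = np.float32(np.tile([0, 0, 0, 0, 0, 0], (25, 1)))
high = np.float32(np.tile([1, 1, 30, 3, 350, 8], (25, 1)))

# Stand-in for action_space.sample(): one uniform draw per entry.
rng = np.random.default_rng(seed=0)
sample = rng.uniform(low, high).astype(np.float32)

print(sample.shape)  # 25 rows, 6 bounded values per row
print(bool(np.all((sample >= low) & (sample <= high))))  # every entry within its column bounds
```

Each of the six columns keeps its own upper bound (1, 1, 30, 3, 350, 8), which matches the value ranges visible in the sample output above.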

Also, remember to use an algorithm that supports this kind of space. Check the stable-baselines3 documentation for this, for example the stable-baselines3 A2C page.

[Image: A2C spaces for actions and observations]
