How to use continuous values in the action space of a gym environment?

Question

I am trying to make a custom gym environment with five actions, all of which can take continuous values. To implement this, I have used the following action_space format:

self.action_space = spaces.Tuple((spaces.Box(low=np.array([0]),high=np.array([1]), dtype=np.float32),
                           spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           spaces.Box(low=np.array([-2]), high=np.array([2]),dtype=np.float32),
                           spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           spaces.Box(low=np.array([1]), high=np.array([20]),dtype=np.int8)))

However, when I try to run a PPO model (from stable_baselines3), I get the following error:

AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(0.0, 1.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(-2.0, 2.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(1, 20, (1,), int8)) was provided

I searched a bit for this issue and found this on GitHub:

Link

According to this, I changed my code in the following way:

self.action_space = {"Temperature": spaces.Box(low=np.array([0]),high=np.array([1]), dtype=np.float32),
                           "topP": spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           "frequencyPenalty": spaces.Box(low=np.array([-2]), high=np.array([2]),dtype=np.float32),
                           "presencePenalty": spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           "bestOf": spaces.Box(low=np.array([1]), high=np.array([20]),dtype=np.int8)}

But this still returned the same error.

Also, I found this answer: Link

According to this, my code should work, as I am using the Tuple space too.

How do I convert this to an accepted data type for the action_space?

Answer 1

Unfortunately, most of the stable-baselines3 implementations only support Box, Discrete, MultiDiscrete and MultiBinary action spaces (see stable-baselines3 Implemented Algorithms).

The link you posted refers to OpenAI, not stable-baselines3.

You should look into other frameworks and check whether their algorithm implementations support Tuple / Dict spaces, or otherwise try to implement your own!

Alternatively, you could check whether your action space with multiple Box-type actions can easily be converted into Discrete-type actions (which stable-baselines3 supports through MultiDiscrete)!
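As a rough sketch of that discretization idea: each of the five ranges from the question is split into a fixed number of evenly spaced levels, and the environment maps the chosen bin indices back to real values. The bin counts in N_BINS and the helper name bins_to_values are assumptions made for illustration, not something from the question or from stable-baselines3.

```python
import numpy as np

# Bounds taken from the question, in the same order as the five actions.
LOW = np.array([0.0, 0.0, -2.0, 0.0, 1.0])
HIGH = np.array([1.0, 1.0, 2.0, 1.0, 20.0])

# Hypothetical resolution: number of discrete levels per action.
N_BINS = np.array([11, 11, 21, 11, 20])

# Inside the environment, the action space would then be declared as
# spaces.MultiDiscrete(N_BINS) instead of a Tuple of Box spaces.


def bins_to_values(bin_indices):
    """Map one bin index per action to an evenly spaced value in [LOW, HIGH]."""
    idx = np.asarray(bin_indices, dtype=np.float64)
    # Index 0 maps to LOW and index N_BINS - 1 maps to HIGH.
    return LOW + idx * (HIGH - LOW) / (N_BINS - 1)
```

The trade-off is resolution: the agent can only pick the listed levels, so finer control needs more bins per action.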

Answer 2

"All of which can have continuous values"

Your link is about mixing integer and continuous actions. To make all of them continuous, you can simply use a single Box.

self.action_space = spaces.Box(low=np.array([0,0,-2,0,1]),
                               high=np.array([1,1,2,1,20]),
                               dtype=np.float32)
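With a single Box, the environment receives one flat vector, so step() has to split it back into the five parameters. Below is a minimal sketch of that step, assuming the parameter names from the dictionary in the question and assuming that bestOf should be rounded to an integer. decode_action is a hypothetical helper, not part of gym or stable-baselines3.

```python
import numpy as np

# Names and bounds follow the question, in the same order as the Box above.
NAMES = ["Temperature", "topP", "frequencyPenalty", "presencePenalty", "bestOf"]
LOW = np.array([0, 0, -2, 0, 1], dtype=np.float32)
HIGH = np.array([1, 1, 2, 1, 20], dtype=np.float32)


def decode_action(action):
    """Turn a flat 5-element action vector into named parameters."""
    # Clip defensively in case a value falls outside the declared bounds.
    clipped = np.clip(np.asarray(action, dtype=np.float32), LOW, HIGH)
    params = {name: float(value) for name, value in zip(NAMES, clipped)}
    # Assumption: bestOf is meant to be an integer count, so round it.
    params["bestOf"] = int(round(params["bestOf"]))
    return params
```

For example, calling decode_action(action) at the top of step() gives a dictionary that the rest of the environment can use by name instead of by index.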

2 answers

Answer 1
0 2022-03-27 10:45:24

Answer 2
0 2022-06-15 12:49:53