How to use continuous values in the action space of a gym environment?
I am trying to make a custom gym environment with five actions, all of which can take continuous values. To implement this, I used the following action_space format:
self.action_space = spaces.Tuple((spaces.Box(low=np.array([0]),high=np.array([1]), dtype=np.float32),
spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
spaces.Box(low=np.array([-2]), high=np.array([2]),dtype=np.float32),
spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
spaces.Box(low=np.array([1]), high=np.array([20]),dtype=np.int8)))
However, when I try to run a PPO model (from stable_baselines3), I get the following error:
AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(0.0, 1.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(-2.0, 2.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(1, 20, (1,), int8)) was provided
I searched a bit about this issue and found this on GitHub: Link
Following that, I changed my code in the following way:
self.action_space = {"Temperature": spaces.Box(low=np.array([0]),high=np.array([1]), dtype=np.float32),
"topP": spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
"frequencyPenalty": spaces.Box(low=np.array([-2]), high=np.array([2]),dtype=np.float32),
"presencePenalty": spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
"bestOf": spaces.Box(low=np.array([1]), high=np.array([20]),dtype=np.int8)}
But this still returned the same error.
Also, I found this answer: Link
According to this, my code should work, as I am using the Tuple space too.
How do I convert this to an accepted data type for the action_space?
Unfortunately, most stable-baselines3 algorithm implementations support only Box, Discrete, MultiDiscrete and MultiBinary action spaces (see stable-baselines3 Implemented Algorithms).
The link you posted refers to OpenAI, not stable-baselines3.
You should look into other frameworks and check whether their algorithm implementations support Tuple / Dict action spaces, or otherwise try to implement your own!
Otherwise, you could check whether your action space with multiple Box-type actions can be easily converted into discrete actions (which stable-baselines3 supports through MultiDiscrete).
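A minimal sketch of that discretization idea, assuming each of the five actions is split into evenly spaced bins. The bin counts and the helper name bins_to_values are hypothetical, not part of stable-baselines3. The environment would declare spaces.MultiDiscrete(N_BINS) as its action space and call the helper at the top of step() to recover the actual parameter values:

```python
# Bounds come from the question; the bin counts are an arbitrary assumption.
LOWS = [0.0, 0.0, -2.0, 0.0, 1.0]
HIGHS = [1.0, 1.0, 2.0, 1.0, 20.0]
N_BINS = [11, 11, 21, 11, 20]  # would be passed to spaces.MultiDiscrete(N_BINS)


def bins_to_values(indices):
    """Map one bin index per action to an evenly spaced value in [low, high]."""
    values = []
    for idx, low, high, n in zip(indices, LOWS, HIGHS, N_BINS):
        if not 0 <= idx < n:
            raise ValueError(f"bin index {idx} out of range for {n} bins")
        step = (high - low) / (n - 1)  # distance between neighbouring bins
        values.append(low + int(idx) * step)
    return values
```

The trade-off is resolution: with 11 bins, Temperature can only move in steps of 0.1, so this fits best when coarse values are acceptable.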
"All of which can have continuous values"
Your link is about a mix of integer and continuous actions. To simply make all of them continuous, you can use Box alone.
self.action_space = spaces.Box(low=np.array([0,0,-2,0,1]),
high=np.array([1,1,2,1,20]),
dtype=np.float32)
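With a single Box, the agent's action arrives as one flat 5-element vector, so the environment has to split it back into named parameters itself. A minimal sketch of that step, assuming the parameter order used above. The helper decode_action is hypothetical, and rounding bestOf is one possible way to recover the integer from the original question:

```python
# Names and bounds are taken from the question; the helper is illustrative only.
NAMES = ["Temperature", "topP", "frequencyPenalty", "presencePenalty", "bestOf"]
LOWS = [0.0, 0.0, -2.0, 0.0, 1.0]
HIGHS = [1.0, 1.0, 2.0, 1.0, 20.0]


def decode_action(action):
    """Clip a flat 5-element action to its bounds and name each component."""
    if len(action) != len(NAMES):
        raise ValueError(f"expected {len(NAMES)} values, got {len(action)}")
    params = {}
    for name, value, low, high in zip(NAMES, action, LOWS, HIGHS):
        params[name] = min(max(float(value), low), high)
    # bestOf was an integer in the question, so round it to the nearest int.
    params["bestOf"] = int(round(params["bestOf"]))
    return params
```

Inside step(self, action), calling decode_action(action) would give a dictionary such as {"Temperature": 0.5, ...} to work with. It accepts a plain list or a NumPy array.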