
How to use continuous values in the action space of a gym environment?

I am trying to make a custom gym environment with five actions, all of which can take continuous values. To implement this, I used the following action_space format:

self.action_space = spaces.Tuple((spaces.Box(low=np.array([0]),high=np.array([1]), dtype=np.float32),
                           spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           spaces.Box(low=np.array([-2]), high=np.array([2]),dtype=np.float32),
                           spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           spaces.Box(low=np.array([1]), high=np.array([20]),dtype=np.int8)))

However, when I try to run a PPO model (from stable_baselines3), I get the following error:

AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(0.0, 1.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(-2.0, 2.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(1, 20, (1,), int8)) was provided

I searched a bit about this issue and found this on GitHub: Link

Based on this, I changed my code as follows:

self.action_space = {"Temperature": spaces.Box(low=np.array([0]),high=np.array([1]), dtype=np.float32),
                           "topP": spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           "frequencyPenalty": spaces.Box(low=np.array([-2]), high=np.array([2]),dtype=np.float32),
                           "presencePenalty": spaces.Box(low=np.array([0]), high=np.array([1]),dtype=np.float32),
                           "bestOf": spaces.Box(low=np.array([1]), high=np.array([20]),dtype=np.int8)}

But this still returned the same error.

I also found this answer: Link

According to it, my code should work, since I am also using a Tuple space.

How do I convert this to a data type that is accepted for the action_space?

Unfortunately, most stable-baselines3 algorithm implementations only support Box, Discrete, MultiDiscrete and MultiBinary action spaces (see stable-baselines3 Implemented Algorithms).

The link you posted refers to OpenAI, not stable-baselines3.

You could look into other frameworks and check whether their algorithm implementations support Tuple / Dict spaces, or otherwise try to implement your own!

Alternatively, you could check whether your action space with multiple Box-type actions can easily be converted into Discrete-type actions (which stable-baselines3 supports through MultiDiscrete).
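As a rough sketch of that conversion, each continuous range can be split into a fixed number of evenly spaced bins, so the agent picks a bin index per action and the environment maps it back to a value. The bin counts in N_BINS and the helper name bins_to_values are assumptions made for illustration, not something from the question or from stable-baselines3.

```python
import numpy as np

# Bounds of the five actions, taken from the question.
LOW = np.array([0.0, 0.0, -2.0, 0.0, 1.0])
HIGH = np.array([1.0, 1.0, 2.0, 1.0, 20.0])

# Hypothetical number of bins per action (an assumption; tune as needed).
N_BINS = np.array([11, 11, 21, 11, 20])

# In the environment's __init__ this would pair with:
#   self.action_space = spaces.MultiDiscrete(N_BINS)


def bins_to_values(indices):
    """Map MultiDiscrete bin indices to evenly spaced values in [LOW, HIGH]."""
    indices = np.asarray(indices, dtype=np.float64)
    fraction = indices / (N_BINS - 1)  # 0.0 at the first bin, 1.0 at the last
    return LOW + fraction * (HIGH - LOW)
```

Inside step(), the environment would call bins_to_values(action) before using the values. A finer grid gives more precision but a larger discrete action set.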

"All of which can have continuous values"

Your link is about mixing integer and continuous actions. To simply make them all continuous, you can use a single Box:

self.action_space = spaces.Box(low=np.array([0,0,-2,0,1]),
                               high=np.array([1,1,2,1,20]),
                               dtype=np.float32)
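With a single Box, step() receives one flat array of five floats. A minimal sketch of unpacking it is shown here, assuming the element order matches the bounds above and reusing the parameter names from the question; decode_action is a hypothetical helper, and rounding bestOf back to an integer is an assumption based on the int8 dtype in the original Tuple space.

```python
import numpy as np

# Same bounds as the Box above, in the same order.
LOW = np.array([0, 0, -2, 0, 1], dtype=np.float32)
HIGH = np.array([1, 1, 2, 1, 20], dtype=np.float32)
NAMES = ("Temperature", "topP", "frequencyPenalty", "presencePenalty", "bestOf")


def decode_action(action):
    """Turn a flat 5-element action into named parameters."""
    # Clip defensively in case a value falls outside the Box bounds.
    clipped = np.clip(np.asarray(action, dtype=np.float32), LOW, HIGH)
    params = {name: float(value) for name, value in zip(NAMES, clipped)}
    # bestOf was an integer in the question, so round it here (an assumption).
    params["bestOf"] = int(round(params["bestOf"]))
    return params
```

Calling decode_action(action) at the top of step() keeps the rest of the environment working with named values rather than array indices.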

