How can I get an integer as output for continuous action space PPO reinforcement learning?

Original post: 2022-08-03 08:50:53 · Tags: deep-learning, reinforcement-learning

I have a huge discrete action space, and the learning stability is not good. I'd like to move to a continuous action space, but the only valid output for my task is a positive integer (let's say in the range 0 to 999). How can I force the DNN to output a positive integer?

1 answer

Could you please specify which framework for RL agents you are using and which kind of environment?

Assuming that you are using Stable Baselines 3 (SB3) with an OpenAI Gym environment, you should be able to set the desired action space when you set up the environment (see here: https://www.gymlibrary.ml/content/spaces/ ). PPO is one of SB3's most versatile agents and can be used with "Discrete", "Box", "MultiDiscrete" and "MultiBinary" action spaces.

Lastly, getting an output that is a positive integer only can be achieved in several ways:

If your output is, for example, Box(low=-1.0, high=1.0, shape=(1,)), it is just a matter of scaling that output to the desired range and converting it to an int.
Another option is to define your output as Box(low=-1.0, high=1.0, shape=(N_DISCRETE_ACTIONS,)) and then get the final integer with np.argmax(action).
The direct way would be to define the action space as Discrete(N_DISCRETE_ACTIONS).
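
The first two options can be sketched as small conversion helpers that would run inside the environment's step function (or an action wrapper). This is only a minimal sketch: the function names scaled_box_to_int and argmax_box_to_int are made up for illustration, N_DISCRETE_ACTIONS = 1000 is taken from the 0 to 999 range in the question, and the clipping step is an assumption added because a continuous policy may sample values outside the Box bounds. The third option, Discrete, needs no conversion at all.

```python
import numpy as np

N_DISCRETE_ACTIONS = 1000  # integers 0..999, as in the question


def scaled_box_to_int(action, n=N_DISCRETE_ACTIONS):
    """Option 1: map a Box(-1, 1, shape=(1,)) action to an integer in [0, n - 1]."""
    # Take the single component and clip it (assumption: the raw sample
    # may fall slightly outside the declared bounds).
    x = float(np.clip(np.asarray(action, dtype=np.float64).reshape(-1)[0], -1.0, 1.0))
    # Shift [-1, 1] to [0, 1], stretch to [0, n - 1], round to the nearest integer.
    return int(np.rint((x + 1.0) / 2.0 * (n - 1)))


def argmax_box_to_int(action):
    """Option 2: treat a Box(-1, 1, shape=(n,)) action as scores; pick the largest."""
    return int(np.argmax(action))


if __name__ == "__main__":
    print(scaled_box_to_int(np.array([-1.0])))  # lowest action
    print(scaled_box_to_int(np.array([1.0])))   # highest action
    print(argmax_box_to_int(np.array([-0.2, 0.9, 0.1])))
```

Note that with rounding, the two endpoint integers (0 and n - 1) cover only half as much of the continuous range as the others. If that matters for your task, a floor-based mapping is an alternative.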