Transferring a Discrete-Action Policy to Continuous Actions in Reinforcement Learning
In reinforcement learning, we know empirically that discrete actions are easier to train with than continuous actions. But in principle, continuous actions are more accurate and faster; like us humans, most of whose actions are continuous. So is there any method or related research for training a discrete-action policy first, for an easier start, and then transferring that policy to output continuous actions for better precision?
Thanks.
You can certainly do that; most papers that do continuous control with reinforcement learning do exactly this. The only ones that don't are the researchers using deep reinforcement learning or reinforcement learning with function approximation. My research applies both reinforcement learning and deep reinforcement learning to dynamical systems. I discretize my state and action spaces to an adequate resolution, and then apply the result to control problems.
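As a minimal sketch of that discretization step (the grid ranges, resolutions, and function names here are illustrative assumptions, not the answerer's actual code):

```python
import numpy as np

# Hypothetical discretization of a 1-D continuous state/action space.
# Ranges and resolutions are illustrative only.
state_bins = np.linspace(-1.0, 1.0, 11)   # 11 discrete states
action_bins = np.linspace(-2.0, 2.0, 5)   # 5 discrete actions

def discretize(x, bins):
    """Map a continuous value to the index of the nearest grid point."""
    return int(np.argmin(np.abs(bins - x)))

s_idx = discretize(0.13, state_bins)      # nearest grid state to 0.13
a = action_bins[2]                        # look up a discrete action's value
```

A tabular method (e.g. Q-learning) then runs unchanged on the resulting indices; the resolution trades off table size against control precision.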
I am currently working on some methods to make the discretized system work for continuous spaces. One method is to use linear interpolation: if your state falls between two discretized points, you can interpolate linearly between their optimal actions to identify an optimal action in the continuous space. This works especially well for a linear system, since the control law is itself linear (e.g. a state-feedback law of the form u = -Kx), so interpolating between grid points introduces no error.
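The interpolation idea can be sketched in a few lines; the tabular policy below (a linear map from state to action) is a hypothetical stand-in for whatever the discrete training produced:

```python
import numpy as np

# Recover a continuous action by linearly interpolating between the
# greedy actions of the two nearest discretized states. The learned
# policy table here is hypothetical, for illustration only.
state_grid = np.linspace(-1.0, 1.0, 11)     # discretized states
greedy_action = -0.5 * state_grid           # e.g. a learned table of optimal actions

def continuous_action(s):
    """Interpolate the tabular policy at a continuous state s."""
    return float(np.interp(s, state_grid, greedy_action))

u = continuous_action(0.13)  # 0.13 falls between grid points 0.0 and 0.2
```

Because the underlying control law in this example is linear, the interpolated action matches the exact continuous-space answer; for nonlinear laws the interpolation error shrinks as the grid resolution grows.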
And this method is directly in line with what you asked: training on a discrete space, and then applying the result to a continuous control problem.
However, continuous control problems are traditionally solved using either linear function approximation, such as tile coding, or non-linear function approximation, such as artificial neural networks. These methods are more advanced; I would suggest trying the more basic discrete RL methods first.
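For a flavor of the linear option, here is a minimal tile-coding sketch; the number of tilings, tile widths, and offsets are illustrative assumptions:

```python
import numpy as np

# Minimal tile coding over a 1-D state space: several overlapping uniform
# grids ("tilings"), each shifted by a fraction of the tile width, yield a
# sparse set of active features. Sizes and offsets are illustrative only.
n_tilings, tiles_per_tiling = 4, 10
lo, hi = 0.0, 1.0
tile_width = (hi - lo) / tiles_per_tiling

def active_tiles(s):
    """Return the index of the one active tile in each tiling for state s."""
    idx = []
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings        # shift each tiling slightly
        i = int((s - lo + offset) / tile_width)
        i = min(i, tiles_per_tiling)               # clip at the upper edge
        idx.append(t * (tiles_per_tiling + 1) + i) # flatten to a global index
    return idx

phi = active_tiles(0.42)  # one active tile per tiling
```

A linear value function is then just the sum of the learned weights at the active indices, so updates stay as cheap as in the tabular case while the representation generalizes between nearby states.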
I have RL code on my GitHub that you can use; let me know if you have any issues.