
Reinforcement learning algorithms for continuous states, discrete actions

I'm trying to find an optimal policy in an environment with continuous states (dim. = 20) and discrete actions (3 possible actions). There is one particular aspect: under the optimal policy, one action (call it "action 0") should be chosen much more frequently than the other two (~100 times more often; those two actions are riskier).

I've tried Q-learning with a neural-network value-function approximator. The results were rather bad: the NN learns to always choose "action 0". I think that policy gradient methods (on the NN weights) may help, but I don't understand how to apply them to discrete actions.

Could you give some advice on what to try (algorithms, papers to read)? What are the state-of-the-art RL algorithms when the state space is continuous and the action space is discrete?

Thanks.

Applying Q-learning in continuous (state and/or action) spaces is not a trivial task. This is especially true when trying to combine Q-learning with a global function approximator such as a NN (I understand you are referring to the common multilayer perceptron trained with backpropagation). You can read more on Rich Sutton's page. A better (or at least easier) solution is to use local approximators such as Radial Basis Function networks (there is a good explanation of why in Section 4.1 of this paper).
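For illustration, here is a minimal sketch of Q-learning with a local RBF approximator and one linear weight vector per discrete action. The environment interface, the random placement of the RBF centers, and all hyperparameters are assumptions made for the example, not something taken from the references above:

import numpy as np

STATE_DIM, N_ACTIONS, N_CENTERS = 20, 3, 500
rng = np.random.default_rng(0)

# RBF feature map: activations of Gaussian bumps placed at (hypothetical) random centers.
centers = rng.uniform(-1.0, 1.0, size=(N_CENTERS, STATE_DIM))
width = 0.5  # bandwidth of each Gaussian (tuning parameter)

def rbf_features(state):
    dists = np.linalg.norm(centers - state, axis=1)
    return np.exp(-(dists / width) ** 2)

# One linear weight vector per action: Q(s, a) = w[a] . phi(s)
w = np.zeros((N_ACTIONS, N_CENTERS))
alpha, gamma = 0.01, 0.99

def q_values(state):
    return w @ rbf_features(state)

def td_update(s, a, r, s_next, done):
    # Semi-gradient Q-learning step on the linear weights of action a.
    phi = rbf_features(s)
    target = r if done else r + gamma * np.max(q_values(s_next))
    td_error = target - w[a] @ phi
    w[a] += alpha * td_error * phi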

On the other hand, the dimensionality of your state space may be too high for local approximators. Thus, my recommendation is to use other algorithms instead of Q-learning. A very competitive algorithm for continuous states and discrete actions is Fitted Q Iteration, which is usually combined with tree-based methods to approximate the Q-function.
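As a rough sketch of how Fitted Q Iteration with a tree-based regressor might look (scikit-learn's ExtraTreesRegressor stands in here for the extremely randomized trees used in the FQI literature; the transition batch S, A, R, S_next, done is assumed to have been collected beforehand):

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

N_ACTIONS, GAMMA, N_ITERATIONS = 3, 0.99, 50

def one_hot(actions, n=N_ACTIONS):
    return np.eye(n)[actions]

def fitted_q_iteration(S, A, R, S_next, done):
    X = np.hstack([S, one_hot(A)])   # regressor input: (state, action) pair
    y = R.copy()                     # first iteration: Q_1 = immediate reward
    model = None
    for _ in range(N_ITERATIONS):
        model = ExtraTreesRegressor(n_estimators=50).fit(X, y)
        # Evaluate Q_k(s', a') for every next state and every action,
        # then back up the max to build the regression targets for Q_{k+1}.
        q_next = np.column_stack([
            model.predict(np.hstack([S_next, one_hot(np.full(len(S_next), a))]))
            for a in range(N_ACTIONS)
        ])
        y = R + GAMMA * (1.0 - done) * q_next.max(axis=1)
    return model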

Finally, a common practice when the number of actions is low, as in your case, is to use an independent approximator for each action, i.e., instead of a single approximator that takes the state-action pair as input and returns a Q-value, use three approximators, one per action, that take only the state as input. You can find an example of this in Example 3.1 of the book Reinforcement Learning and Dynamic Programming Using Function Approximators.
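A minimal sketch of the one-approximator-per-action idea (the regressor class and the training batch are placeholders chosen for the example, not the book's exact setup):

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

N_ACTIONS = 3
# One regressor per action, each mapping a state to the Q-value of that action.
q_models = [ExtraTreesRegressor(n_estimators=50) for _ in range(N_ACTIONS)]

def fit_per_action(S, A, targets):
    # Each model is trained only on the transitions where its action was taken.
    for a in range(N_ACTIONS):
        mask = (A == a)
        q_models[a].fit(S[mask], targets[mask])

def greedy_action(state):
    qs = [m.predict(state.reshape(1, -1))[0] for m in q_models]
    return int(np.argmax(qs))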
