
Policy Gradient Action Dimension

I understand that the action space in Policy Gradient should be discrete, like "up", "left", or "do nothing".

In my environment the agent needs to choose a direction (0 to 360 degrees) and then choose a number of steps (up to 10 steps).

Under this setup there would be 3600 different actions in the action space for the agent to choose from; it would take many episodes to train the agent and would waste resources.

Can you advise me on how to tackle such a case?

Can the action space be transformed into a continuous random variable?

I think with policy gradients you do not have to use discrete actions; you can use continuous variables. Only DQN (Deep Q-Networks / Deep Q-Learning) requires discrete actions, because there you have to choose one of the action possibilities.

Continuous variables in your case could be: network output 1 (value from 0 to 1) times 360 = angle; network output 2 (value from 0 to 1) times 10, cast to an integer = number of steps.
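Concretely, one way to realize this is to have the policy network output parameters of a distribution over (0, 1) for each of the two quantities, sample from it, and scale the samples afterwards. Below is a minimal REINFORCE-style sketch assuming PyTorch; the `ContinuousPolicy` class, the Beta parameterisation, the hidden sizes, and the observation dimension are illustrative assumptions, not something dictated by your environment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousPolicy(nn.Module):
    """Outputs two values in (0, 1): one scaled to an angle in [0, 360),
    one scaled to a step count in [0, 10]."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # Parameters of two Beta distributions over (0, 1).
        # softplus + 1 keeps both concentrations > 1 (unimodal).
        self.alpha_head = nn.Linear(hidden, 2)
        self.beta_head = nn.Linear(hidden, 2)

    def forward(self, obs):
        h = self.body(obs)
        alpha = F.softplus(self.alpha_head(h)) + 1.0
        beta = F.softplus(self.beta_head(h)) + 1.0
        return torch.distributions.Beta(alpha, beta)

def sample_action(policy, obs):
    dist = policy(obs)
    raw = dist.sample()                    # two values in (0, 1)
    log_prob = dist.log_prob(raw).sum(-1)  # needed for the policy-gradient loss
    angle = raw[..., 0] * 360.0            # direction in degrees
    steps = (raw[..., 1] * 10.0).long()    # number of steps, cast to int
    return angle, steps, log_prob

# Usage: with a sampled action and its return (or advantage) G,
# the REINFORCE loss is simply -log_prob * G.
policy = ContinuousPolicy(obs_dim=4)
obs = torch.randn(4)
angle, steps, log_prob = sample_action(policy, obs)
loss = -log_prob * 1.0  # replace 1.0 with the actual episode return / advantage
loss.backward()
```

A Beta distribution is used here only because its support is exactly (0, 1), which matches the "output between 0 and 1, then scale" idea; a Gaussian squashed by a sigmoid would work in the same way.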


 