
Policy Gradient Action Dimension

I understand that the action space in Policy Gradient should be discrete, like "up", "left", or "do nothing".

In my environment the agent needs to choose a direction (0 to 360 degrees) and then choose a number of steps (up to 10 steps).

Under this setup there would be 3600 different actions in the action space for the agent to choose from; it would take many episodes to train the agent and would waste resources.

Can you advise me on how to tackle such a case?

Can the action space be transformed into a continuous random variable?

I think with policy gradients you do not have to use discrete actions; you can use continuous variables. Only DQN (Deep Q-Networks / Deep Q-Learning) requires discrete actions, because there you have to choose one of the action possibilities.

Continuous variables in your case could be: network output 1 (value from 0 to 1) times 360 = angle; network output 2 (value from 0 to 1) times 10, cast to an integer = number of steps.
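Concretely, one way to realize this is to have the policy network output parameters of a distribution over (0, 1) for each of the two quantities, sample from it, and scale the samples afterwards. Below is a minimal REINFORCE-style sketch assuming PyTorch; the `ContinuousPolicy` class, the Beta parameterisation, the hidden sizes, and the observation dimension are illustrative assumptions, not something dictated by your environment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousPolicy(nn.Module):
    """Outputs two values in (0, 1): one scaled to an angle in [0, 360),
    one scaled to a step count in [0, 10]."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # Parameters of two Beta distributions over (0, 1).
        # softplus + 1 keeps both concentrations > 1 (unimodal).
        self.alpha_head = nn.Linear(hidden, 2)
        self.beta_head = nn.Linear(hidden, 2)

    def forward(self, obs):
        h = self.body(obs)
        alpha = F.softplus(self.alpha_head(h)) + 1.0
        beta = F.softplus(self.beta_head(h)) + 1.0
        return torch.distributions.Beta(alpha, beta)

def sample_action(policy, obs):
    dist = policy(obs)
    raw = dist.sample()                    # two values in (0, 1)
    log_prob = dist.log_prob(raw).sum(-1)  # needed for the policy-gradient loss
    angle = raw[..., 0] * 360.0            # direction in degrees
    steps = (raw[..., 1] * 10.0).long()    # number of steps, cast to int
    return angle, steps, log_prob

# Usage: with a sampled action and its return (or advantage) G,
# the REINFORCE loss is simply -log_prob * G.
policy = ContinuousPolicy(obs_dim=4)
obs = torch.randn(4)
angle, steps, log_prob = sample_action(policy, obs)
loss = -log_prob * 1.0  # replace 1.0 with the actual episode return / advantage
loss.backward()
```

A Beta distribution is used here only because its support is exactly (0, 1), which matches the "output between 0 and 1, then scale" idea; a Gaussian squashed by a sigmoid would work in the same way.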


 