
Reinforcement Learning With Variable Actions

Original post: 2011-03-07 04:34:08 · Tags: machine-learning, reinforcement-learning, planning

All the reinforcement learning algorithms I've read about are applied to a single agent that has a fixed number of actions. Are there any reinforcement learning algorithms for making a decision while taking into account a variable number of actions? For example, how would you apply an RL algorithm in a computer game where a player controls N soldiers, and each soldier has a varying number of actions based on its condition? You can't formulate a fixed number of actions for a global decision maker (i.e., "the general") because the available actions are continually changing as soldiers are created and killed. And you can't formulate a fixed number of actions at the soldier level, since each soldier's actions depend on its immediate environment. If a soldier sees no opponents, it might only be able to walk, whereas if it sees 10 opponents, it has 10 new possible actions: attacking one of the 10 opponents.

3 answers

What you describe is nothing unusual. Reinforcement learning is a way of finding the value function of a Markov Decision Process. In an MDP, every state has its own set of actions. To apply reinforcement learning, you have to clearly define what the states, actions, and rewards are in your problem.
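To make the point about per-state action sets concrete, here is a minimal tabular Q-learning sketch. The only change from the textbook update is that both action selection and the max over next actions range over A(s), the actions legal in that state. The helper `available_actions` and the toy encoding of a state as a tuple of its legal actions are hypothetical, chosen only so the example runs on its own.

```python
import random
from collections import defaultdict

# Q-values keyed by (state, action); unseen pairs default to 0.0.
Q = defaultdict(float)

def available_actions(state):
    # Hypothetical: in this toy encoding a state is a tuple of its legal actions.
    return list(state)

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy selection restricted to the actions legal in `state`."""
    actions = available_actions(state)
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state,
             alpha=0.5, gamma=0.9, terminal=False):
    """One Q-learning step; the max is taken only over A(next_state)."""
    if terminal:
        target = reward
    else:
        target = reward + gamma * max(
            Q[(next_state, a)] for a in available_actions(next_state))
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```

For example, a soldier state with no visible enemies might be `("walk",)`, while one with two visible enemies might be `("walk", "attack_0", "attack_1")`; the same table handles both because nothing assumes a fixed action count.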

If each soldier has a number of actions that are available or not depending on some conditions, you can still model this as a selection from a fixed set of actions. For example:

Compute a "utility value" for every action in each soldier's full action set.
Choose the highest-valued action, ignoring any actions that are not available at that time.
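The two steps above amount to a masked argmax: score the whole fixed action set, then skip whatever is currently unavailable. A minimal sketch follows; the action names and utility numbers are hypothetical, and in an RL setting the utilities would come from a learned value function rather than a hand-written dictionary.

```python
from typing import Dict, Set

# Hypothetical full action set for one soldier; names are illustrative only.
ALL_ACTIONS = ["walk", "hold", "retreat", "attack"]

def select_available_action(utilities: Dict[str, float], available: Set[str]) -> str:
    """Return the highest-utility action among those currently available.

    `utilities` scores every action in the full set; unavailable actions
    are skipped (masked out) rather than removed from the model.
    """
    candidates = [a for a in ALL_ACTIONS if a in available]
    if not candidates:
        raise ValueError("no action is available in this state")
    return max(candidates, key=lambda a: utilities[a])
```

With utilities `{"walk": 0.2, "hold": 0.1, "retreat": 0.0, "attack": 0.9}`, a soldier that sees no enemy (`{"walk", "hold"}` available) walks, and the same soldier picks `"attack"` once that action becomes available.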

If you have multiple possible targets, the same principle applies, except that this time your utility function takes the target designation as an additional parameter, and you run the evaluation function multiple times (once for each target). You then pick the target with the highest "attack utility".
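The per-target evaluation can be sketched as follows: one utility function parameterized by the target, evaluated once per visible opponent, so the number of candidates can change every step. The `Target` fields, the range check, and the scoring formula (prefer close, weak targets, with health normalized to [0, 1]) are all hypothetical stand-ins for whatever features and learned function a real agent would use.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Target:
    # Hypothetical features of an opponent the soldier can see.
    name: str
    distance: float
    health: float  # assumed normalized to [0, 1]

def attack_utility(soldier_range: float, target: Target) -> float:
    """Hypothetical hand-written utility: prefer close, weak targets.

    In an RL setting this would be a learned function of
    (soldier state, target features) instead.
    """
    if target.distance > soldier_range:
        return float("-inf")  # out of range: treat as unavailable
    return 1.0 / (1.0 + target.distance) + (1.0 - target.health)

def best_target(soldier_range: float, visible: List[Target]) -> Optional[Target]:
    """Evaluate the utility once per visible target and keep the best one."""
    scored = [(attack_utility(soldier_range, t), t) for t in visible]
    scored = [(u, t) for u, t in scored if u != float("-inf")]
    if not scored:
        return None  # nothing attackable; fall back to non-attack actions
    return max(scored, key=lambda ut: ut[0])[1]
```

Because the function is shared across targets, seeing 1 or 10 opponents changes only how many times it is called, not the shape of the model.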

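In continuous action spaces, the policy neural network usually outputs a mean and/or variance, and you then sample the action from it, assuming it follows some distribution.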
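A minimal sketch of that sampling step, assuming a diagonal Gaussian distribution and assuming the network outputs a mean and a log standard deviation per action dimension (predicting the log keeps the standard deviation positive). The plain lists stand in for network outputs; no particular deep-learning library is implied.

```python
import math
import random
from typing import List, Tuple

def sample_gaussian_action(mean: List[float], log_std: List[float],
                           rng: random.Random) -> Tuple[List[float], float]:
    """Sample a continuous action from a diagonal Gaussian policy.

    `mean` and `log_std` stand in for the outputs of a policy network.
    Returns the sampled action and its log-probability, which
    policy-gradient methods use in their update.
    """
    action, log_prob = [], 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)
        a = rng.gauss(mu, std)
        action.append(a)
        # Log-density of a univariate normal, summed over independent dims.
        log_prob += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return action, log_prob
```

Shrinking `log_std` makes the policy nearly deterministic around `mean`; learning it lets the agent trade exploration against exploitation.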

Related questions:

Reinforcement learning algorithms for continuous states, discrete actions

Multiple actions that lead to the same state in Reinforcement Learning

Reinforcement Learning

What is Reinforcement machine learning?

Negative reward in reinforcement learning

Reinforcement learning with neural networks

Reinforcement learning for power management

Reinforcement Learning on a Supervised Dataset

SARSA in Reinforcement Learning

Implementations of Hierarchical Reinforcement Learning


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.




solution1
4 ACCEPTED 2011-07-28 21:46:12

solution2
1 2011-03-07 11:15:27

solution3
0 2020-05-07 07:00:33
