
Why is the bandit problem also called a one-step/one-state MDP in reinforcement learning?

Original post 2020-02-11 08:12:13 5 2 machine-learning/ reinforcement-learning/ markov-decision-process/ mdp/ bandit

What does a one-step/one-state MDP (Markov decision process) mean?

2 answers

Let us consider an n-action, one-state MDP. Regardless of which action you take, you stay in the same state. You do, though, get a reward that depends only on the action you took. If you want to maximise the long-term reward in this setting, all you need to do is work out which of the n available choices (actions) is best.

This is exactly what the bandit problem is.
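To make the correspondence concrete, here is a minimal sketch of an n-action, one-state MDP written as a bandit. The class name `OneStateMDP`, the arm means, and the Gaussian reward noise are all assumptions made for illustration; they are not part of the original answer.

```python
import random


class OneStateMDP:
    """An n-action MDP with a single state, i.e. a multi-armed bandit."""

    def __init__(self, arm_means, seed=0):
        self.arm_means = list(arm_means)  # hypothetical mean reward per action
        self.state = 0                    # the only state
        self.rng = random.Random(seed)

    def step(self, action):
        # The reward depends only on the chosen action
        # (unit-variance Gaussian noise is an assumption of this sketch).
        reward = self.rng.gauss(self.arm_means[action], 1.0)
        # The transition is trivial: every action leads back to state 0.
        return self.state, reward


def estimate_action_values(env, n_actions, pulls_per_action=2000):
    """Estimate each action's mean reward by plain sample averaging."""
    estimates = []
    for action in range(n_actions):
        total = 0.0
        for _ in range(pulls_per_action):
            _, reward = env.step(action)
            total += reward
        estimates.append(total / pulls_per_action)
    return estimates


env = OneStateMDP(arm_means=[0.1, 0.5, 0.9])
q = estimate_action_values(env, n_actions=3)
best_action = max(range(3), key=lambda a: q[a])
print(best_action)
```

Because the state never changes, there is nothing to plan over: estimating one number per action and picking the largest is the whole problem. The uniform sampling used here is only the simplest way to get those estimates; practical bandit algorithms trade off exploration and exploitation instead.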

In a bandit problem, past pulls of the levers do not affect what a lever will output, i.e. the reward.

The reward depends only on which lever is pulled, and on nothing in the past.

So there is only one state.
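The same point can be seen in the Bellman optimality equation. The sketch below assumes a discount factor $0 \le \gamma < 1$ and writes $r(a) = \mathbb{E}[R \mid a]$ for the expected reward of lever $a$; this notation is introduced here for illustration and does not appear in the answers.

```latex
% One state s, so every action leads back to s:
V^*(s) = \max_a \bigl[\, r(a) + \gamma V^*(s) \,\bigr]
       = \max_a r(a) + \gamma V^*(s)
\quad\Longrightarrow\quad
V^*(s) = \frac{\max_a r(a)}{1 - \gamma},
\qquad
a^* = \arg\max_a r(a).
```

The term $\gamma V^*(s)$ is the same for every action, so it drops out of the maximisation and the optimal action is simply the lever with the highest expected immediate reward.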

Why do we need MDP setting in reinforcement learning

Why is RL called 'reinforcement' learning?

Confusion in understanding Q(s,a) formula for Reinforcement Learning MDP?

MDP & Reinforcement Learning - Convergence Comparison of VI, PI and QLearning Algorithms

What is it called when the action doesnt affect the state in reinforcement learning?

Stochastic state transitions in MDP: How does Q-learning estimate that?

Reinforcement Learning in arbitrarily large action/state spaces

DQN(Reinforcement learning) : should state be standardized?

In Reinforcement learning , do both agent and environment have different states or there is only one state?

Reinforcement Learning where every state is terminal


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1
2 ACCEPTED 2020-02-11 20:08:14

solution2
1 2020-02-11 14:20:45
