
Is Monte Carlo learning policy or value iteration (or something else)?

I am taking a Reinforcement Learning class, and I don't understand how to combine the concepts of policy iteration/value iteration with Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how should the empty cells be filled: can it be a binary yes/no, some string description, or is it more complicated?

[Image: a table of RL methods with empty cells to fill in]

Value iteration and policy iteration are model-based methods of finding an optimal policy: they operate on a model of the environment, i.e., a fully specified Markov decision process (MDP) with known transition probabilities and rewards. The main premise behind reinforcement learning is that you don't need the MDP of an environment to find an optimal policy, and traditionally value iteration and policy iteration are not considered RL (although understanding them is key to RL concepts). They learn "indirectly" in the sense that everything goes through a model of the environment, from which the optimal policy is then extracted.
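To make the "model-based" point concrete, here is a minimal sketch of both algorithms on a tiny hand-written MDP. All of the numbers (the transition tensor `P`, reward table `R`, and `gamma`) are made up purely for illustration:

```python
import numpy as np

# Toy, fully known 2-state, 2-action MDP (all numbers made up for illustration).
# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
n_states, gamma = 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

# --- Value iteration: repeatedly apply the Bellman optimality backup ---
V = np.zeros(n_states)
while True:
    Q = R + gamma * P @ V        # Q[s, a] = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new
print("value iteration:", Q.argmax(axis=1), V_new)

# --- Policy iteration: alternate exact evaluation with greedy improvement ---
pi = np.zeros(n_states, dtype=int)
while True:
    P_pi = P[np.arange(n_states), pi]            # transitions under pi
    R_pi = R[np.arange(n_states), pi]            # rewards under pi
    # evaluation: solve (I - gamma * P_pi) V = R_pi exactly
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    pi_new = (R + gamma * P @ V).argmax(axis=1)  # improvement: act greedily
    if np.array_equal(pi_new, pi):
        break
    pi = pi_new
print("policy iteration:", pi)
```

Notice that neither loop ever samples from the environment; everything comes from the model (`P`, `R`), which is exactly what makes these methods model-based.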

"Direct" learning methods do not attempt to construct a model of the environment. They might search for an optimal policy in the policy space or utilize value function-based (aka "value based") learning methods. Most approaches you'll learn about these days tend to be value function-based.

Within value function-based methods, there are two primary types of RL method (a generic sketch of both update flavors follows the list):

  • Policy iteration-based methods
  • Value iteration-based methods
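To see the structural difference without naming any of the homework methods, compare the two generic tabular update templates below. All names here (`alpha`, `gamma`, the `Q` table, the action set) are assumed placeholders, and this deliberately does not map any method to either template:

```python
from collections import defaultdict

# Generic tabular update templates (alpha = step size, gamma = discount).
alpha, gamma = 0.1, 0.9
Q = defaultdict(float)                           # Q[(state, action)]
actions = [0, 1]

def evaluation_flavor_update(s, a, r, s2, a2):
    # policy iteration flavor: bootstrap on the action the CURRENT policy
    # actually takes next, i.e., evaluate that policy, then improve it
    # separately (e.g. by acting epsilon-greedily w.r.t. Q).
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def max_flavor_update(s, a, r, s2):
    # value iteration flavor: fold the greedy improvement into the backup
    # itself by maximizing over next actions, targeting optimal values directly.
    best_next = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```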

Your homework is asking you, for each of those RL methods, whether it is based on policy iteration or value iteration.

A hint: one of those five RL methods is not like the others.
