
Does the policy gradient algorithm come under model-free or model-based methods in reinforcement learning?

Reinforcement learning algorithms that explicitly learn a system model and use it to solve the MDP are model-based methods. Model-based RL draws heavily on control theory and is often explained in the terminology of different disciplines. These methods include popular algorithms such as Dyna [Sutton 1991], Q-iteration [Busoniu et al. 2010], Policy Gradient (PG) [Williams 1992], etc.

Model-free methods ignore the model and instead estimate the value functions directly from interaction with the environment. To accomplish this, they rely heavily on sampling and observation, so they do not need to know the inner workings of the system. Some examples of these methods are Q-learning [Krose 1995], SARSA [Rummery and Niranjan 1994], and Actor-Critic [Konda and Tsitsiklis 1999].
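To make the "sampling and observation" point concrete, here is a minimal tabular Q-learning sketch on a hypothetical two-state toy environment (the environment's `step` function and all constants are my own illustrative choices, not from the quoted source). Note that the update rule touches only observed `(state, action, reward, next_state)` samples and never queries the dynamics directly:

```python
import random

# Hypothetical toy environment: 2 states, 2 actions. The agent never
# reads these dynamics; it only observes sampled transitions.
def step(state, action):
    next_state = (state + action) % 2
    reward = 1.0 if next_state == 1 else 0.0
    return reward, next_state

alpha, gamma = 0.1, 0.9
Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[state][action]

random.seed(0)
state = 0
for _ in range(1000):
    action = random.randrange(2)              # explore randomly
    reward, next_state = step(state, action)  # sample, don't model
    # Model-free update: bootstrap from the observed sample only.
    Q[state][action] += alpha * (
        reward + gamma * max(Q[next_state]) - Q[state][action]
    )
    state = next_state

print(Q)  # action 1 in state 0 (which reaches the rewarding state) scores higher
```

The learned table reflects the dynamics even though no transition model was ever represented.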

In other places it is written that policy gradient methods are model-free. This is confusing; can someone clear it up, given that actor-critic is also part of the policy gradient family?

Policy Gradient algorithms are model-free.

In model-based algorithms, the agent has access to, or learns, the environment's transition function F(state, action) = (reward, next_state). The transition function can be either deterministic or stochastic.

In other words, in model-based algorithms the agent predicts what is going to happen in the environment if a particular action is taken (as in the paper Model-Based Reinforcement Learning for Atari). Alternatively, the agent has access to the transition function by the framing of the problem (for example, in AlphaGo the agent has access to the transition function of the Go board).
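A short sketch of what "access to the transition function" buys the agent. Everything here is hypothetical (a deterministic 1-D corridor with a goal cell I made up for illustration): because the model is known, the agent can predict the outcome of each action and plan without executing anything in the real environment.

```python
# Hypothetical deterministic model on a 1-D corridor of 5 cells:
# F(state, action) -> (reward, next_state), fully known to the agent.
def transition(state, action):
    next_state = max(0, min(4, state + action))  # action in {-1, +1}
    reward = 10.0 if next_state == 4 else -1.0   # goal at cell 4
    return reward, next_state

def greedy_one_step_lookahead(state):
    # Planning: evaluate both actions *inside the model*, without
    # taking a single step in the real environment.
    return max((-1, +1), key=lambda a: transition(state, a)[0])

print(greedy_one_step_lookahead(3))  # picks +1: moving right reaches the goal
```

A model-free agent cannot do this lookahead; it would have to try both actions and observe what happens.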

In policy gradient algorithms, the agent has a policy network for choosing which action to take and, in actor-critic variants, a value network for estimating the value of the current state. Neither of these networks predicts the environment's transition function, so these algorithms are considered model-free.
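A minimal REINFORCE-style sketch (a hypothetical two-armed bandit in pure Python; the payoff probabilities and learning rate are illustrative assumptions) shows the model-free character directly: the policy parameters are updated from sampled rewards via the log-probability gradient, and no transition function appears anywhere.

```python
import math, random

random.seed(0)

# Hypothetical 2-armed bandit: arm 1 pays off more often on average.
def pull(arm):
    return 1.0 if random.random() < (0.8 if arm == 1 else 0.2) else 0.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

theta = [0.0, 0.0]  # policy parameters (logits), one per arm
lr = 0.1

for _ in range(2000):
    probs = softmax(theta)
    arm = 0 if random.random() < probs[0] else 1  # sample from the policy
    reward = pull(arm)                            # sample from the environment
    # REINFORCE: d log pi(arm) / d theta_k = 1[k == arm] - probs[k]
    for k in range(2):
        grad_log = (1.0 if k == arm else 0.0) - probs[k]
        theta[k] += lr * reward * grad_log        # no model anywhere

print(softmax(theta))  # probability mass shifts toward the better arm
```

The agent improves its policy purely from trial and error; contrast this with the model-based lookahead, where outcomes are predicted before acting.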

You might also find OpenAI Spinning Up's taxonomy diagram helpful.

