
Does the policy gradient algorithm come under model-free or model-based methods in reinforcement learning?

Reinforcement learning algorithms that explicitly learn a system model and use it to solve the MDP are model-based methods. Model-based RL draws heavily on control theory and is often explained in the terminology of different disciplines. These methods include popular algorithms such as Dyna [Sutton 1991], Q-iteration [Busoniu et al. 2010], Policy Gradient (PG) [Williams 1992], etc.

Model-free methods ignore the model and instead estimate the value functions directly from interaction with the environment. To accomplish this, they rely heavily on sampling and observation, so they do not need to know the inner workings of the system. Some examples of these methods are Q-learning [Krose 1995], SARSA [Rummery and Niranjan 1994], and Actor-Critic [Konda and Tsitsiklis 1999].
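To make the "sampling and observation" point concrete, here is a minimal tabular Q-learning sketch on a hypothetical two-state toy environment (the environment's `step` function and all constants are my own illustrative choices, not from the quoted source). Note that the update rule touches only observed `(state, action, reward, next_state)` samples and never queries the dynamics directly:

```python
import random

# Hypothetical toy environment: 2 states, 2 actions. The agent never
# reads these dynamics; it only observes sampled transitions.
def step(state, action):
    next_state = (state + action) % 2
    reward = 1.0 if next_state == 1 else 0.0
    return reward, next_state

alpha, gamma = 0.1, 0.9
Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[state][action]

random.seed(0)
state = 0
for _ in range(1000):
    action = random.randrange(2)              # explore randomly
    reward, next_state = step(state, action)  # sample, don't model
    # Model-free update: bootstrap from the observed sample only.
    Q[state][action] += alpha * (
        reward + gamma * max(Q[next_state]) - Q[state][action]
    )
    state = next_state

print(Q)  # action 1 in state 0 (which reaches the rewarding state) scores higher
```

The learned table reflects the dynamics even though no transition model was ever represented.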

In other places it is written that policy gradient methods are model-free. This is confusing; can someone clear it up, given that actor-critic is also part of the policy gradient family?

Policy Gradient algorithms are model-free.

In model-based algorithms, the agent has access to, or learns, the environment's transition function F(state, action) = (reward, next_state). The transition function can be either deterministic or stochastic.

In other words, in model-based algorithms the agent predicts what is going to happen in the environment if a particular action is taken (as in the paper Model-Based Reinforcement Learning for Atari). Alternatively, the agent has access to the transition function by the framing of the problem (for example, in AlphaGo the agent has access to the transition function of the Go board).
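A short sketch of what "access to the transition function" buys the agent. Everything here is hypothetical (a deterministic 1-D corridor with a goal cell I made up for illustration): because the model is known, the agent can predict the outcome of each action and plan without executing anything in the real environment.

```python
# Hypothetical deterministic model on a 1-D corridor of 5 cells:
# F(state, action) -> (reward, next_state), fully known to the agent.
def transition(state, action):
    next_state = max(0, min(4, state + action))  # action in {-1, +1}
    reward = 10.0 if next_state == 4 else -1.0   # goal at cell 4
    return reward, next_state

def greedy_one_step_lookahead(state):
    # Planning: evaluate both actions *inside the model*, without
    # taking a single step in the real environment.
    return max((-1, +1), key=lambda a: transition(state, a)[0])

print(greedy_one_step_lookahead(3))  # picks +1: moving right reaches the goal
```

A model-free agent cannot do this lookahead; it would have to try both actions and observe what happens.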

In policy gradient algorithms, the agent has a policy network for choosing which action to take and, in actor-critic variants, a value network for estimating the value of the current state. Neither of these networks predicts the environment's transition function, so these algorithms are considered model-free.
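A minimal REINFORCE-style sketch (a hypothetical two-armed bandit in pure Python; the payoff probabilities and learning rate are illustrative assumptions) shows the model-free character directly: the policy parameters are updated from sampled rewards via the log-probability gradient, and no transition function appears anywhere.

```python
import math, random

random.seed(0)

# Hypothetical 2-armed bandit: arm 1 pays off more often on average.
def pull(arm):
    return 1.0 if random.random() < (0.8 if arm == 1 else 0.2) else 0.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

theta = [0.0, 0.0]  # policy parameters (logits), one per arm
lr = 0.1

for _ in range(2000):
    probs = softmax(theta)
    arm = 0 if random.random() < probs[0] else 1  # sample from the policy
    reward = pull(arm)                            # sample from the environment
    # REINFORCE: d log pi(arm) / d theta_k = 1[k == arm] - probs[k]
    for k in range(2):
        grad_log = (1.0 if k == arm else 0.0) - probs[k]
        theta[k] += lr * reward * grad_log        # no model anywhere

print(softmax(theta))  # probability mass shifts toward the better arm
```

The agent improves its policy purely from trial and error; contrast this with the model-based lookahead, where outcomes are predicted before acting.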

You might also find OpenAI Spinning Up's taxonomy diagram helpful.

