
How do we assess each reward in the return in Policy Gradient Methods?

Hi StackOverflow Community,

I have a problem with the policy gradient methods in reinforcement learning.

In policy gradient methods, we increase or decrease the log probability of an action based on the return (i.e., the sum of rewards) from that step onwards. So if the return is high, we increase the log probability. But I have a problem at this step.

Let's say that our return is made up of three rewards. Although the sum of these three rewards is high, the second reward is really bad.
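For concreteness, here is a minimal NumPy sketch of what I mean (the episode data, the log probabilities and the discount factor are just made up): the log probability of each action gets scaled by the return-to-go, so even the step with the bad reward is "reinforced" because the rewards after it are large.

import numpy as np

def returns_to_go(rewards, gamma=0.99):
    # G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical episode: log-probs of the actions taken and the per-step rewards.
log_probs = np.array([-0.7, -1.2, -0.3])
rewards   = np.array([ 5.0, -4.0,  6.0])   # second reward is really bad

G = returns_to_go(rewards, gamma=1.0)      # -> [7., 2., 6.]
# REINFORCE-style pseudo-loss: each log-prob is weighted by its return-to-go,
# so the sign of G_t, not of the individual reward r_t, decides the update.
pseudo_loss = -(log_probs * G).sum()
print(G, pseudo_loss)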

How do we deal with this problem? How do we assess each reward separately? Is there an alternative version of policy gradient methods for this?

This is a multi-objective problem, where the reward is not a scalar but a vector. By definition, there is no single optimal policy in the classical sense; instead, there is a set of Pareto-optimal policies, i.e., policies for which you cannot do better with respect to one objective (maximizing the sum of the first reward, for instance) without losing something on the other objectives (maximizing the sums of the other rewards). There are many ways to approach multi-objective problems, both in optimization (often with genetic algorithms) and in RL. Naively, you could just scalarize the rewards by a linear weighting, but that is really inefficient. More sophisticated approaches learn a manifold in policy-parameter space (e.g., this).
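To illustrate the naive linear scalarization mentioned above, here is a minimal sketch (NumPy only; the weights and the per-step reward vectors are made up): each vector reward is collapsed to a scalar, and those scalars then feed the usual policy-gradient return.

import numpy as np

def scalarize(reward_vectors, weights):
    # Collapse a vector reward r_t in R^k to the scalar w . r_t at every step.
    return np.asarray(reward_vectors) @ np.asarray(weights)

# Hypothetical episode with a 3-dimensional reward vector per step.
reward_vectors = [
    [ 1.0,  0.0, 2.0],
    [-4.0,  1.0, 0.5],
    [ 3.0, -0.5, 1.0],
]
weights = [0.5, 0.3, 0.2]   # one particular trade-off between the objectives

scalar_rewards = scalarize(reward_vectors, weights)
print(scalar_rewards)       # these scalars replace the rewards in the usual update

Each choice of weights commits you to one trade-off and thus recovers at most one Pareto-optimal policy, so covering the whole Pareto set this way requires many separate runs, which is part of why this approach is inefficient.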
