
Several dips in accumulated episodic rewards during training of a reinforcement learning agent

Hi, I am training reinforcement learning agents for a control problem using the PPO algorithm. I track the accumulated reward for each episode during training. Several times during training I see a sudden dip in the accumulated rewards, and I cannot figure out why this happens or how to avoid it. I have tried changing some of the hyper-parameters, such as the number of neurons in the neural network layers and the learning rate, but I still see this happening consistently. If I debug and inspect the actions taken during the dips, they are obviously very bad, which is what causes the drop in reward.
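For context, my setup looks roughly like the following (a minimal sketch in the style of Stable-Baselines3's PPO; the environment id and hyper-parameter values are illustrative, not my exact configuration):

```python
from stable_baselines3 import PPO

# Illustrative hyper-parameters -- these are the kinds of knobs I have been varying.
model = PPO(
    "MlpPolicy",
    "Pendulum-v1",                          # placeholder env; my real task is a custom control problem
    learning_rate=3e-4,                     # tried several values here
    n_steps=2048,
    batch_size=64,
    clip_range=0.2,                         # PPO's clipping of the policy-ratio update
    max_grad_norm=0.5,                      # gradient-norm clipping
    policy_kwargs=dict(net_arch=[64, 64]),  # tried different layer sizes here
)
model.learn(total_timesteps=500_000)
```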

Can someone help me understand why this is happening or how to avoid it?

Some plots from my training process:

[plots of accumulated episodic reward during training, showing the sudden dips]

I recently read this paper: https://arxiv.org/pdf/1805.07917.pdf I haven't used the method in particular, so I can't really vouch for its usefulness, but its explanation of this problem seemed convincing to me:

For instance, during the course of learning, the cheetah benefits from leaning forward to increase its speed which gives rise to a strong gradient in this direction. However, if the cheetah leans too much, it falls over. The gradient-based methods seem to often fall into this trap and then fail to recover as the gradient information from the new state has no guarantees of undoing the last gradient update.
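One pragmatic workaround, which is not the method from the paper but a common safeguard, is to periodically evaluate the policy, keep a checkpoint of the best one seen so far, and roll back to it if the return suddenly collapses. A minimal sketch, assuming a Stable-Baselines3-style PPO with an illustrative environment and threshold:

```python
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

ENV_ID = "Pendulum-v1"   # placeholder environment
CHUNK = 10_000           # timesteps of training between evaluations
DROP = 200.0             # illustrative threshold: how far below the best return counts as a collapse

model = PPO("MlpPolicy", ENV_ID, verbose=0)
best_return = -np.inf

for _ in range(50):
    # Train for a chunk, then measure the current policy's average episodic return.
    model.learn(total_timesteps=CHUNK, reset_num_timesteps=False)
    mean_return, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=5)

    if mean_return > best_return:
        best_return = mean_return
        model.save("best_ppo_checkpoint.zip")      # remember the best policy seen so far
    elif best_return - mean_return > DROP:
        # Sudden dip: restore the last good parameters instead of relying on
        # further gradient steps from the collapsed policy to undo the damage.
        model.set_parameters("best_ppo_checkpoint.zip")
```

This does not explain why the dips occur, but it keeps a single destructive update from throwing away the progress made so far.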
