
Temporal Difference Learning and Back-propagation

I have read this page from Stanford - https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html - but I am not able to understand how TD learning is used in neural networks. I am trying to make a checkers AI that will use TD learning, similar to what they implemented for backgammon. Please explain how TD back-propagation works.

I have already referred to this question - Neural Network and Temporal Difference Learning - but I am not able to understand the accepted answer. Please explain with a different approach if possible.

TD learning is not used in neural networks. Instead, neural networks are used in TD learning to store the value (or q-value) function.

I think that you are confusing backpropagation (a neural-network concept) with bootstrapping in RL. Bootstrapping combines recently observed information with previous estimates to produce new estimates.
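For concreteness, the standard one-step TD(0) bootstrap update (a general RL formula, not anything specific to the linked handbook) is

$$ V(s_t) \leftarrow V(s_t) + \alpha \big[\, r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \,\big] $$

where the bracketed term is the TD error: the new estimate mixes the just-observed reward $r_{t+1}$ with the previous estimate $V(s_{t+1})$.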

When the state space is large and it is not practical to store the value function in a table, a neural network is used as a function approximator for the value function, as sketched below.
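Here is a minimal sketch (my own toy example, not the handbook's implementation) of how the two fit together: the TD error supplies the training target, and back-propagation supplies the gradients used to nudge the network that stores V(s). The environment (a 5-state random walk) and all hyper-parameter names are illustrative assumptions.

```python
import numpy as np

# Sketch: a tiny one-hidden-layer network approximating V(s) for a
# 5-state random walk, trained with TD(0). All names are illustrative.
rng = np.random.default_rng(0)
n_states, n_hidden = 5, 8
alpha, gamma = 0.05, 1.0

W1 = rng.normal(scale=0.1, size=(n_hidden, n_states))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0

def features(s):
    x = np.zeros(n_states)
    x[s] = 1.0                       # one-hot encoding of the state
    return x

def value_and_grads(x):
    h = np.tanh(W1 @ x + b1)         # hidden activations
    v = W2 @ h + b2                  # scalar value estimate V(s)
    # Gradients of v w.r.t. each parameter (back-propagation).
    dW2, db2 = h, 1.0
    dh = W2 * (1.0 - h ** 2)         # back-prop through tanh
    dW1 = np.outer(dh, x)
    db1 = dh
    return v, (dW1, db1, dW2, db2)

for episode in range(2000):
    s = n_states // 2                # start in the middle state
    while True:
        s_next = s + rng.choice([-1, 1])
        # Episode ends at either boundary; reward 1 only at the right end.
        done = s_next < 0 or s_next >= n_states
        r = 1.0 if s_next >= n_states else 0.0

        x = features(s)
        v, (dW1, db1, dW2, db2) = value_and_grads(x)
        v_next = 0.0 if done else value_and_grads(features(s_next))[0]

        delta = r + gamma * v_next - v   # TD error (bootstrapped target minus V)
        # Gradient step: move V(s) toward the bootstrapped target.
        W1 += alpha * delta * dW1
        b1 += alpha * delta * db1
        W2 += alpha * delta * dW2
        b2 += alpha * delta * db2

        if done:
            break
        s = s_next

print([round(value_and_grads(features(s))[0], 2) for s in range(n_states)])
```

TD-Gammon follows the same pattern, just with a larger network, a board-feature encoding, and eligibility traces applied to the weights.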

The discussion of forward and backward views is about eligibility traces. In the forward view, RL bootstraps several steps ahead in time; this is not practical to compute directly, so the backward view instead uses eligibility traces to leave a trail and update past states. A small sketch follows.
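As an illustration of the backward view (again a toy example; I use linear function approximation instead of a network just to keep it short, and names like lam and trace are my own), each weight keeps an eligibility trace that decays by gamma*lambda per step, and every update distributes the current TD error along that trail:

```python
import numpy as np

# Sketch of the TD(lambda) backward view with a linear value function
# V(s) = w @ x(s) on the same 5-state random walk; names are illustrative.
rng = np.random.default_rng(0)
n_states = 5
alpha, gamma, lam = 0.05, 1.0, 0.8

w = np.zeros(n_states)

def features(s):
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

for episode in range(2000):
    s = n_states // 2
    trace = np.zeros(n_states)           # eligibility trace, one entry per weight
    while True:
        s_next = s + rng.choice([-1, 1])
        done = s_next < 0 or s_next >= n_states
        r = 1.0 if s_next >= n_states else 0.0

        x = features(s)
        v = w @ x
        v_next = 0.0 if done else w @ features(s_next)
        delta = r + gamma * v_next - v    # one-step TD error

        trace = gamma * lam * trace + x   # decay old trail, mark the current state
        w += alpha * delta * trace        # recently visited states share the credit

        if done:
            break
        s = s_next

print(np.round(w, 2))
```

With a neural network instead of a linear model, the trace is kept per network weight and accumulates the back-propagated gradient, which is exactly the combination used in TD-Gammon.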

This should not be confused with backpropagation in neural networks; the two are unrelated mechanisms that merely get combined in systems like TD-Gammon.
