
Temporal Difference Learning and Back-propagation

I have read this page from Stanford - https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html - but I am not able to understand how TD learning is used in neural networks. I am trying to make a checkers AI that will use TD learning, similar to what they implemented for backgammon. Please explain how TD back-propagation works.

I have already referred to this question - Neural Network and Temporal Difference Learning - but I am not able to understand the accepted answer. Please explain with a different approach if possible.

TD learning is not used in neural networks. Instead, neural networks are used in TD learning to store the value (or q-value) function.

I think that you are confusing backpropagation (a neural-network concept) with bootstrapping in RL. Bootstrapping combines recently observed information with previous estimates to produce new estimates.
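For concreteness, the standard one-step TD(0) bootstrap update (a general RL formula, not anything specific to the linked handbook) is

$$ V(s_t) \leftarrow V(s_t) + \alpha \big[\, r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \,\big] $$

where the bracketed term is the TD error: the new estimate mixes the just-observed reward $r_{t+1}$ with the previous estimate $V(s_{t+1})$.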

When the state space is large and it is not practical to store the value function in a table, a neural network is used as a function approximator for the value function, as sketched below.
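Here is a minimal sketch (my own toy example, not the handbook's implementation) of how the two fit together: the TD error supplies the training target, and back-propagation supplies the gradients used to nudge the network that stores V(s). The environment (a 5-state random walk) and all hyper-parameter names are illustrative assumptions.

```python
import numpy as np

# Sketch: a tiny one-hidden-layer network approximating V(s) for a
# 5-state random walk, trained with TD(0). All names are illustrative.
rng = np.random.default_rng(0)
n_states, n_hidden = 5, 8
alpha, gamma = 0.05, 1.0

W1 = rng.normal(scale=0.1, size=(n_hidden, n_states))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0

def features(s):
    x = np.zeros(n_states)
    x[s] = 1.0                       # one-hot encoding of the state
    return x

def value_and_grads(x):
    h = np.tanh(W1 @ x + b1)         # hidden activations
    v = W2 @ h + b2                  # scalar value estimate V(s)
    # Gradients of v w.r.t. each parameter (back-propagation).
    dW2, db2 = h, 1.0
    dh = W2 * (1.0 - h ** 2)         # back-prop through tanh
    dW1 = np.outer(dh, x)
    db1 = dh
    return v, (dW1, db1, dW2, db2)

for episode in range(2000):
    s = n_states // 2                # start in the middle state
    while True:
        s_next = s + rng.choice([-1, 1])
        # Episode ends at either boundary; reward 1 only at the right end.
        done = s_next < 0 or s_next >= n_states
        r = 1.0 if s_next >= n_states else 0.0

        x = features(s)
        v, (dW1, db1, dW2, db2) = value_and_grads(x)
        v_next = 0.0 if done else value_and_grads(features(s_next))[0]

        delta = r + gamma * v_next - v   # TD error (bootstrapped target minus V)
        # Gradient step: move V(s) toward the bootstrapped target.
        W1 += alpha * delta * dW1
        b1 += alpha * delta * db1
        W2 += alpha * delta * dW2
        b2 += alpha * delta * db2

        if done:
            break
        s = s_next

print([round(value_and_grads(features(s))[0], 2) for s in range(n_states)])
```

TD-Gammon follows the same pattern, just with a larger network, a board-feature encoding, and eligibility traces applied to the weights.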

The discussion of forward and backward views is about eligibility traces. In the forward view, RL bootstraps several steps ahead in time; this is not practical to compute directly, so the backward view instead uses eligibility traces to leave a trail and update past states. A small sketch follows.
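As an illustration of the backward view (again a toy example; I use linear function approximation instead of a network just to keep it short, and names like lam and trace are my own), each weight keeps an eligibility trace that decays by gamma*lambda per step, and every update distributes the current TD error along that trail:

```python
import numpy as np

# Sketch of the TD(lambda) backward view with a linear value function
# V(s) = w @ x(s) on the same 5-state random walk; names are illustrative.
rng = np.random.default_rng(0)
n_states = 5
alpha, gamma, lam = 0.05, 1.0, 0.8

w = np.zeros(n_states)

def features(s):
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

for episode in range(2000):
    s = n_states // 2
    trace = np.zeros(n_states)           # eligibility trace, one entry per weight
    while True:
        s_next = s + rng.choice([-1, 1])
        done = s_next < 0 or s_next >= n_states
        r = 1.0 if s_next >= n_states else 0.0

        x = features(s)
        v = w @ x
        v_next = 0.0 if done else w @ features(s_next)
        delta = r + gamma * v_next - v    # one-step TD error

        trace = gamma * lam * trace + x   # decay old trail, mark the current state
        w += alpha * delta * trace        # recently visited states share the credit

        if done:
            break
        s = s_next

print(np.round(w, 2))
```

With a neural network instead of a linear model, the trace is kept per network weight and accumulates the back-propagated gradient, which is exactly the combination used in TD-Gammon.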

This should not be confused with backpropagation in neural networks; the two are unrelated mechanisms that merely get combined in systems like TD-Gammon.
