
Gradient Temporal Difference Lambda without Function Approximation

Every formalization of GTD(λ) seems to define it in terms of function approximation, using a parameter vector θ and a secondary weight vector w.

I understand that gradient TD methods were largely motivated by their convergence guarantees with linear function approximation, but I would like to use GTD for its importance-sampling corrections in the off-policy setting.

Is it possible to take advantage of GTD without function approximation? If so, how are the update equations formalized?

I take "without function approximation" to mean representing the value function V as a table. In that case, the tabular representation of V can itself be seen as a special case of linear function approximation.

For example, if we define the approximated value function as:

$\hat{V}(s) = \theta^\top \phi(s)$

Then, using a tabular representation, there are as many features as states: the feature vector φ(s) for a given state s is zero in every component except the one corresponding to s, where it equals one, and the parameter vector θ stores the value of each state. Therefore GTD, like other linear algorithms, can be used without any modification in the tabular case.
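To make this concrete, here is a minimal sketch of a single TDC (also called GTD(0)) update with importance-sampling ratio ρ, written for general linear features. With one-hot features as described above, each step only touches the table entries for s and s', so it reduces to a tabular update. The two-state example at the bottom, and all step sizes and names, are illustrative choices, not part of the original answer.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, r, gamma, rho, alpha, beta):
    """One TDC (GTD(0)) step for linear value estimation.

    theta: value parameters (one entry per state if phi is one-hot)
    w:     auxiliary weight vector of the same shape
    rho:   importance-sampling ratio pi(a|s) / b(a|s)
    """
    delta = r + gamma * (theta @ phi_next) - theta @ phi  # TD error
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    w = w + beta * rho * (delta - w @ phi) * phi
    return theta, w

def one_hot(s, n):
    e = np.zeros(n)
    e[s] = 1.0
    return e

# Tiny example: state 0 transitions to state 1 with reward 1, gamma = 0,
# on-policy (rho = 1). With one-hot features this is a pure table update.
n_states = 2
theta = np.zeros(n_states)  # one value estimate per state
w = np.zeros(n_states)      # one auxiliary weight per state
for _ in range(200):
    theta, w = tdc_update(theta, w, one_hot(0, n_states), one_hot(1, n_states),
                          r=1.0, gamma=0.0, rho=1.0, alpha=0.1, beta=0.05)
print(theta[0])  # converges toward 1.0, the true value of state 0
```

Note that even in the tabular case the auxiliary vector w is still needed: it is part of the GTD update itself, not an artifact of function approximation.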
