
Convergence of value iteration

Why is the termination condition of the value-iteration algorithm (for example, http://aima-java.googlecode.com/svn/trunk/aima-core/src/main/java/aima/core/probability/mdp/search/ValueIteration.java) for an MDP (Markov Decision Process)

||U_{i+1} - U_i|| < error * (1 - gamma) / gamma

where

U_i is the vector of utilities at iteration i,
U_{i+1} is the updated vector of utilities,
error is the error bound used in the algorithm,
gamma is the discount factor used in the algorithm?

Where does error * (1 - gamma) / gamma come from? Is the division by gamma because every step is discounted by gamma? But why the factor (1 - gamma)? And how large should error be?
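
For concreteness, here is a minimal sketch of value iteration with exactly this stopping rule. The toy 3-state MDP, the array names (trans, reward), and the constants are illustrative assumptions for this sketch, not the aima-java API:

    import java.util.Arrays;

    public class ValueIterationSketch {
        // Toy 3-state, 2-action MDP (illustrative values, not from aima-java).
        // trans[a][s][sp] = P(sp | s, a); each row sums to 1.
        static final double[][][] trans = {
            { {0.9, 0.1, 0.0}, {0.0, 0.9, 0.1}, {0.0, 0.0, 1.0} }, // action 0
            { {0.5, 0.5, 0.0}, {0.5, 0.0, 0.5}, {0.0, 0.0, 1.0} }  // action 1
        };
        static final double[] reward = { -0.04, -0.04, 1.0 }; // R(s)

        public static void main(String[] args) {
            double gamma = 0.9;  // discount factor
            double error = 1e-4; // desired bound on the final utility error
            int n = reward.length;

            double[] U = new double[n];
            double delta; // ||U_{i+1} - U_i|| in the max norm
            do {
                double[] Unew = new double[n];
                delta = 0.0;
                for (int s = 0; s < n; s++) {
                    // Bellman update: U'(s) = R(s) + gamma * max_a sum_sp P(sp|s,a) * U(sp)
                    double best = Double.NEGATIVE_INFINITY;
                    for (double[][] ta : trans) {
                        double q = 0.0;
                        for (int sp = 0; sp < n; sp++) q += ta[s][sp] * U[sp];
                        best = Math.max(best, q);
                    }
                    Unew[s] = reward[s] + gamma * best;
                    delta = Math.max(delta, Math.abs(Unew[s] - U[s]));
                }
                U = Unew;
            } while (delta >= error * (1 - gamma) / gamma); // the condition in question

            System.out.println(Arrays.toString(U));
        }
    }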

That quantity is called the Bellman error, or Bellman residual.

See Williams and Baird, 1993, for its use in MDPs.

See Littman, 1994, for its use in POMDPs.
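
To fill in the derivation the question asks about: the Bellman update B is a contraction by a factor of gamma in the max norm, and the true utility vector U* is its fixed point (B U* = U*). This is the standard argument given in Russell and Norvig's AIMA, which the aima-java code implements; in LaTeX notation, with \epsilon standing for the algorithm's error parameter:

    \|U_{i+1} - U^*\| = \|B U_i - B U^*\|
                      \le \gamma \|U_i - U^*\|                                        % contraction
                      \le \gamma \left( \|U_i - U_{i+1}\| + \|U_{i+1} - U^*\| \right) % triangle inequality

    (1 - \gamma)\,\|U_{i+1} - U^*\| \le \gamma\,\|U_{i+1} - U_i\|
                                    < \gamma \cdot \frac{\epsilon (1 - \gamma)}{\gamma}
                                    = \epsilon (1 - \gamma)
    \quad\Longrightarrow\quad \|U_{i+1} - U^*\| < \epsilon

So the division by gamma and the factor (1 - gamma) are exactly what make the final inequality come out to epsilon: terminating once the Bellman residual drops below error * (1 - gamma) / gamma guarantees that the returned utilities are within error of the true ones. As for how large error should be, it is a free choice of how accurate you want the utilities to be; smaller values simply cost more iterations, since the residual shrinks by roughly a factor of gamma per sweep.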
