
Data structure for Markov Decision Process

I have implemented the value iteration algorithm for a simple Markov decision process (see Wikipedia) in Python. To store the structure (states, actions, transitions, rewards) of a particular Markov process and iterate over it, I have used the following data structures:

  1. dictionary for states and actions that are available for those states:

    SA = {'state A': {'action 1', 'action 2', ...}, ...}

  2. dictionary for transition probabilities:

    T = {('state A', 'action 1'): {'state B': probability}, ...}

  3. dictionary for rewards:

    R = {('state A', 'action 1'): {'state B': reward}, ...}
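To make the question concrete, here is a minimal sketch of value iteration running directly on the three dictionaries above. The two-state example MDP, the discount factor `gamma`, the tolerance `tol`, and the helper name `q_value` are assumptions made for illustration; they are not part of the original implementation.

```python
def q_value(s, a, V, T, R, gamma):
    """Expected return of taking action a in state s, given value estimates V."""
    return sum(p * (R[(s, a)][s2] + gamma * V[s2])
               for s2, p in T[(s, a)].items())


def value_iteration(SA, T, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Return (V, policy) for an MDP stored in the SA, T and R dictionaries."""
    V = {s: 0.0 for s in SA}
    for _ in range(max_iter):
        delta = 0.0
        for s, actions in SA.items():
            if not actions:  # state without actions: its value stays 0
                continue
            best = max(q_value(s, a, V, T, R, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimates.
    policy = {s: max(actions, key=lambda a: q_value(s, a, V, T, R, gamma))
              for s, actions in SA.items() if actions}
    return V, policy


# Hypothetical two-state MDP, used only to exercise the code.
SA = {'A': {'stay', 'go'}, 'B': {'stay'}}
T = {('A', 'stay'): {'A': 1.0},
     ('A', 'go'): {'B': 0.8, 'A': 0.2},
     ('B', 'stay'): {'B': 1.0}}
R = {('A', 'stay'): {'A': 0.0},
     ('A', 'go'): {'B': 1.0, 'A': 0.0},
     ('B', 'stay'): {'B': 0.0}}

V, policy = value_iteration(SA, T, R)
```

Because `T` and `R` share the same keys, every inner loop is a plain iteration over `T[(s, a)].items()`, which is the access pattern value iteration needs.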

My question is: is this the right approach? What are the most suitable data structures (in Python) for MDP?

I implemented Markov Decision Processes in Python before and found the following code useful.

http://aima.cs.berkeley.edu/python/mdp.html

This code is taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.

Whether a data structure is suitable or not mostly depends on what you do with the data. You mention that you want to iterate over the process, so optimize your data structure for this purpose.

Transitions in Markov processes are often modeled by matrix multiplications. The transition probabilities P_a(s1, s2) and the rewards R_a(s1, s2) could be described by (potentially sparse) matrices P_a and R_a, one pair per action a, indexed by the states. I think this would have a few advantages:

  • If you use numpy arrays for this, indexing will probably be faster than with the dictionaries.
  • Also state transitions could then be simply described by matrix multiplication.
  • Process simulation with, for example, roulette wheel selection will be faster and easier to implement, since you simply need to pick the row (or column, depending on the convention) of the transition matrix that corresponds to the current state.
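The points above can be sketched with NumPy as follows. The three-state transition matrix is a made-up example for a single fixed action, and the code assumes the row-stochastic convention `P[s, s2] = Pr(s2 | s)`, so the distribution over next states is a row of the matrix. The function name `sample_next_state` is hypothetical.

```python
import numpy as np

# Hypothetical transition matrix for one fixed action:
# P[s, s2] is the probability of moving from state s to state s2.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
assert np.allclose(P.sum(axis=1), 1.0)  # every row is a distribution

# Propagating a distribution over states is a vector-matrix product.
mu0 = np.array([1.0, 0.0, 0.0])  # start in state 0 with certainty
mu1 = mu0 @ P                    # distribution after one step
mu2 = mu1 @ P                    # distribution after two steps


def sample_next_state(P, s, rng):
    """Roulette wheel selection: draw s2 with probability P[s, s2]."""
    return int(rng.choice(P.shape[1], p=P[s]))


# Simulate a short trajectory starting in state 0.
rng = np.random.default_rng(0)
state = 0
trajectory = [state]
for _ in range(10):
    state = sample_next_state(P, state, rng)
    trajectory.append(state)
```

With several actions, one such matrix per action can be stacked into a single three-dimensional array.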

There is a Python implementation of MDPs called pymdptoolbox. It is based on a MATLAB implementation called MDPToolbox. Both are worth a look.

Basically, the transition probabilities are represented as an (A × S × S) array and the rewards as an (S × A) matrix, where S and A are the number of states and the number of actions. The package also has special handling for sparse matrices.
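As a rough illustration of this array layout, here is value iteration written in plain NumPy. It does not call pymdptoolbox. It only assumes the shapes described above, with `P[a, s, s2] = Pr(s2 | s, a)` and `R[s, a]` the expected immediate reward. The example MDP, `gamma`, and `tol` are assumptions chosen for the sketch.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for P of shape (A, S, S) and R of shape (S, A)."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # (P @ V) has shape (A, S); transpose to (S, A) to match R.
        Q = R + gamma * (P @ V).T
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimates.
    Q = R + gamma * (P @ V).T
    return V, Q.argmax(axis=1)


# Hypothetical MDP: states 0 and 1, actions 0 ("stay") and 1 ("go").
P = np.array([
    [[1.0, 0.0],   # stay in state 0
     [0.0, 1.0]],  # stay in state 1
    [[0.2, 0.8],   # go from state 0: reaches state 1 with probability 0.8
     [0.0, 1.0]],  # go from state 1: nothing changes
])
R = np.array([
    [0.0, 0.8],    # expected reward in state 0 for stay / go
    [0.0, 0.0],    # state 1 yields no reward
])

V, policy = value_iteration(P, R)
```

Note that in this layout every action must be defined for every state, so a state with fewer available actions needs placeholder rows, unlike the dictionary version.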
