
Gridworld from Sutton's RL book: how to calculate value function for corner cells?

Referring to the RL book by Sutton and Barto, 2nd ed., Ch-3, pg-60.

Here is the 5x5 grid world and the value of each state:

[image: gridworld with state values]

Using the Bellman Backup equation, the value of each state can be calculated:
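For reference, this is the Bellman expectation equation in the book's notation, followed by the form it takes for this gridworld. The second line assumes deterministic moves and the equiprobable random policy, so s'_a and r_a (my shorthand, not the book's) denote the single next state and reward that follow action a in state s:

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr]

v_\pi(s) = \frac{1}{4} \sum_{a \in \{\text{up},\,\text{down},\,\text{left},\,\text{right}\}} \bigl[ r_a + \gamma\, v_\pi(s'_a) \bigr]
```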

Here is the calculation for the middle (3,3) cell:

[image: calculation of state value]

Using the values of the cells above, below, left and right, along with the equiprobable random policy pi(a|s) = 1/4 and deterministic transitions (every p(s',r|s,a) = 1), the calculation holds.
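As a quick check of the middle-cell arithmetic, here is a minimal Python sketch. The neighbor values (2.3 above, -0.4 below, 0.7 left, 0.4 right) are read off the book's figure and are an assumption here, since the image is not reproduced in the text; the variable names are illustrative only.

```python
GAMMA = 0.9   # discount factor used in the book's example
PI = 0.25     # equiprobable random policy: pi(a|s) = 1/4

# Assumed neighbor values of the middle (3,3) cell, taken from the book's figure.
neighbors = {"up": 2.3, "down": -0.4, "left": 0.7, "right": 0.4}

# Every move from the middle cell stays on the grid, so the reward is 0.
v_center = sum(PI * (0.0 + GAMMA * v_next) for v_next in neighbors.values())
print(round(v_center, 3))
```

The result is about 0.675, which matches the 0.7 shown in the grid after rounding to one decimal.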

But what about the corner cells?

Take the 3.3 at the top left, for example. How is that calculated?

Using only the lower (1.5) and right (8.8) values doesn't work. It must also be taken into account that when the agent takes the up or left action, it stays in the same cell but receives a reward of -1.

Can you please help me calculate the corner cell values? Reading the GitHub implementations isn't helping either.

The value of the upper left corner would be (0.9*(8.8+1.5) + (-1+0.9*3.3)*2) / 4 = 3.3025, which is approximately 3.3.

1: 0.9*(8.8+1.5), because gamma = 0.9, r = 0 when the agent stays on the grid and is not leaving special state A or B, and v(s') is 8.8 for a right move and 1.5 for a down move.

2: (-1+0.9*3.3)*2, because r = -1 if the agent steps off the grid (a left or up move), 0.9 is gamma, and 3.3 appears because v(s') = v(s): the agent remains in its previous state when it steps off the grid. Times 2 because there are two ways (left or up) for the agent to step off the grid.

3: Divide the sum of parts 1 and 2 by 4 (that is, multiply by 1/4), because pi(a|s) = 1/4 for every action.
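The three parts above can also be checked numerically. The following is a minimal sketch of iterative policy evaluation for this gridworld, not the book's or any particular repository's code. It assumes the setup of the book's example: state A at the second cell of the top row (reward +10, jump to the bottom of that column), state B at the fourth cell of the top row (reward +5, jump to the middle row of that column), and gamma = 0.9. The names `step` and `evaluate_random_policy` are illustrative.

```python
GAMMA = 0.9
N = 5
# Assumed special states: any action in A gives +10 and moves to A',
# any action in B gives +5 and moves to B'. Cells are (row, col), 0-indexed.
SPECIAL = {(0, 1): ((4, 1), 10.0), (0, 3): ((2, 3), 5.0)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Return (next_state, reward) for a deterministic move."""
    if state in SPECIAL:
        return SPECIAL[state]
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c), 0.0
    return state, -1.0  # stepping off the grid: stay put, reward -1


def evaluate_random_policy(tol=1e-10):
    """Sweep the Bellman backup over all cells until the values stop changing."""
    v = [[0.0] * N for _ in range(N)]
    while True:
        new_v = [[0.0] * N for _ in range(N)]
        delta = 0.0
        for r in range(N):
            for c in range(N):
                for a in ACTIONS:
                    (nr, nc), reward = step((r, c), a)
                    new_v[r][c] += 0.25 * (reward + GAMMA * v[nr][nc])
                delta = max(delta, abs(new_v[r][c] - v[r][c]))
        v = new_v
        if delta < tol:
            return v


v = evaluate_random_policy()
for row in v:
    print(" ".join(f"{x:5.1f}" for x in row))

# One-step backup for the top-left corner, mirroring parts 1 to 3 above.
corner = 0.25 * (GAMMA * (v[0][1] + v[1][0]) + 2 * (-1 + GAMMA * v[0][0]))
print(round(corner, 4), round(v[0][0], 4))
```

The printed grid should show roughly 3.3 in the top-left cell, and `corner` should agree with `v[0][0]`, because at convergence each value equals its own one-step backup. The small gap between 3.3025 and the converged value comes from using neighbor values rounded to one decimal in the hand calculation.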
