
Variable updating wrong in loop - Python (Q-learning)

Why do position and newposition print the same values and update together on the next loop iteration?

for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position=np.array([0,19])

    status = -1
    # loop over steps taken by the player
    while status == -1: #the status of the game is -1, terminate if 1 (see status_list above)

        # Find out what move to make using  
        q_in=Q[position[0],position[1]]

        
        move, action = action_fcn(q_in,epsilon,wind)
        
        # update location, check grid,reward_list, and status_list 
        
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        
        print('new loop')
        print(newposition)
        print(position)
        
        
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        
        status = status_list[grid_state]
        status = int(status)
        
        if status == 1:
            Q[position[0],position[1],action]= reward
            break #Game over 
            
        else: Q[position[0],position[1],action]= (1-alpha)*Q[position[0],position[1],action]+alpha*(reward+gamma*Q[newposition[0],newposition[1],action])
           
        position = newposition

Printed output:

new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]

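Apparently, somewhere you do not show us, you do something like

>>> newposition = position

and the last line of your loop, position = newposition, has the same effect: both names refer to one array.

So when you assign to newposition[0] and newposition[1], you are actually modifying position as well.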

So make newposition a different object from position, i.e. make sure that id(newposition) != id(position), and you will be fine. I suspect that these two ids are currently the same, aren't they?
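To make the aliasing concrete, here is a minimal sketch that reproduces the symptom outside the Q-learning loop. The starting value [0, 19] comes from the question; everything else is a stripped-down illustration, not the asker's actual code.

```python
import numpy as np

position = np.array([0, 19])
newposition = position              # binds a second name to the same array; nothing is copied

newposition[0] = position[0] + 1    # in-place write, visible through both names

same_object = newposition is position
same_id = id(newposition) == id(position)
print(same_object, same_id)                       # True True
print(position.tolist(), newposition.tolist())    # [1, 19] [1, 19]
```

Both checks come out True, which is why the two print calls in the question always show identical values.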

Why do position and newposition print the same values and update together on the next loop iteration?

Because they are the same object. I am not (only) saying that they are equal; I am saying that newposition is position, i.e. the expression newposition is position currently evaluates to True.

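Define newposition independently of position, and do not rebind position to it at the end of the loop (use position = newposition.copy() there instead of position = newposition); otherwise the two names refer to the same array again after the first step. For example:

# [...]
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position    = np.array([0,19])
    newposition = np.empty((2,), dtype=int)  # integer dtype, because the values are used as indices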
    # [...]

You may have good reasons for updating element by element, but keep in mind that if move and position have the same shape and represent the same kind of quantity, you could also just write

# [...]
    # [...]
        # [...]
        # newposition[0] = position[0] + move[0]
        # newposition[1] = position[1] + move[1]
        newposition = position + move
        # [...]

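and remove the newposition = np.empty(...) line. With this form, each iteration rebinds newposition to a freshly created array, so the position = newposition at the end of the loop no longer causes the problem.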
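The difference between the two update styles can be sketched as follows. The moves list is a hypothetical stand-in for what action_fcn returns in the question, and the grid, reward, and Q-table logic is left out on purpose.

```python
import numpy as np

position = np.array([0, 19])
# Hypothetical moves, standing in for the output of action_fcn
moves = [np.array([1, 0]), np.array([0, -1])]

history = []
for move in moves:
    newposition = position + move    # allocates a new array; position is left untouched
    distinct = newposition is not position
    history.append((position.tolist(), newposition.tolist(), distinct))
    position = newposition           # fine here: the next iteration rebinds newposition

print(history)
```

In every iteration the old and new positions differ, because the addition never writes into the array that position refers to.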

That is because you are trying to copy one list to another with the = operator. Used with lists (and with NumPy arrays, as here), = does not copy anything: it makes the left-hand name refer to the same object as the right-hand name, so both point to the same memory.

To actually copy a list, use the list.copy() method (it makes a shallow copy); NumPy arrays have a .copy() method as well.
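A short sketch of assignment versus copying, for both a plain list and a NumPy array. The variable names alias, clone, and snapshot are made up for the illustration.

```python
import numpy as np

# Plain Python list
a = [0, 19]
alias = a            # same list object under a second name
clone = a.copy()     # new list with the same elements (shallow copy)
a[0] = 5
print(alias, clone)  # [5, 19] [0, 19]

# NumPy array, as used in the question
position = np.array([0, 19])
snapshot = position.copy()   # independent array with its own data
position[0] += 1
print(position.tolist(), snapshot.tolist())  # [1, 19] [0, 19]
```

Applied to the question's loop, the equivalent change would be position = newposition.copy() on the last line.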
