
Variable updating wrong in loop - Python (Q-learning)

Why do position and newposition give the same output and update together in the next loop?

for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position=np.array([0,19])

    status = -1
    # loop over steps taken by the player
    while status == -1: #the status of the game is -1, terminate if 1 (see status_list above)

        # Find out what move to make using  
        q_in=Q[position[0],position[1]]

        
        move, action = action_fcn(q_in,epsilon,wind)
        
        # update location, check grid,reward_list, and status_list 
        
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        
        print('new loop')
        print(newposition)
        print(position)
        
        
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        
        status = status_list[grid_state]
        status = int(status)
        
        if status == 1:
            Q[position[0],position[1],action]= reward
            break #Game over 
            
        else: Q[position[0],position[1],action]= (1-alpha)*Q[position[0],position[1],action]+alpha*(reward+gamma*Q[newposition[0],newposition[1],action])
           
        position = newposition

Printed output:

new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]

Apparently, somewhere in code you have not shown us, you do

>>> newposition = position

So when you write into newposition, you are actually writing into position as well.

So just make newposition a different object from position. That is, make sure id(newposition) != id(position) and you will be fine. Right now, I guess these two ids are the same, aren't they?

"Why do position and newposition give the same output and update together in the next loop?"

Because they are the same object. I am not (only) saying that they are equal; I am saying that newposition is position, i.e. (newposition is position) currently evaluates to True.

Just define newposition independently of position. For example:
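To make the aliasing concrete, here is a minimal sketch. It assumes the hidden line really is newposition = position, which the question does not show; the values are only illustrative.

```python
import numpy as np

position = np.array([0, 19])
newposition = position             # assumed hidden line: a second name for the same array

newposition[0] = position[0] + 1   # in-place write, visible through both names

same_object = newposition is position
print(same_object)                 # True
print(position.tolist())           # [1, 19]
```

The element assignment never creates a new array, so both names keep showing identical values, just like the printout in the question.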

# [...]
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position    = np.array([0,19])
    newposition = np.empty((2,), dtype=int)  # integer dtype so the values can still index grid and Q
    # [...]
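One caveat, sketched under the assumption that the loop still ends with position = newposition as in the question: that statement would bind both names to the preallocated buffer again after the first step. Copying the values instead keeps the two arrays separate. The moves list below is hypothetical and stands in for the output of action_fcn.

```python
import numpy as np

# Hypothetical moves standing in for what action_fcn would return.
moves = [np.array([1, 0]), np.array([0, -1])]

position = np.array([0, 19])
newposition = np.empty(2, dtype=int)   # separate integer buffer, reused every step

steps = []
for move in moves:
    newposition[0] = position[0] + move[0]
    newposition[1] = position[1] + move[1]
    steps.append((position.tolist(), newposition.tolist()))
    # Copy the values; a plain "position = newposition" would alias the buffer again.
    position = newposition.copy()

print(steps)   # [([0, 19], [1, 19]), ([1, 19], [1, 18])]
```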

Also, you may have good reasons for the element-wise form, but keep in mind that if move and position have the same shape and carry the same kind of information, you could simply write

# [...]
    # [...]
        # [...]
        # newposition[0] = position[0] + move[0]
        # newposition[1] = position[1] + move[1]
        newposition = position + move
        # [...]

and remove the np.empty preallocation of newposition.
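A short sketch of why this form avoids the problem: position + move allocates a new array on every step, so rebinding position at the end of the loop is harmless. The moves list is again a hypothetical stand-in for action_fcn.

```python
import numpy as np

# Hypothetical moves standing in for what action_fcn would return.
moves = [np.array([1, 0]), np.array([1, 0]), np.array([0, -1])]

position = np.array([0, 19])
history = []
for move in moves:
    # The addition builds a fresh array; position itself is not modified.
    newposition = position + move
    history.append((position.tolist(), newposition.tolist()))
    # Rebinding is safe: the next iteration creates another new array.
    position = newposition

print(history)
# [([0, 19], [1, 19]), ([1, 19], [2, 19]), ([2, 19], [2, 18])]
```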

That is because you are trying to copy one list to another with the = operator. Used with lists (and with NumPy arrays), = only assigns the reference held by the right-hand variable to the left-hand one, so both names refer to the same object in memory.

To make a real copy of a list, use the list.copy() method. NumPy arrays offer the equivalent ndarray.copy().
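A small illustration with plain Python lists; the variable names alias and snapshot are made up for this example:

```python
position = [0, 19]

alias = position             # same list object under a second name
snapshot = position.copy()   # new list holding the same elements

position[0] = 5              # in-place change to the original list

print(alias)      # [5, 19]
print(snapshot)   # [0, 19]
```

Note that list.copy() is a shallow copy, which is enough here because the elements are plain integers.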

Notice: technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please cite this site's URL or the original source.