Variables updated incorrectly in a loop - Python (Q-learning)

Question

Why do position and newposition give the same output and get updated together in the next loop iteration?

for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position=np.array([0,19])

    status = -1
    # loop over steps taken by the player
    while status == -1: #the status of the game is -1, terminate if 1 (see status_list above)

        # Find out what move to make using  
        q_in=Q[position[0],position[1]]

        
        move, action = action_fcn(q_in,epsilon,wind)
        
        # update location, check grid,reward_list, and status_list 
        
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        
        print('new loop')
        print(newposition)
        print(position)
        
        
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        
        status = status_list[grid_state]
        status = int(status)
        
        if status == 1:
            Q[position[0],position[1],action]= reward
            break #Game over 
            
        else: Q[position[0],position[1],action]= (1-alpha)*Q[position[0],position[1],action]+alpha*(reward+gamma*Q[newposition[0],newposition[1],action])
           
        position = newposition

This prints:

new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]

Answer 1

Apparently, somewhere in the code you did not show us, you do

>>> newposition = position

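So whenever you modify newposition, you are in fact modifying position as well.

So just make newposition a different object from position. I mean, make sure that id(newposition) != id(position) and you'll be fine. Because at the moment, I guess the two ids are the same, aren't they?

"Why do position and newposition give the same output and get updated together in the next loop iteration?"

Because they are the same object. I'm not (only) saying that they are equal; I'm saying that newposition is position, i.e. you currently have (newposition is position) is True.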
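
A minimal sketch of the aliasing described above, assuming the hidden line really is newposition = position (the question does not show it, so this is a guess carried over from the answer). The start values come from the question; the name same_object is made up for the illustration.

```python
import numpy as np

position = np.array([0, 19])
newposition = position            # assumed hidden line: a second name, not a copy

# Writing through one name changes the single shared array.
newposition[0] = position[0] + 1

same_object = newposition is position
print(same_object)                # True
print(id(newposition) == id(position))  # True
print(position.tolist())          # [1, 19]
```

Checking `newposition is position` right before the print calls in the loop is a quick way to confirm whether this is what happens in the real code.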

Just define newposition independently of position. For example:

# [...]
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position    = np.array([0,19])
    newposition = np.empty((2,))
    # [...]

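Also, you may have good reasons for doing it this way, but keep in mind that if move and position have the same shape and carry the "same kind of information", you can also do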

# [...]
    # [...]
        # [...]
        # newposition[0] = position[0] + move[0]
        # newposition[1] = position[1] + move[1]
        newposition = position + move
        # [...]

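and remove the newposition = np.empty((2,)) line. Because position + move builds a new array on every iteration, the position = newposition at the end of the loop no longer leaves the two names tied to one array during the next step.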
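
As a rough comparison of the two variants, the sketch below records what position and newposition hold at the point where the question prints them. The moves list and the function names run_preallocated and run_fresh_array are hypothetical, introduced only for this illustration. It uses np.empty_like(position) instead of np.empty((2,)) so the preallocated array keeps an integer dtype and could still be used as an index. It also suggests that preallocating alone may not be enough: once position = newposition runs at the end of the first step, the two names share one array again.

```python
import numpy as np

# Hypothetical sequence of moves, standing in for action_fcn's output.
moves = [np.array([1, 0]), np.array([0, 1]), np.array([1, 0])]

def run_preallocated(moves):
    """Preallocate newposition once and write into it element by element."""
    position = np.array([0, 19])
    newposition = np.empty_like(position)   # separate integer array at first
    seen = []
    for move in moves:
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        seen.append((position.tolist(), newposition.tolist()))
        position = newposition              # rebinds: both names share one array now
    return seen

def run_fresh_array(moves):
    """Build a new array for newposition on every step."""
    position = np.array([0, 19])
    seen = []
    for move in moves:
        newposition = position + move       # new array, position is left untouched
        seen.append((position.tolist(), newposition.tolist()))
        position = newposition              # harmless: newposition is rebuilt next step
    return seen

print(run_preallocated(moves))
# [([0, 19], [1, 19]), ([1, 20], [1, 20]), ([2, 20], [2, 20])]
print(run_fresh_array(moves))
# [([0, 19], [1, 19]), ([1, 19], [1, 20]), ([1, 20], [2, 20])]
```

In the first variant the old position is lost from the second step onward, which matches the identical pairs in the question's output. If the preallocated array is kept, ending the loop body with position = newposition.copy() would avoid the re-aliasing.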

Answer 2

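That's because you tried to copy one list into another with the = operator. With lists (and with NumPy arrays, as in your code), = does not copy anything: it makes the name on the left refer to the same object as the name on the right, so both point to the same data in memory.

To actually copy a list, use the list.copy() method. NumPy arrays have a copy() method as well.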
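
A small sketch of the difference between assignment and copy(), using a plain list so it runs without NumPy. The names alias and clone are illustrative only.

```python
position = [0, 19]

alias = position          # same list object under a second name
clone = position.copy()   # new list holding the same elements

alias[0] = 5              # also visible through position
clone[1] = 7              # does not touch position

print(position)           # [5, 19]
print(clone)              # [0, 7]
print(alias is position)  # True
print(clone is position)  # False
```

Note that list.copy() is a shallow copy, which is sufficient here because the elements are plain integers.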

2 answers

Answer 1: 1, 2021-05-18 18:45:32

Answer 2: 0, accepted, 2021-05-18 18:49:03
