
OpenAI Gym Atari games, TD Policy application

Can I apply a TD policy to such environments? Or only methods like DQN, and why?

I am trying to apply TD policy evaluation to Gym's Atari game simulations in Python, and I am a little new to it. I have this Value class:

class V_Class():
    """ Class to store the state-value function
        V(s) = expected future discounted reward from s onwards (the return Gt)
        Stores it as a dictionary and adds states as encountered (get method)
        Two methods: get and set
    """
    def __init__(self):
        self.f = {}

    def get(self, s):
        # initialise unseen states to 0 before returning their value
        if s not in self.f:
            self.f[s] = 0
        return self.f[s]

    def set(self, s, y):
        self.f[s] = y
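
For reference, with a hashable state such as an integer, the class works as expected (a quick illustration; the state 0 and the value 1.5 are arbitrary):

V = V_Class()
print(V.get(0))   # unseen state, initialised to 0
V.set(0, 1.5)     # store an updated value
print(V.get(0))   # 1.5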

Here is my implementation:

env = Environment.Environment("SpaceInvaders-v0")
V = V_Class()

iepisode = 0
while iepisode <= 1:
    obs = env.reset()
    done = False
    SUMREWARD = 0
    while not done:
        print("obs:", obs)
        action = env.action_space.sample()   # random policy
        new_obs, reward, done, info = env.step(action)
        SUMREWARD += reward
        # TD(0) update with learning rate 0.7 and discount 0.5
        new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
        V.set(obs, new_Vs)
        obs = new_obs
    iepisode += 1   # move on to the next episode

but I get this error:

TypeError: unhashable type: 'numpy.ndarray'

as you can see here:

>  TypeError             Traceback (most recent call
> last) <ipython-input-12-428939358367> in <module>
>      12         new_obs, reward, done, info = env.step(action)
>      13         SUMREWARD+=reward
> ---> 14         new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
>      15         V.set(obs,new_Vs)
>      16         obs = new_obs
> 
> <ipython-input-4-5d3d077cd162> in get(self, s)
>       9 
>      10     def get(self, s):
> ---> 11         if(s not in self.f):
>      12             self.f[s] = 0
>      13         return self.f[s]
> 
> TypeError: unhashable type: 'numpy.ndarray'

This happens because the V class was initially made for classical environments where the states (the obs variable) are single numbers, while Atari environments have large, 3-dimensional numpy.ndarray objects representing the states.

The V class should check in its dictionary f whether this state already has a stored value, and if not, store a value for it based on the formula:

new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
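
In other words, this line is the standard TD(0) update rule, here with learning rate α = 0.7 and discount factor γ = 0.5:

V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right]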

How would you suggest I fix this? Is there a process I don't know about that I should follow in such cases, or do I just have to update my V class methods to store big states as dictionary keys?

You are trying to search in a dictionary (f) with a numpy array (obs) as the key, as in this example:

import numpy as np

arr = np.array([1, 2, 3])
d = {}
if arr not in d:          # dict membership hashes the key, which raises here
    print("It's not in")
else:
    print("It's in")

which raises the same error:

TypeError: unhashable type: 'numpy.ndarray'

You have to use a hashable type as the key, not an array.
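
One minimal workaround, keeping your tabular design, is to normalise the state into an immutable, hashable form inside the class itself, for example with ndarray.tobytes(). This is only a sketch based on the V_Class above; the _key helper name is my own:

import numpy as np

class V_Class():
    """ State-value table that also accepts numpy arrays as states """
    def __init__(self):
        self.f = {}

    @staticmethod
    def _key(s):
        # numpy arrays are mutable and therefore unhashable;
        # serialise them to an immutable bytes object
        # (tuple(s.flatten()) would work as well)
        return s.tobytes() if isinstance(s, np.ndarray) else s

    def get(self, s):
        # same semantics as before: unseen states start at 0
        return self.f.setdefault(self._key(s), 0)

    def set(self, s, y):
        self.f[self._key(s)] = y

With this change the TD loop above runs unmodified, because both methods normalise the key internally. Keep in mind, though, that raw Atari frames almost never repeat exactly, so a tabular V will keep growing without generalising; that limitation is a large part of why function-approximation methods such as DQN are usually preferred for these environments.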
