
OpenAI Gym Atari games, TD Policy application

Can I apply a TD policy to such environments? Or only methods like DQN, and why?

I am trying to apply TD policy evaluation to Gym's Atari game simulations in Python, and I am a little new to it. I have this Value class:

class V_Class():
    """ Class to store the state-value function
        V(s) = expected future discounted reward from s onwards (the return Gt)
        Stores it as a dictionary and adds states as encountered (get method)
        Two methods: get and set
    """
    def __init__(self):
        self.f = {}

    def get(self, s):
        # initialise unseen states to 0 before returning their value
        if s not in self.f:
            self.f[s] = 0
        return self.f[s]

    def set(self, s, y):
        self.f[s] = y
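
For reference, with a hashable state such as an integer, the class works as expected (a quick illustration; the state 0 and the value 1.5 are arbitrary):

V = V_Class()
print(V.get(0))   # unseen state, initialised to 0
V.set(0, 1.5)     # store an updated value
print(V.get(0))   # 1.5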

Here is my implementation:

env = Environment.Environment("SpaceInvaders-v0")
V = V_Class()

iepisode = 0
while iepisode <= 1:
    obs = env.reset()
    done = False
    SUMREWARD = 0
    while not done:
        print("obs:", obs)
        action = env.action_space.sample()   # random policy
        new_obs, reward, done, info = env.step(action)
        SUMREWARD += reward
        # TD(0) update with learning rate 0.7 and discount 0.5
        new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
        V.set(obs, new_Vs)
        obs = new_obs
    iepisode += 1   # move on to the next episode

but I get this error:

TypeError: unhashable type: 'numpy.ndarray'

as you can see here:

>  TypeError             Traceback (most recent call
> last) <ipython-input-12-428939358367> in <module>
>      12         new_obs, reward, done, info = env.step(action)
>      13         SUMREWARD+=reward
> ---> 14         new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
>      15         V.set(obs,new_Vs)
>      16         obs = new_obs
> 
> <ipython-input-4-5d3d077cd162> in get(self, s)
>       9 
>      10     def get(self, s):
> ---> 11         if(s not in self.f):
>      12             self.f[s] = 0
>      13         return self.f[s]
> 
> TypeError: unhashable type: 'numpy.ndarray'

This happens because the V class was initially made for classical environments where the states (the obs variable) are single numbers, while Atari environments have large, 3-dimensional numpy.ndarray objects representing the states.

The V class should check in its dictionary f whether this state already has a stored value, and if not, store a value for it based on the formula:

new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
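
In other words, this line is the standard TD(0) update rule, here with learning rate α = 0.7 and discount factor γ = 0.5:

V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right]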

How would you suggest I fix this? Is there a process I don't know about that I should follow in such cases, or do I just have to update my V class methods to store big states as dictionary keys?

You are trying to search in a dictionary (f) with a numpy array (obs) as the key, as in this example:

import numpy as np

arr = np.array([1, 2, 3])
d = {}
if arr not in d:          # dict membership hashes the key, which raises here
    print("It's not in")
else:
    print("It's in")

which raises the same error:

TypeError: unhashable type: 'numpy.ndarray'

You have to use a hashable type as the key, not an array.
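
One minimal workaround, keeping your tabular design, is to normalise the state into an immutable, hashable form inside the class itself, for example with ndarray.tobytes(). This is only a sketch based on the V_Class above; the _key helper name is my own:

import numpy as np

class V_Class():
    """ State-value table that also accepts numpy arrays as states """
    def __init__(self):
        self.f = {}

    @staticmethod
    def _key(s):
        # numpy arrays are mutable and therefore unhashable;
        # serialise them to an immutable bytes object
        # (tuple(s.flatten()) would work as well)
        return s.tobytes() if isinstance(s, np.ndarray) else s

    def get(self, s):
        # same semantics as before: unseen states start at 0
        return self.f.setdefault(self._key(s), 0)

    def set(self, s, y):
        self.f[self._key(s)] = y

With this change the TD loop above runs unmodified, because both methods normalise the key internally. Keep in mind, though, that raw Atari frames almost never repeat exactly, so a tabular V will keep growing without generalising; that limitation is a large part of why function-approximation methods such as DQN are usually preferred for these environments.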
