OpenAI Gym Atari games, TD Policy application
Can I apply TD policy evaluation to such environments, or do only methods like DQN work, and why?
I am trying to apply TD policy evaluation to simulations of Gym's Atari games in Python, and I am fairly new to it. I have this Value class:
class V_Class():
    """ Class to store the state Value function
    V(s) = expected future discounted reward from s onwards (the return Gt)
    Stores it as a dictionary and adds states as encountered (get method)
    Two methods: get and set
    """
    def __init__(self):
        self.f = {}
    def get(self, s):
        if(s not in self.f):
            self.f[s] = 0
        return self.f[s]
    def set(self, s, y):
        self.f[s] = y
and I have this implementation:
env = Environment.Environment("SpaceInvaders-v0")
V = V_Class()
iepisode = 0
while iepisode <= 1:
    obs = env.reset()
    done = False
    SUMREWARD=0
    while not done:
        print("obs:", obs)
        action = env.action_space.sample()
        new_obs, reward, done, info = env.step(action)
        SUMREWARD+=reward
        new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
        V.set(obs,new_Vs)
        obs = new_obs
but I get the error TypeError: unhashable type: 'numpy.ndarray', as you can see here:
> TypeError                                 Traceback (most recent call last)
> <ipython-input-12-428939358367> in <module>
> 12 new_obs, reward, done, info = env.step(action)
> 13 SUMREWARD+=reward
> ---> 14 new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
> 15 V.set(obs,new_Vs)
> 16 obs = new_obs
>
> <ipython-input-4-5d3d077cd162> in get(self, s)
> 9
> 10 def get(self, s):
> ---> 11 if(s not in self.f):
> 12 self.f[s] = 0
> 13 return self.f[s]
>
> TypeError: unhashable type: 'numpy.ndarray'
This happens because the V class was originally written for classical environments where the states (the obs variable) are single numbers, while Atari environments represent each state as a large 3-dimensional numpy.ndarray.
The V class should check its dictionary f for an already stored value for this state and, if there is none, store an initial value for it. The value is then updated with the formula:
new_Vs = V.get(obs) + 0.7*(reward + 0.5*V.get(new_obs) - V.get(obs))
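For reference, this line has the form of the tabular TD(0) update with step size 0.7 and discount factor 0.5. A minimal sketch as a standalone function follows; the name td0_update and its parameter names are hypothetical and not part of the code above, and the numbers in the worked example are made up.

```python
def td0_update(v_s, reward, v_next, alpha=0.7, gamma=0.5):
    """One tabular TD(0) step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    # TD error: how far the one-step target is from the current estimate.
    td_error = reward + gamma * v_next - v_s
    return v_s + alpha * td_error


# Worked example with made-up numbers: V(s) = 2.0, reward = 1.0, V(s') = 4.0.
# TD error = 1.0 + 0.5 * 4.0 - 2.0 = 1.0, so the new value is about 2.0 + 0.7 * 1.0.
print(td0_update(2.0, 1.0, 4.0))
```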
How would you suggest I fix this? Is there a standard process that I don't know about and should follow in such cases, or do I just have to update my V class methods to store big states as dictionary keys?
You are trying to look up a key in a dictionary (f) using a numpy array (obs) as the key, as in this example:
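import numpy as np
array = np.array([1, 2, 3])  # a small array standing in for obs
d = {}                       # named d to avoid shadowing the built-in dict
if array not in d:           # membership test hashes the key, so this line raises
    print("It's not")
else:
    print("It's in")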
which raises the same error:
TypeError: unhashable type: 'numpy.ndarray'
You have to use a hashable type as the key, not an array.
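One way to do that, as a minimal sketch rather than a definitive fix, is to convert each observation to an immutable key before the dictionary lookup. The class name ArrayKeyV and the helper _key are hypothetical; ndarray.tobytes() is an existing NumPy method. The zero-filled frames are stand-ins with the shape and dtype of a SpaceInvaders-v0 observation, so the snippet runs without Gym.

```python
import numpy as np


class ArrayKeyV:
    """State-value table that accepts NumPy observations (hypothetical name).

    Arrays are converted to an immutable, hashable key before lookup.
    """

    def __init__(self):
        self.f = {}

    @staticmethod
    def _key(s):
        # Assumption: observations are NumPy arrays or already-hashable values.
        if isinstance(s, np.ndarray):
            # Include shape and dtype so arrays with equal bytes but a
            # different layout do not collide.
            return (s.shape, s.dtype.str, s.tobytes())
        return s

    def get(self, s):
        # Unseen states default to 0, as in the original class.
        return self.f.setdefault(self._key(s), 0.0)

    def set(self, s, y):
        self.f[self._key(s)] = y


# Stand-in frames (assumption: 210x160x3 uint8, like SpaceInvaders-v0).
obs = np.zeros((210, 160, 3), dtype=np.uint8)
new_obs = obs.copy()
new_obs[0, 0, 0] = 1  # differs from obs in a single pixel

V = ArrayKeyV()
reward = 5.0
new_Vs = V.get(obs) + 0.7 * (reward + 0.5 * V.get(new_obs) - V.get(obs))
V.set(obs, new_Vs)
print(V.get(obs))
```

Keep in mind that raw frames of this size rarely repeat exactly, so a table keyed on full frames will mostly hold states that were visited once. That is the usual reason function approximation, as in DQN, is preferred over a lookup table for Atari.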