
Getting a state from gym-minigrid for Q-learning

I'm trying to create a Q-learner in the gym-minigrid environment, based on an implementation I found online. The implementation works just fine, but it uses the normal OpenAI Gym environment, which has access to some variables that are not present, or not presented in the same way, in the gym-minigrid library. In the "Taxi-v3" environment, for instance, I can get the current state with env.s and the size of the state space with env.observation_space.n, but neither of these is available in gym-minigrid.
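For context, this is the standard tabular setup that works in Taxi-v3 (a minimal sketch for comparison, not code from the implementation I found):

import gym
import numpy as np

env = gym.make('Taxi-v3')
env.reset()

# Both the observation and action spaces are Discrete, so the Q-table
# is a plain 2-D array indexed by (state index, action index).
q_table = np.zeros((env.observation_space.n, env.action_space.n))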

This is especially challenging for me, as I cannot simply do new_state, reward, done, info = env.step(action) and use that new_state to look up a value in my Q-table. Using the "MiniGrid-Empty-8x8-v0" environment, for instance, taking a step with an action and printing the next state gives the following output:

{'image': array([[[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]]], dtype=uint8), 'direction': 0, 'mission': 'get to the green goal square'}

As you can see, this is not a single state value that I can plug into my Q-table. Is there any way to transform the above into a single value for a specific state, which I can then use to look up the entry in my Q-table? Similarly, is there an easy, non-hardcoded way to obtain the size of the state space, similar to env.observation_space.n?

I had initially thought to make tuples out of the (position, direction) variables, as given by state_tup = ((tuple(env.agent_pos), env.agent_dir)), and use those as keys in a dict, where each entry holds a value for each of the six actions (see the sketch below). With that I could build a Q-table to let my agent learn in the environment. The only downside is that this gets trickier for environments other than the Empty environment, say the "MiniGrid-DoorKey-8x8-v0" environment, where the wall, key, and door are randomly placed. How would I approach getting the state space in that scenario to build my Q-table?
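As an illustration of that idea, here is a minimal sketch (the defaultdict and the greedy lookup are illustrative additions, not code from my actual implementation):

import gym
import gym_minigrid  # noqa: F401 -- importing this registers the MiniGrid envs
import numpy as np
from collections import defaultdict

env = gym.make('MiniGrid-Empty-8x8-v0')
env.reset()

# Q-table keyed by ((x, y), direction); unseen states default to a zero
# vector, so the full state space never has to be enumerated up front.
q_table = defaultdict(lambda: np.zeros(env.action_space.n))

state_tup = (tuple(env.agent_pos), env.agent_dir)
action = int(np.argmax(q_table[state_tup]))  # greedy action for this state

A defaultdict like this also partially sidesteps the state-space question: entries are created lazily as states are visited, so the table never needs the full state count in advance.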

You can use ImgObsWrapper, which gets rid of the 'mission' field in observations, leaving only the image tensor:

import gym
from gym_minigrid.wrappers import *  # importing gym_minigrid also registers the MiniGrid envs

env = gym.make('MiniGrid-Empty-8x8-v0')
env = ImgObsWrapper(env)  # observations are now just the image array

With this new env you can simply run:

obs, reward, done, info = env.step(action)
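Here obs is just the image array. One possible way to turn that array into a single hashable Q-table key (a suggestion of my own, not part of the wrapper's API) is to serialize it:

# The numpy array itself is not hashable, but its raw bytes are,
# so the byte string can serve as a dict key for a Q-table.
state_key = obs.tobytes()
q_values = q_table[state_key]  # with q_table a defaultdict as sketched above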

