
Getting a state from gym-minigrid for Q-learning

I'm trying to create a Q-learner in the gym-minigrid environment, based on an implementation I found online. The implementation works just fine, but it uses the standard OpenAI Gym environments, which expose some variables that are not present, or not presented in the same way, in the gym-minigrid library. For instance, in the "Taxi-v3" environment I can get the current state with env.s and the size of the state space with env.observation_space.n, but neither of these is available in gym-minigrid.
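For reference, this is the kind of pattern that implementation relies on (a minimal sketch with made-up hyperparameters, not the actual code):

import gym
import numpy as np

env = gym.make("Taxi-v3")

# The table can be sized up front because Taxi-v3 exposes discrete counts
q_table = np.zeros((env.observation_space.n, env.action_space.n))

state = env.reset()  # a single integer index into the table
action = env.action_space.sample()
new_state, reward, done, info = env.step(action)

# Standard Q-learning update, with placeholder learning rate and discount
q_table[state, action] += 0.1 * (reward + 0.99 * np.max(q_table[new_state]) - q_table[state, action])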

This is especially challenging because I cannot simply do new_state, reward, done, info = env.step(action) and use that new_state to look up a value in my Q-table. Using, for instance, the "MiniGrid-Empty-8x8-v0" environment, taking a step with an action and printing the next state gives the following output:

{'image': array([[[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]]], dtype=uint8), 'direction': 0, 'mission': 'get to the green goal square'}

As you can see, this is not a single value I can plug into my Q-table as a state. Is there any way to transform the above into a single value for a specific state, which I can then use to index my Q-table? Similarly, is there an easy, non-hardcoded way to obtain the size of the state space, similar to env.observation_space.n?

I had initially thought to make tuples out of the (position, direction) variables, as given by state_tup = ((tuple(env.agent_pos), env.agent_dir)), and use those as keys in a dict, where each entry holds six values, one per action. With that I could build a Q-table and let my agent learn on the environment, as sketched below. The only downside is that this gets more tricky for environments other than the Empty one, say "MiniGrid-DoorKey-8x8-v0", which has a randomly placed wall, key, and door. How would I approach getting the state space in that scenario, to build my Q-table?
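Something like the following is what I had in mind (a rough sketch; the helper name get_q_row is just something I made up):

import gym
import gym_minigrid  # registers the MiniGrid environments with gym
import numpy as np

env = gym.make('MiniGrid-Empty-8x8-v0')
n_actions = env.action_space.n
q_table = {}  # maps (position, direction) tuples to arrays of action values

def get_q_row(env):
    state_tup = (tuple(env.agent_pos), env.agent_dir)
    if state_tup not in q_table:  # build the table lazily as states are visited
        q_table[state_tup] = np.zeros(n_actions)
    return q_table[state_tup]

This works for the Empty environment because the agent's position and direction fully describe the state, but in DoorKey the same (position, direction) can occur with the door open or closed, the key picked up or not, and so on.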

You can use the ImgObsWrapper, which gets rid of the 'mission' field in observations, leaving only the image tensor:

import gym
from gym_minigrid.wrappers import *

env = gym.make('MiniGrid-Empty-8x8-v0')
env = ImgObsWrapper(env)

With this new env you can simply run:

obs, reward, done, info = env.step(action)
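If you then need a single hashable key for a tabular Q-table, one option (just a sketch, not something the library provides) is to use the image's byte representation as a dictionary key and fill the table lazily, which sidesteps the need for an explicit observation_space.n:

import numpy as np

q_table = {}
n_actions = env.action_space.n

def q_values(obs):
    key = obs.tobytes()  # immutable, hashable encoding of the image array
    if key not in q_table:
        q_table[key] = np.zeros(n_actions)
    return q_table[key]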
