
Why use Env class from OpenAI Gym as opposed to nothing when creating a custom environment?

This is a general question on the advantages of using gym.Env as a superclass (as opposed to nothing):

I am thinking of building my own reinforcement learning environment for a small experiment. I have read a couple of blog posts on how to build one with the Env class from the OpenAI Gym package (for example https://medium.com/@apoddar573/making-your-own-custom-environment-in-gym-c3b65ff8cdaa ). But it seems like I can create an environment without using the class at all. E.g. if I wanted to create an env called Foo, the tutorials recommend I use something like

class FooEnv(gym.Env):

But I can just as well use

class FooEnv():

and my environment will still work in exactly the same way. I have seen one small benefit of using OpenAI Gym: I can instantiate different versions of the environment in a cleaner way. But apart from that, can anyone describe or point out any resources on what big advantages the gym.Env superclass provides? I want to make sure I'm making full use of them :) thanks!
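
For context, the "cleaner versioning" I mean looks roughly like the sketch below (hypothetical: the module path foo_envs, the FooEnv class, and the harder_mode argument are made up for illustration):

    # Register two versions of a gym.Env subclass under versioned ids.
    # "foo_envs" and "harder_mode" are placeholders, not real names.
    from gym.envs.registration import register

    register(
        id="Foo-v0",
        entry_point="foo_envs:FooEnv",
        max_episode_steps=100,
    )

    register(
        id="Foo-v1",
        entry_point="foo_envs:FooEnv",
        max_episode_steps=500,
        kwargs={"harder_mode": True},  # hypothetical constructor argument
    )

    # Any other code can then build either version without importing FooEnv:
    # import gym
    # env = gym.make("Foo-v1")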

I think it's more of a way to make sure that the community will produce standardized environments, where interactions with the environment object always take place in the same manner.

It would be annoying if every developer used different terminology for the same underlying concepts. For example, if no one creates a standard to comply with, you could have people defining the step method very differently (make_step, forward_step, take_action, env_step, etc.), or even giving it a different signature, for example by modifying its parameters.
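
To see why the standard interface matters, here is a minimal sketch of a generic rollout loop (assuming the classic four-tuple step API documented in the source excerpt below); it works with any environment that follows the gym.Env interface, with nothing specific to one environment:

    import gym

    def run_episode(env: gym.Env) -> float:
        """Run one episode with random actions and return the total reward."""
        obs = env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = env.action_space.sample()          # standard attribute
            obs, reward, done, info = env.step(action)  # standard signature
            total_reward += reward
        return total_reward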

When you inherit from gym.Env, you make sure that you implement the needed methods; otherwise you'll get a NotImplementedError. See the source file:

    def step(self, action):
        """Run one timestep of the environment's dynamics. When end of
        episode is reached, you are responsible for calling `reset()`
        to reset this environment's state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Args:
            action (object): an action provided by the agent

        Returns:
            observation (object): agent's observation of the current environment
            reward (float): amount of reward returned after previous action
            done (bool): whether the episode has ended, in which case further step() calls will return undefined results
            info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
        """
        raise NotImplementedError

This way, Reinforcement Learning agents are more easily reused and connected to different environments without needing to modify the code.
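
For instance, a minimal (hypothetical) FooEnv that fills in the methods gym.Env expects might look like the sketch below; the dynamics are made up, but any agent or loop written against the standard interface, like run_episode above, can drive it unchanged:

    import gym
    import numpy as np
    from gym import spaces

    class FooEnv(gym.Env):
        """Toy environment: nudge a 1-D state left or right until it hits a bound."""

        def __init__(self):
            super().__init__()
            self.action_space = spaces.Discrete(2)  # 0 = left, 1 = right
            self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
            self.state = np.zeros(1, dtype=np.float32)
            self.steps = 0

        def reset(self):
            self.state = np.zeros(1, dtype=np.float32)
            self.steps = 0
            return self.state

        def step(self, action):
            self.state += 0.1 if action == 1 else -0.1
            self.steps += 1
            reward = float(self.state[0])
            done = self.steps >= 20 or abs(self.state[0]) >= 1.0
            return self.state, reward, done, {}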

But overall you're right: you get few "direct" benefits from this parent class, except that your code will be more reusable, which is important!
