How to define an action space in a custom gym environment that receives 3 scalars and a matrix each turn?

Question

For a personal project, I need to define a custom gym environment that runs a certain board game. Each turn of the game, the environment takes the state of the board as a matrix of ones and zeros, and an action, described as a tuple:

(integer, integer, small matrix)

From reading online, I know that a gym env should take this shape:

import gym
from gym import spaces

class CustomEnv(gym.Env):
    """Custom Environment that follows gym interface"""
    metadata = {'render.modes': ['human']}

    def __init__(self, arg1, arg2, ...):
        super(CustomEnv, self).__init__()

        self.action_space =        # what should go here?
        self.observation_space =

    def step(self, action):
        ...

    def reset(self):
        ...

    def render(self, mode='human', close=False):
        ...

Now, I feel like the action input here does not exactly fall into "discrete" or "continuous". How should I implement the action part of the init function and the step function?

Answer 1

Defining your action space in the init function is fairly straightforward using gym's Tuple space:

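from gym import spaces

# One action is a tuple: (integer, integer, small matrix)
space = spaces.Tuple((
    spaces.Discrete(5),                       # integer in 0..4
    spaces.Discrete(4),                       # integer in 0..3
    spaces.Box(low=0, high=1, shape=(2, 2)),  # 2x2 float32 array with entries in [0, 1]
))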

A Discrete space represents a range of integers, and a Box space represents an n-dimensional array. You can print a sample of your space to get an idea of what it looks like:

print(space.sample())
# Example output (values are random):
# (3, 1, array([[0.20318432, 0.26787955], [0.5323673 , 0.6564413 ]], dtype=float32))
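Note that the sampled matrix above contains fractional values, because Box defaults to float32. If the small matrix is meant to hold only ones and zeros like the board (an assumption, since the question does not say so), one possible variant is to give the Box an integer dtype. The name `binary_action_space` is only illustrative:

```python
import numpy as np
from gym import spaces

# Assumption: the small matrix contains only 0s and 1s, like the board.
binary_action_space = spaces.Tuple((
    spaces.Discrete(5),
    spaces.Discrete(4),
    # With an integer dtype, Box samples whole numbers between low and high.
    spaces.Box(low=0, high=1, shape=(2, 2), dtype=np.int8),
))

first, second, matrix = binary_action_space.sample()
print(first, second)
print(matrix)
```

Each sample is still a 3-tuple, so it unpacks the same way as a sample from the float version.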

For the step function, you just need to interact with your environment based on the input action, which will be formatted just like the sample.
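As a rough sketch of how the step function might unpack such an action, here is a minimal environment. Everything game-specific is a placeholder: the class name `BoardEnv`, the `board_size` and `piece_size` parameters, the reading of the two integers as a row and a column, and the rule of stamping the matrix onto the board are all assumptions made for illustration. It uses the older gym API, where `step` returns four values.

```python
import numpy as np
import gym
from gym import spaces


class BoardEnv(gym.Env):
    """Hypothetical board-game environment; the game rules are placeholders."""

    def __init__(self, board_size=8, piece_size=2):
        super().__init__()
        self.board_size = board_size
        self.piece_size = piece_size
        # Action: (row, column, small 0/1 matrix) -- an assumed interpretation.
        self.action_space = spaces.Tuple((
            spaces.Discrete(board_size),
            spaces.Discrete(board_size),
            spaces.Box(low=0, high=1, shape=(piece_size, piece_size), dtype=np.int8),
        ))
        # Observation: the board as a matrix of ones and zeros.
        self.observation_space = spaces.Box(
            low=0, high=1, shape=(board_size, board_size), dtype=np.int8)
        self.board = np.zeros((board_size, board_size), dtype=np.int8)

    def reset(self):
        self.board[:] = 0
        return self.board.copy()

    def step(self, action):
        # The action arrives in the same format as action_space.sample().
        row, col, piece = action
        piece = np.asarray(piece, dtype=np.int8)
        # Placeholder rule: stamp the piece onto the board, clipped at the edges.
        h = min(self.piece_size, self.board_size - row)
        w = min(self.piece_size, self.board_size - col)
        self.board[row:row + h, col:col + w] |= piece[:h, :w]
        reward, done, info = 0.0, False, {}  # real game logic would go here
        return self.board.copy(), reward, done, info


env = BoardEnv(board_size=4, piece_size=2)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(obs)
```

The point is only that `step` receives a plain Python tuple and can unpack it directly; the reward and termination logic depend on the actual game.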

Answer 1 posted 2020-09-30 19:38:19