Observation with different boundaries. The observation returned by the `reset()` method does not match the given observation space
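I'm a beginner in reinforcement learning, so please don't judge me too harshly.
Error: AssertionError: The observation returned by the `reset()` method does not match the given observation space
Observation space: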
self.observation_space = gym.spaces.Tuple((
    gym.spaces.Box(low=-float('inf'), high=self.fp.HEIGHT, shape=(1,), dtype=np.float64),  # player y
    gym.spaces.Box(low=0, high=self.fp.WIDTH + self.fp.MIN_PIPE_GAP + self.fp.PIPE_WIDTH, shape=(2,), dtype=np.float64),  # pipes x
    gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(1,), dtype=np.float64),  # gravity
    gym.spaces.Box(
        low=-(self.fp.HEIGHT / 4 * 3 + self.fp.MIN_PIPE_GAP + 100),
        high=self.fp.HEIGHT / 4 * 3 + self.fp.MIN_PIPE_GAP + 100,
        shape=(4,), dtype=np.float64),  # pipes y
    gym.spaces.Box(low=self.fp.PX, high=self.fp.PX, shape=(1,), dtype=np.float64)  # player x
))
Returned observation:
return (
    np.array([float(self.py)]),  # py
    np.array([float(self.pipes[ind]['x']), float(self.pipes[ind + 1]['x'])]),  # x1 x2
    np.array([float(self.gravity)]),  # gravity
    np.array([
        float(self.pipes[ind]['y1']), float(self.pipes[ind]['y2']),
        float(self.pipes[ind + 1]['y1']), float(self.pipes[ind + 1]['y2']),
    ]),  # y1 y2 y3 y4
    np.array([float(self.PX)])  # px
)
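I tried putting everything into a single array (that worked), but that is wrong because different groups of data need different bounds. Most likely the format is wrong; if you think everything is correct, I will look for the error in the bounds.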
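As a side note on the single-array attempt: a `Box` does not need one shared bound, because `low` and `high` can be arrays with one entry per element. The following is only a sketch under assumptions: the constants are hypothetical stand-ins for the `self.fp.*` values, and `in_bounds` is an illustrative helper that mimics the shape and bounds part of the space check using NumPy alone.

```python
import numpy as np

# Hypothetical stand-ins for self.fp.HEIGHT, self.fp.WIDTH, etc.
HEIGHT, WIDTH, MIN_PIPE_GAP, PIPE_WIDTH, PX = 512.0, 288.0, 100.0, 52.0, 50.0

pipe_x_high = WIDTH + MIN_PIPE_GAP + PIPE_WIDTH
pipe_y_lim = HEIGHT / 4 * 3 + MIN_PIPE_GAP + 100

# One bound per element, in the order [py, x1, x2, gravity, y1, y2, y3, y4, px].
low = np.array([-np.inf, 0.0, 0.0, -np.inf] + [-pipe_y_lim] * 4 + [PX],
               dtype=np.float64)
high = np.array([HEIGHT, pipe_x_high, pipe_x_high, np.inf] + [pipe_y_lim] * 4 + [PX],
                dtype=np.float64)

# These arrays could then be passed to a flat space, for example:
# observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float64)


def in_bounds(obs):
    """Illustrative helper: True if obs has the right shape and respects every bound."""
    obs = np.asarray(obs, dtype=np.float64)
    return obs.shape == low.shape and bool(np.all((obs >= low) & (obs <= high)))


ok_obs = [100.0, 300.0, 440.0, 1.0, 50.0, 150.0, 60.0, 160.0, 50.0]
bad_obs = [100.0, 300.0, 500.0, 1.0, 50.0, 150.0, 60.0, 160.0, 50.0]  # x2 too large
print(in_bounds(ok_obs), in_bounds(bad_obs))
```

Whether a flat `Box` or a `Tuple`/`Dict` of boxes is preferable depends on the algorithm that consumes the observations; the sketch only shows that per-element bounds are possible in either layout.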
It turned out the error was in the bounds. In the end, though, the checker suggested using Dict, so I rewrote the code like this:
Observation space:
self.observation_space = gym.spaces.Dict({
    "player_y": gym.spaces.Box(low=-float('inf'), high=self.fp.HEIGHT, shape=(1,), dtype=np.float64),  # player y
    "pipes_x": gym.spaces.Box(low=0, high=self.fp.WIDTH * 3, shape=(2,), dtype=np.float64),  # pipes x
    "gravity": gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(1,), dtype=np.float64),  # gravity
    "pipes_y": gym.spaces.Box(
        low=-(self.fp.HEIGHT / 4 * 3 + self.fp.MIN_PIPE_GAP + 100),
        high=self.fp.HEIGHT / 4 * 3 + self.fp.MIN_PIPE_GAP + 100,
        shape=(4,), dtype=np.float64),  # pipes y
    "player_x": gym.spaces.Box(low=self.fp.PX, high=self.fp.PX, shape=(1,), dtype=np.float64)  # player x
})
Returned observation:
return {
    "player_y": np.array([float(self.py)]),  # py
    "pipes_x": np.array([float(self.pipes[ind]['x']), float(self.pipes[ind + 1]['x'])]),  # x1 x2
    "gravity": np.array([float(self.gravity)]),  # gravity
    "pipes_y": np.array([
        float(self.pipes[ind]['y1']), float(self.pipes[ind]['y2']),
        float(self.pipes[ind + 1]['y1']), float(self.pipes[ind + 1]['y2']),
    ]),  # y1 y2 y3 y4
    "player_x": np.array([float(self.PX)])  # px
}
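For anyone hitting the same assertion, checking each component separately may show which one breaks its bounds. The sketch below is not gym code: `find_violations` is a hypothetical helper, the `bounds` dictionary repeats two of the limits by hand, and the numbers are made up. With the real environment, calling `contains()` on each sub-space in `self.observation_space.spaces` should give a similar per-key answer.

```python
import numpy as np


def find_violations(obs, bounds):
    """Return {key: reason} for every component of obs that breaks its bounds."""
    problems = {}
    for key, (low, high, shape) in bounds.items():
        value = np.asarray(obs[key])
        if value.shape != shape:
            problems[key] = f"shape {value.shape} != {shape}"
        elif np.any(value < low) or np.any(value > high):
            problems[key] = f"value {value.tolist()} outside [{low}, {high}]"
    return problems


# Hypothetical numbers standing in for self.fp.WIDTH and self.fp.HEIGHT.
WIDTH, HEIGHT = 288.0, 512.0
bounds = {
    "player_y": (-np.inf, HEIGHT, (1,)),
    "pipes_x": (0.0, WIDTH * 3, (2,)),
}
obs = {
    "player_y": np.array([200.0]),
    "pipes_x": np.array([300.0, 900.0]),  # 900 exceeds WIDTH * 3 = 864
}
violations = find_violations(obs, bounds)
print(violations)
```

Running a check like this on the value returned by `reset()` narrows the problem down to a single key instead of one failed assertion for the whole observation.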