Shape of _observation_spec and shape of _action_spec in the TF-Agents environments example

In the TensorFlow documentation for TF-Agents Environments there is an example of an environment for a simple (blackjack-inspired) card game.

The __init__ looks like the following:

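import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec


class CardGameEnv(py_environment.PyEnvironment):

  def __init__(self):
    # Action: a single integer in [0, 1], declared with the scalar shape ().
    self._action_spec = array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
    # Observation: a non-negative integer, declared with the shape (1,).
    self._observation_spec = array_spec.BoundedArraySpec(
        shape=(1,), dtype=np.int32, minimum=0, name='observation')
    self._state = 0
    self._episode_ended = False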

The action spec allows only 0 (draw a new card) or 1 (end the current round), so it's sensible that the shape is shape=() (it just needs an integer).

However, I don't quite understand why the observation spec has shape=(1,), given that it just represents the sum of the cards in the current round (so also an integer).

What explains the difference in shapes?
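To make the difference concrete, here is a minimal NumPy-only sketch of the kind of check a bounded spec implies. The helper conforms is hypothetical (it is not the TF-Agents API) and only compares shape, dtype, and bounds, which is enough to show that a 0-D value and a one-element 1-D value are not interchangeable.

```python
import numpy as np

def conforms(value, shape, dtype, minimum, maximum=None):
    """Hypothetical helper: does `value` match a bounded spec's shape, dtype and bounds?"""
    value = np.asarray(value)
    if value.shape != shape or value.dtype != np.dtype(dtype):
        return False
    if np.any(value < minimum):
        return False
    return maximum is None or not np.any(value > maximum)

action = np.array(1, dtype=np.int32)          # shape ()
observation = np.array([17], dtype=np.int32)  # shape (1,)

print(conforms(action, (), np.int32, 0, 1))      # checked against the action spec
print(conforms(observation, (1,), np.int32, 0))  # checked against the observation spec
# Same number, but 0-D: it does not have the shape (1,) the observation spec asks for.
print(conforms(np.array(17, dtype=np.int32), (1,), np.int32, 0))
```

Under this sketch, an observation has to be built as a one-element array (for example np.array([total], dtype=np.int32)) rather than as a bare integer.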

At first I thought they were the same. To test them, I ran the following code in the W3 Schools Python "Try Editor":

import numpy as np

arr1 = np.zeros((), dtype=np.int32)
# Note: (1) is just the int 1; NumPy reads it as the shape (1,).
arr2 = np.zeros((1), dtype=np.int32)

print("This is the first array:", arr1, "\n")
print("This is the second array:", arr2, "\n")

The output I got was:

This is the first array: 0

This is the second array: [0] 

This leads me to conclude that shape=() is a plain integer treated as a 0-D array, whereas shape=(1,) is a 1-D array that consists of a single integer. I hope this is accurate, as I'd like some confirmation myself. I ran a second test to check this further:
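A small sketch that may help confirm this reading: it compares the number of axes and elements of the two arrays, and shows that only the 1-D one can be indexed with an integer. The variable names are illustrative.

```python
import numpy as np

scalar = np.zeros((), dtype=np.int32)    # 0-D array
vector = np.zeros((1,), dtype=np.int32)  # 1-D array with one element

print(scalar.ndim, scalar.size)  # number of axes, number of elements
print(vector.ndim, vector.size)

# A 1-D array is indexed with one integer; a 0-D array has no axis to index.
first = vector[0]
try:
    scalar[0]
    scalar_indexable = True
except IndexError:
    scalar_indexable = False

# Both hold exactly one value, which .item() returns as a Python int.
print(scalar.item(), vector.item(), scalar_indexable)
```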

import numpy as np

arr1 = np.array(42)
arr2 = np.array([1])
arr3 = np.array([1, 2, 3, 4])

print(arr1.shape)
print(arr2.shape)
print(arr3.shape)

The output was:

()
(1,)
(4,)

This seems to corroborate what I concluded first: arr1 is a 0-D array and arr3 is a 1-D array of 4 elements (as explained in the W3 Schools tutorial), and arr2 has the same number of dimensions as arr3, just with a different number of elements.
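Since both shapes carry the same single value, it is easy to convert between them. The following sketch uses standard NumPy calls; the value 17 is just an example.

```python
import numpy as np

total = np.array(17, dtype=np.int32)     # shape ()

as_vector = np.atleast_1d(total)         # shape (1,)
also_vector = total.reshape((1,))        # shape (1,)
back_to_scalar = np.squeeze(as_vector)   # shape ()

print(as_vector.shape, also_vector.shape, back_to_scalar.shape)
```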

As for why the action and the observation are represented as an integer and a one-element array respectively, it is probably because TensorFlow works with tensors (n-dimensional arrays), and calculations might be easier when the observation is treated as an array.
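One way to see why an explicit feature axis might be convenient is to stack several steps into a batch. This is only a plausible illustration, not something the documentation excerpt above states; the weights array stands in for a hypothetical dense layer with one input and four units.

```python
import numpy as np

# Three observations and three actions, as they might be collected over three steps.
observations = [np.array([s], dtype=np.int32) for s in (5, 12, 19)]  # each shape (1,)
actions = [np.array(a, dtype=np.int32) for a in (1, 1, 0)]           # each shape ()

obs_batch = np.stack(observations)  # shape (3, 1): batch axis plus one feature axis
act_batch = np.stack(actions)       # shape (3,): batch axis only

print(obs_batch.shape, act_batch.shape)

# A dense layer is a matrix product over the feature axis, which (3, 1) provides.
weights = np.ones((1, 4), dtype=np.float32)  # hypothetical 1-input, 4-unit layer
hidden = obs_batch.astype(np.float32) @ weights
print(hidden.shape)
```

With shape (1,), the observation already looks like a feature vector of length one, so the batched form can be multiplied by a weight matrix without an extra reshape.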

The action is probably declared as an integer to simplify the flow inside the _step() function, as it would be a little more tedious to work with an array in the if/elif/else structure. There are other examples of action_specs with more elements and discrete/continuous values, so nothing else comes to mind.
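As a rough illustration of that branching, here is a simplified, hypothetical step function written with plain NumPy. It is not the tutorial's _step(): the card is passed in rather than drawn at random, and which integer means "draw" or "stop" is an assumption of this sketch.

```python
import numpy as np

# Assumed mapping for this sketch; swap the values if the environment defines it the other way.
DRAW, STOP = 0, 1

def step(state, action, card):
    """Hypothetical, simplified step: returns (new_state, episode_ended)."""
    if action == STOP:
        return state, True
    elif action == DRAW:
        return state + card, False
    else:
        raise ValueError("`action` should be 0 or 1.")

# A scalar (0-D) action can be compared directly, with no indexing needed.
state, ended = step(0, np.array(DRAW, dtype=np.int32), card=7)
print(state, ended)
state, ended = step(state, np.array(STOP, dtype=np.int32), card=3)
print(state, ended)
```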

I am not really sure all of this is right, but it seems a good point to at least start the discussion.
