
Shape of _observation_spec and shape of _action_spec in the TF-Agents environments example

In the TensorFlow documentation for TF-Agents Environments there is an example of an environment for a simple (blackjack-inspired) card game.

The init looks like the following:

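import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec

class CardGameEnv(py_environment.PyEnvironment):

  def __init__(self):
    # Action: a single integer, 0 or 1, stored as a 0-D array.
    self._action_spec = array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
    # Observation: a 1-D array holding one non-negative integer.
    self._observation_spec = array_spec.BoundedArraySpec(
        shape=(1,), dtype=np.int32, minimum=0, name='observation')
    self._state = 0
    self._episode_ended = False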

The action spec allows only 0 (do not ask for a card) or 1 (ask for a card), so it makes sense that its shape is shape=(): a single integer is all it needs.

However, I don't quite understand why the observation spec has shape=(1,), given that it just represents the sum of the cards in the current round (so also a single integer).

What explains the difference in shapes?

At first I thought they were the same. To test them, I ran the following code in the W3 Schools Python "Try Editor":

import numpy as np

arr1 = np.zeros((), dtype=np.int32)
arr2 = np.zeros((1,), dtype=np.int32)

print("This is the first array:", arr1, "\n")
print("This is the second array:", arr2, "\n")

The output I got was:

This is the first array: 0

This is the second array: [0] 

This leads me to conclude that shape=() is a single integer, treated as a 0-D array, while shape=(1,) is a 1-D array that consists of a single integer. I hope this is accurate, as I'd like some confirmation myself. I ran a second test to check this further:

import numpy as np

arr1 = np.array(42)
arr2 = np.array([1])
arr3 = np.array([1, 2, 3, 4])

print(arr1.shape)
print(arr2.shape)
print(arr3.shape)

The output was:

()
(1,)
(4,)

This seems to corroborate what I concluded first: arr1 is a 0-D array and arr3 is a 1-D array of 4 elements (as explained in the W3 Schools tutorial), while arr2 has the same number of dimensions as arr3 but only one element.
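The same distinction can be checked through ndim and indexing. This is only a small sketch using plain NumPy; the variable names are made up for illustration and are not taken from the TF-Agents example.

```python
import numpy as np

scalar_like = np.array(7, dtype=np.int32)    # shape (), like the action
vector_like = np.array([7], dtype=np.int32)  # shape (1,), like the observation

# Number of dimensions differs, number of elements does not.
print(scalar_like.ndim, vector_like.ndim)
print(scalar_like.size, vector_like.size)

# A 1-D array can be indexed; a 0-D array has no axis to index into.
first = vector_like[0]
try:
    scalar_like[0]
    indexable = True
except IndexError:
    indexable = False
print(indexable)

# A 0-D array converts directly to a Python int.
as_int = int(scalar_like)
```

Both arrays hold exactly one value; what changes is whether there is an axis wrapped around it.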

As for why the action is represented as a single integer and the observation as an array of one element, it is probably because TensorFlow works with tensors (n-dimensional arrays), and calculations might be easier when the observation is already an array.
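One way to see how the two shapes behave once several values are stacked together is the sketch below. It uses only NumPy, the sample values are invented, and it is my assumption that batching is what makes the array form convenient, not something the documentation states.

```python
import numpy as np

# Three made-up observations of shape (1,) and three made-up actions of shape ().
observations = [np.array([s], dtype=np.int32) for s in (5, 12, 19)]
actions = [np.array(a, dtype=np.int32) for a in (1, 1, 0)]

# Stacking adds a leading batch axis in front of each item's own shape.
obs_batch = np.stack(observations)
act_batch = np.stack(actions)

print(obs_batch.shape)  # (3, 1): a batch of 3 feature vectors of length 1
print(act_batch.shape)  # (3,): a batch of 3 scalars
```

A batch with shape (3, 1) already looks like a matrix of features, one row per observation, whereas a batch of scalars is just a flat vector.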

The action is probably declared as a single integer to simplify the flow inside the _step() function, as it would be a little more tedious to work with an array in the if/elif/else structure. There are other examples of action specs with more elements and with discrete or continuous values, so nothing else comes to mind.

I am not really sure all of this is right, but it seems a good point to at least start the discussion.
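A rough illustration of that branching, with a 0-D action compared directly against integers. describe_action is a hypothetical helper written for this answer, not the _step() method from the documentation.

```python
import numpy as np

def describe_action(action):
    # Hypothetical helper: `action` is a plain int or a 0-D integer array.
    if action == 1:
        return "ask for a card"
    elif action == 0:
        return "do not ask for a card"
    else:
        raise ValueError("action should be 0 or 1")

hit = describe_action(np.array(1, dtype=np.int32))
stay = describe_action(np.array(0, dtype=np.int32))
print(hit)
print(stay)
```

With shape=() the comparison yields a single boolean, so the branches read the same as they would with an ordinary Python int.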

