

PPOAgent + Cartpole = ValueError: actor_network output spec does not match action spec

I'm trying to experiment with using tf_agents' PPOAgent in the CartPole-v1 environment, but I am receiving the following error upon declaring the agent itself:

ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(2,), dtype=tf.float32, name=None)
vs.
BoundedTensorSpec(shape=(), dtype=tf.int64, name='action', minimum=array(0, dtype=int64), maximum=array(1, dtype=int64))

I believe the issue is that the output of my network is tf.float32 rather than tf.int64, but I could be wrong. I don't know how to make the network output an integer, though, and as I understand it that's not possible (or desired) anyway.
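
As it turns out (see the answer at the end), the network never needs to emit integers directly: a categorical distribution parameterized by float32 logits produces integer samples. A minimal tensorflow_probability sketch, illustrative only and not part of the original code:

import tensorflow as tf
import tensorflow_probability as tfp

# float32 logits, one per discrete action (two for CartPole)
logits = tf.constant([[0.1, 0.4]])

# The distribution is parameterized by floats, but sampling it yields
# integer actions, which is how the float32/int64 gap gets bridged.
dist = tfp.distributions.Categorical(logits=logits)
action = dist.sample()  # integer tensor with values in {0, 1}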

If I run a continuous environment like MountainCarContinuous-v0 instead, I get a different error:

ValueError: Unexpected output from `actor_network`.  Expected `Distribution` objects, but saw output spec: TensorSpec(shape=(1,), dtype=tf.float32, name=None)

Here's the relevant code (mostly taken from the DQN tutorial):

# Imports used by the snippet below
import tensorflow as tf
import tf_agents.agents
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import sequential
from tf_agents.specs import tensor_spec

# env_name = 'MountainCarContinuous-v0'
env_name = 'CartPole-v1'
train_py_env = suite_gym.load(env_name)
eval_py_env = suite_gym.load(env_name)

train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)

train_env.reset()
eval_env.reset()

actor_layer_params = (100, 50)
critic_layer_params = (100, 50)
action_tensor_spec = tensor_spec.from_spec(train_env.action_spec())
num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1

# Define a helper function to create Dense layers configured with the right
# activation and kernel initializer.
def dense_layer(num_units):
  return tf.keras.layers.Dense(
      num_units,
      activation=tf.keras.activations.relu,
      kernel_initializer=tf.keras.initializers.VarianceScaling(
          scale=2.0, mode='fan_in', distribution='truncated_normal'))

# Actor network
dense_layers = [dense_layer(num_units) for num_units in actor_layer_params]
# Final layer: one logit per action, emitted as a plain float tensor
# rather than a distribution (which turns out to be the problem)
actions_layer = tf.keras.layers.Dense(
    num_actions,
    name='actions',
    activation=None,
    kernel_initializer=tf.keras.initializers.RandomUniform(
        minval=-0.03, maxval=0.03),
    bias_initializer=tf.keras.initializers.Constant(-0.2))

ActorNet = sequential.Sequential(dense_layers + [actions_layer])

# Critic/value network
dense_layers = [dense_layer(num_units) for num_units in critic_layer_params]
criticism_layer = tf.keras.layers.Dense(
    1,
    activation=None,
    kernel_initializer=tf.keras.initializers.RandomUniform(
        minval=-0.03, maxval=0.03),
    bias_initializer=tf.keras.initializers.Constant(-0.2))
CriticNet = sequential.Sequential(dense_layers + [criticism_layer])

learning_rate = 1e-3  # as in the DQN tutorial this snippet is based on
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

train_step_counter = tf.Variable(0)


# Error occurs here, when the agent validates the networks against the specs
agent = tf_agents.agents.PPOAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    optimizer=optimizer,
    actor_net=ActorNet,
    value_net=CriticNet,
    train_step_counter=train_step_counter)
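
For reference, the two specs being compared can be printed directly before constructing the agent. This diagnostic sketch assumes tf_agents' Network.create_variables, which builds the network's variables and returns its output spec:

# Diagnostic only: show what PPOAgent compares during validation
print(ActorNet.create_variables(train_env.observation_spec()))
# TensorSpec(shape=(2,), dtype=tf.float32, name=None), a plain tensor spec
print(train_env.action_spec())
# BoundedTensorSpec(shape=(), dtype=tf.int64, name='action', minimum=0, maximum=1)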

I feel like I must be missing something obvious, or have a fundamental misunderstanding. Any and all help would be appreciated; I couldn't find an example of a PPOAgent in use.

Figured it out: I needed to use a network which returns a distribution, such as an ActorDistributionNetwork.

Details here: https://www.tensorflow.org/agents/api_docs/python/tf_agents/networks/actor_distribution_network/ActorDistributionNetwork
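
For concreteness, here is a minimal sketch of that fix, with the call signatures taken from the linked docs and the layer sizes carried over from the question. ActorDistributionNetwork appends a projection that emits a Categorical distribution for discrete action specs (and a Normal for continuous ones such as MountainCarContinuous-v0), while ValueNetwork fills the critic role:

from tf_agents.agents.ppo import ppo_agent
from tf_agents.networks import actor_distribution_network, value_network

# Actor: dense layers plus a projection that outputs a distribution,
# which is exactly what PPOAgent's spec check expects
actor_net = actor_distribution_network.ActorDistributionNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=actor_layer_params)

# Critic: maps observations to a scalar value estimate
value_net = value_network.ValueNetwork(
    train_env.observation_spec(),
    fc_layer_params=critic_layer_params)

agent = ppo_agent.PPOAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    optimizer=optimizer,
    actor_net=actor_net,
    value_net=value_net,
    train_step_counter=train_step_counter)
agent.initialize()

Both original errors go away because the actor's output is now a distribution rather than a plain float tensor.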


