How to define the correct shapes for tf-agents in batch learning

Question

I am trying to train a DDPG agent with batch learning using the tf_agents library. However, I need to define an observation_spec and an action_spec, which state the shapes of the tensors the agent will receive. I have managed to create trajectories to feed in the data, but the shapes of these trajectories do not match the shapes the agent expects.

I have tried changing the observation and action specs in the agent definition. This is my agent definition:

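import tensorflow as tf
from tf_agents.agents.ddpg import ddpg_agent
from tf_agents.agents.ddpg.actor_network import ActorNetwork
from tf_agents.agents.ddpg.critic_network import CriticNetwork
from tf_agents.specs.tensor_spec import BoundedTensorSpec, TensorSpec
from tf_agents.trajectories import time_step

# Per-step specs (no batch or time dimension): one float observation
# and one bounded float action.
observation_spec = TensorSpec(shape=(1,), dtype=tf.float32)
time_step_spec = time_step.time_step_spec(observation_spec)
action_spec = BoundedTensorSpec([1], tf.float32, minimum=-100, maximum=100)

# The actor maps an observation to an action.
actor_network = ActorNetwork(
    input_tensor_spec=observation_spec,
    output_tensor_spec=action_spec,
    fc_layer_params=(100, 200, 100),
    name="ddpg_ActorNetwork",
)

# The critic takes an (observation, action) pair and outputs a Q-value.
critic_net_input_specs = (observation_spec, action_spec)
critic_network = CriticNetwork(
    input_tensor_spec=critic_net_input_specs,
    observation_fc_layer_params=(200, 100),
    joint_fc_layer_params=(100, 200),
    action_fc_layer_params=None,
    name="ddpg_CriticNetwork",
)

agent = ddpg_agent.DdpgAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    actor_network=actor_network,
    critic_network=critic_network,
    actor_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    critic_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)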

This is what the trajectory looks like:


Trajectory(step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[0, 1]], dtype=int32)>, observation=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[280, 280]], dtype=int32)>, action=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[nan,  0.]])>, policy_info=(), next_step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[1, 1]], dtype=int32)>, reward=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[ -6.93147181, -12.14113521]])>, discount=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float32, numpy=array([[0.9, 0.9]], dtype=float32)>)

I should be able to call agent.train(trajectory) and have it work; however, I get the following error:


ValueError                                Traceback (most recent call last)
<ipython-input-325-bf162a5dc8d7> in <module>
----> 1 agent.train(trajs[0])

~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in train(self, experience, weights)
    213           "experience must be type Trajectory, saw type: %s" % type(experience))
    214 
--> 215     self._check_trajectory_dimensions(experience)
    216 
    217     if self._enable_functions:

~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in _check_trajectory_dimensions(self, experience)
    137     if not nest_utils.is_batched_nested_tensors(
    138         experience, self.collect_data_spec,
--> 139         num_outer_dims=self._num_outer_dims):
    140       debug_str_1 = tf.nest.map_structure(lambda tp: tp.shape, experience)
    141       debug_str_2 = tf.nest.map_structure(lambda spec: spec.shape,

~/.local/lib/python3.7/site-packages/tf_agents/utils/nest_utils.py in is_batched_nested_tensors(tensors, specs, num_outer_dims)
    142       'And spec_shapes:\n   %s' %
    143       (num_outer_dims, tf.nest.pack_sequence_as(tensors, tensor_shapes),
--> 144        tf.nest.pack_sequence_as(specs, spec_shapes)))
    145 
    146 

ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs.  num_outer_dims: 2.
Saw tensor_shapes:
   Trajectory(step_type=TensorShape([1, 2]), observation=TensorShape([1, 2]), action=TensorShape([1, 2]), policy_info=(), next_step_type=TensorShape([1, 2]), reward=TensorShape([1, 2]), discount=TensorShape([1, 2]))
And spec_shapes:
   Trajectory(step_type=TensorShape([]), observation=TensorShape([1]), action=TensorShape([1]), policy_info=(), next_step_type=TensorShape([]), reward=TensorShape([]), discount=TensorShape([]))

Answer 1

This can be easily solved by using the environment. In TF-Agents, the environment needs to follow the PyEnvironment class (you then wrap it with a TFPyEnvironment for parallel execution of multiple envs). If you have already defined your environment to match this class's specification, it should already provide the two methods env.time_step_spec() and env.action_spec(). Simply feed these two to your agent and you should be done.

It becomes a bit more complicated if you want multiple outputs from your environment that do not all go into your agent. In this case you will need to define an observation_and_action_constraint_splitter function to pass to your agent. For more details on how to get your TensorSpecs/ArraySpecs right, and for a working example, see my answer here
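Separately from the environment route, the error message in the question points at something worth checking in the trajectory itself. With num_outer_dims: 2, each tensor is compared against [batch, time] plus the spec shape. The observation and action specs have shape [1], so those tensors would need shape [1, 2, 1] rather than [1, 2]; the scalar fields (step_type, reward, discount) already match. The sketch below illustrates the reshape with NumPy only; add_feature_dim is a hypothetical helper, and casting to float32 is an assumption based on the dtypes in the question's specs.

```python
import numpy as np


def add_feature_dim(x, spec_shape, dtype, num_outer_dims=2):
    """Hypothetical helper: reshape x to outer dims + spec_shape and cast."""
    x = np.asarray(x)
    outer = x.shape[:num_outer_dims]
    return x.reshape(outer + tuple(spec_shape)).astype(dtype)


# Values copied from the trajectory printed in the question.
observation = np.array([[280, 280]], dtype=np.int32)    # shape (1, 2)
action = np.array([[np.nan, 0.0]])                      # shape (1, 2), float64
reward = np.array([[-6.93147181, -12.14113521]])        # shape (1, 2), float64

obs_fixed = add_feature_dim(observation, (1,), np.float32)
act_fixed = add_feature_dim(action, (1,), np.float32)
rew_fixed = add_feature_dim(reward, (), np.float32)     # scalar spec: unchanged

print(obs_fixed.shape, act_fixed.shape, rew_fixed.shape)
# (1, 2, 1) (1, 2, 1) (1, 2)
```

The same reshaping and casting can be done on tensors with tf.reshape and tf.cast before building the Trajectory. Note that the first action in the question is NaN, which would likely propagate into the loss even once the shapes match.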

Solution 1
0 2020-12-11 15:46:39
