How to define correct shape for tf-agents in batch learning

I am trying to train a DDPG agent with batch learning using the tf_agents library. However, I need to define an observation_spec and an action_spec stating the shapes of the tensors the agent will receive. I have managed to create the trajectories with which I can feed the data, but these trajectories and the agent itself have mismatched shapes.

I have tried changing the observation and action specs in the agent definition. This is my agent definition:

import tensorflow as tf
from tf_agents.agents.ddpg import ddpg_agent
from tf_agents.agents.ddpg.actor_network import ActorNetwork
from tf_agents.agents.ddpg.critic_network import CriticNetwork
from tf_agents.specs.tensor_spec import BoundedTensorSpec, TensorSpec
from tf_agents.trajectories import time_step

# Specs for a single (unbatched) observation and action.
observation_spec = TensorSpec(shape=(1,), dtype=tf.float32)
time_step_spec = time_step.time_step_spec(observation_spec)
action_spec = BoundedTensorSpec([1], tf.float32, minimum=-100, maximum=100)

actor_network = ActorNetwork(
    input_tensor_spec=observation_spec,
    output_tensor_spec=action_spec,
    fc_layer_params=(100, 200, 100),
    name="ddpg_ActorNetwork",
)

critic_net_input_specs = (observation_spec, action_spec)
critic_network = CriticNetwork(
    input_tensor_spec=critic_net_input_specs,
    observation_fc_layer_params=(200, 100),
    joint_fc_layer_params=(100, 200),
    action_fc_layer_params=None,
    name="ddpg_CriticNetwork",
)

agent = ddpg_agent.DdpgAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    actor_network=actor_network,
    critic_network=critic_network,
    actor_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    critic_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)

This is what the trajectory looks like:


Trajectory(
    step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[0, 1]], dtype=int32)>,
    observation=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[280, 280]], dtype=int32)>,
    action=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[nan,  0.]])>,
    policy_info=(),
    next_step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[1, 1]], dtype=int32)>,
    reward=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[ -6.93147181, -12.14113521]])>,
    discount=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float32, numpy=array([[0.9, 0.9]], dtype=float32)>)

I should be able to call agent.train(trajectory) and have it work; however, I get the following error:


ValueError                                Traceback (most recent call last)
<ipython-input-325-bf162a5dc8d7> in <module>
----> 1 agent.train(trajs[0])

~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in train(self, experience, weights)
    213           "experience must be type Trajectory, saw type: %s" % type(experience))
    214 
--> 215     self._check_trajectory_dimensions(experience)
    216 
    217     if self._enable_functions:

~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in _check_trajectory_dimensions(self, experience)
    137     if not nest_utils.is_batched_nested_tensors(
    138         experience, self.collect_data_spec,
--> 139         num_outer_dims=self._num_outer_dims):
    140       debug_str_1 = tf.nest.map_structure(lambda tp: tp.shape, experience)
    141       debug_str_2 = tf.nest.map_structure(lambda spec: spec.shape,

~/.local/lib/python3.7/site-packages/tf_agents/utils/nest_utils.py in is_batched_nested_tensors(tensors, specs, num_outer_dims)
    142       'And spec_shapes:\n   %s' %
    143       (num_outer_dims, tf.nest.pack_sequence_as(tensors, tensor_shapes),
--> 144        tf.nest.pack_sequence_as(specs, spec_shapes)))
    145 
    146 

ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs.  num_outer_dims: 2.
Saw tensor_shapes:
   Trajectory(step_type=TensorShape([1, 2]), observation=TensorShape([1, 2]), action=TensorShape([1, 2]), policy_info=(), next_step_type=TensorShape([1, 2]), reward=TensorShape([1, 2]), discount=TensorShape([1, 2]))
And spec_shapes:
   Trajectory(step_type=TensorShape([]), observation=TensorShape([1]), action=TensorShape([1]), policy_info=(), next_step_type=TensorShape([]), reward=TensorShape([]), discount=TensorShape([]))
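Reading the error: the agent's collect_data_spec describes a single unbatched step, and with num_outer_dims: 2 agent.train expects every tensor to carry two extra outer dimensions (batch and time) on top of each spec's shape. For the specs above, observation and action should therefore be [B, T, 1] float32, and step_type/reward/discount should be [B, T], whereas the trajectory's tensors are all [1, 2] with mixed dtypes. A minimal sketch of a trajectory built to match those specs (the values are placeholders taken from the printout):

import tensorflow as tf
from tf_agents.trajectories import trajectory

# A batch of 1 and a sequence of 2 steps (B=1, T=2). Note the trailing
# spec dimension of 1 on observation and action, and the float32 dtypes
# the specs require.
traj = trajectory.Trajectory(
    step_type=tf.constant([[0, 1]], dtype=tf.int32),            # [B, T]
    observation=tf.constant([[[280.0], [280.0]]], tf.float32),  # [B, T, 1]
    action=tf.constant([[[0.0], [0.0]]], tf.float32),           # [B, T, 1]
    policy_info=(),
    next_step_type=tf.constant([[1, 1]], dtype=tf.int32),       # [B, T]
    reward=tf.constant([[-6.93, -12.14]], tf.float32),          # [B, T]
    discount=tf.constant([[0.9, 0.9]], tf.float32),             # [B, T]
)
agent.train(traj)  # the dimension check should now pass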

This can be easily solved by using the environment. In TF-Agents the environment needs to follow the PyEnvironment class (and then you wrap this with a TFPyEnvironment for parallel execution of multiple envs). If you have already defined your environment to match this class's specification, then your environment should already provide you with the two methods env.time_step_spec() and env.action_spec(). Simply feed these two to your agent and you should be done; the sketch below shows the wiring.
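A minimal sketch of that wiring, assuming a gym environment with a continuous action space (Pendulum-v0 here) stands in for your own PyEnvironment subclass:

import tensorflow as tf
from tf_agents.agents.ddpg import ddpg_agent
from tf_agents.agents.ddpg.actor_network import ActorNetwork
from tf_agents.agents.ddpg.critic_network import CriticNetwork
from tf_agents.environments import suite_gym, tf_py_environment

# Wrap a PyEnvironment in a TFPyEnvironment and take every spec from it,
# so the agent and the collected trajectories can never disagree.
env = tf_py_environment.TFPyEnvironment(suite_gym.load('Pendulum-v0'))

actor_network = ActorNetwork(
    input_tensor_spec=env.observation_spec(),
    output_tensor_spec=env.action_spec(),
    fc_layer_params=(100, 200, 100),
)
critic_network = CriticNetwork(
    input_tensor_spec=(env.observation_spec(), env.action_spec()),
    observation_fc_layer_params=(200, 100),
    joint_fc_layer_params=(100, 200),
)
agent = ddpg_agent.DdpgAgent(
    time_step_spec=env.time_step_spec(),  # specs come from the env
    action_spec=env.action_spec(),
    actor_network=actor_network,
    critic_network=critic_network,
    actor_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    critic_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)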

It becomes a bit more complicated if you want multiple outputs from your environment that don't all go into your agent. In this case you will need to define an observation_and_action_constraint_splitter function to pass to your agent, as roughly sketched below. For more details on how to get your TensorSpecs/ArraySpecs right, and for an example of something that works, see my answer here.
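A rough sketch of what such a splitter looks like (the dict keys 'observation' and 'mask' are hypothetical names for whatever your environment actually emits):

# The splitter receives the raw observation and returns a tuple of
# (observation fed to the agent's networks, mask of allowed actions).
# The keys below are hypothetical; adapt them to your environment.
def observation_and_action_constraint_splitter(obs):
    return obs['observation'], obs['mask']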
