How to define correct shape for tf-agents in batch learning
I am trying to train a DDPG agent with batch learning using the tf_agents library. To do so I need to define an observation_spec and an action_spec, which state the shapes of the tensors the agent will receive. I have managed to create trajectories to feed the data in, but the shapes of these trajectories do not match the shapes the agent expects.
I have tried changing the observation and action specs in the agent definition. This is my agent definition:
import tensorflow as tf
from tf_agents.agents.ddpg import ddpg_agent
from tf_agents.agents.ddpg.actor_network import ActorNetwork
from tf_agents.agents.ddpg.critic_network import CriticNetwork
from tf_agents.specs.tensor_spec import BoundedTensorSpec, TensorSpec
from tf_agents.trajectories import time_step

# Specs describe a single, unbatched step.
observation_spec = TensorSpec(shape=(1,), dtype=tf.float32)
time_step_spec = time_step.time_step_spec(observation_spec)
action_spec = BoundedTensorSpec([1], tf.float32, minimum=-100, maximum=100)

actor_network = ActorNetwork(
    input_tensor_spec=observation_spec,
    output_tensor_spec=action_spec,
    fc_layer_params=(100, 200, 100),
    name="ddpg_ActorNetwork",
)

# The critic takes the (observation, action) pair as input.
critic_net_input_specs = (observation_spec, action_spec)
critic_network = CriticNetwork(
    input_tensor_spec=critic_net_input_specs,
    observation_fc_layer_params=(200, 100),
    joint_fc_layer_params=(100, 200),
    action_fc_layer_params=None,
    name="ddpg_CriticNetwork",
)

agent = ddpg_agent.DdpgAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    actor_network=actor_network,
    critic_network=critic_network,
    actor_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    critic_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)
This is what the trajectory looks like:
Trajectory(step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[0, 1]], dtype=int32)>, observation=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[280, 280]], dtype=int32)>, action=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[nan, 0.]])>, policy_info=(), next_step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[1, 1]], dtype=int32)>, reward=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[ -6.93147181, -12.14113521]])>, discount=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float32, numpy=array([[0.9, 0.9]], dtype=float32)>)
I should be able to call agent.train(trajectory) and have it work, but I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-325-bf162a5dc8d7> in <module>
----> 1 agent.train(trajs[0])
~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in train(self, experience, weights)
213 "experience must be type Trajectory, saw type: %s" % type(experience))
214
--> 215 self._check_trajectory_dimensions(experience)
216
217 if self._enable_functions:
~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in _check_trajectory_dimensions(self, experience)
137 if not nest_utils.is_batched_nested_tensors(
138 experience, self.collect_data_spec,
--> 139 num_outer_dims=self._num_outer_dims):
140 debug_str_1 = tf.nest.map_structure(lambda tp: tp.shape, experience)
141 debug_str_2 = tf.nest.map_structure(lambda spec: spec.shape,
~/.local/lib/python3.7/site-packages/tf_agents/utils/nest_utils.py in is_batched_nested_tensors(tensors, specs, num_outer_dims)
142 'And spec_shapes:\n %s' %
143 (num_outer_dims, tf.nest.pack_sequence_as(tensors, tensor_shapes),
--> 144 tf.nest.pack_sequence_as(specs, spec_shapes)))
145
146
ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs. num_outer_dims: 2.
Saw tensor_shapes:
Trajectory(step_type=TensorShape([1, 2]), observation=TensorShape([1, 2]), action=TensorShape([1, 2]), policy_info=(), next_step_type=TensorShape([1, 2]), reward=TensorShape([1, 2]), discount=TensorShape([1, 2]))
And spec_shapes:
Trajectory(step_type=TensorShape([]), observation=TensorShape([1]), action=TensorShape([1]), policy_info=(), next_step_type=TensorShape([]), reward=TensorShape([]), discount=TensorShape([]))
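Reading the error message: with num_outer_dims: 2, every tensor in the trajectory is expected to have shape [batch, time] followed by the shape given in the matching spec. A minimal pure-Python sketch of that rule follows. The helper name expected_shape is hypothetical and not part of TF-Agents, and the batch size of 1 and 2 time steps are read off the reported tensor shapes.

```python
def expected_shape(spec_shape, batch_size, num_steps):
    """Shape a batched tensor needs: [batch, time] + per-step spec shape."""
    return (batch_size, num_steps) + tuple(spec_shape)


# Per-step spec shapes copied from the error message.
spec_shapes = {
    "step_type": (),
    "observation": (1,),
    "action": (1,),
    "next_step_type": (),
    "reward": (),
    "discount": (),
}

# Tensor shapes the agent actually received (all of them were [1, 2]).
seen_shapes = {name: (1, 2) for name in spec_shapes}

# Fields whose received shape differs from [batch, time] + spec shape.
mismatched = [
    name
    for name, spec in spec_shapes.items()
    if seen_shapes[name] != expected_shape(spec, batch_size=1, num_steps=2)
]
print(mismatched)  # → ['observation', 'action']
```

Under this reading, observation and action would need shape (1, 2, 1) to agree with the specs. The trajectory also holds int32 observations and float64 actions while the specs declare tf.float32, so the dtypes probably need aligning as well.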
This can be easily solved by using the environment. In TF-Agents the environment needs to follow the PyEnvironment class (and then you wrap this with a TFPyEnvironment for parallel execution of multiple envs). If you have already defined your environment to match this class's specification, then it should already provide you with the two methods env.time_step_spec() and env.action_spec(). Simply feed these two to your agent and you should be done.
It becomes a bit more complicated if you want to have multiple outputs from your environment that don't all go into your agent. In this case you will need to define an observation_and_action_constraint_splitter function to pass to your agent. For more details on how to get your TensorSpecs/ArraySpecs right, and for a working example, see my answer here.
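As a rough illustration of the splitter's contract: in agents that accept observation_and_action_constraint_splitter (DqnAgent, for example), it is a function that takes the full observation and returns a tuple of the part fed to the network and an action mask. In the sketch below the dictionary keys "state" and "mask" are hypothetical, and plain Python lists stand in for tensors. Whether a given agent class accepts this argument is worth checking in its constructor signature.

```python
def observation_and_action_constraint_splitter(observation):
    """Split a composite observation into (network_input, action_mask)."""
    return observation["state"], observation["mask"]


# Hypothetical composite observation emitted by an environment.
full_observation = {"state": [280.0], "mask": [1, 0, 1]}

network_input, action_mask = observation_and_action_constraint_splitter(
    full_observation)
print(network_input, action_mask)
```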