I am trying to train a DDPG agent with batch learning using the tf_agents library. This requires defining an observation_spec and an action_spec that state the shapes of the tensors the agent receives. I have managed to create the trajectories with which to feed the data, but these trajectories and the agent itself have mismatched shapes.
I have tried changing the observation and action specs in the agent definition. This is my agent definition:
import tensorflow as tf
from tf_agents.agents.ddpg import ddpg_agent
from tf_agents.agents.ddpg.actor_network import ActorNetwork
from tf_agents.agents.ddpg.critic_network import CriticNetwork
from tf_agents.specs.tensor_spec import BoundedTensorSpec, TensorSpec
from tf_agents.trajectories import time_step

observation_spec = TensorSpec(shape=(1,), dtype=tf.float32)
time_step_spec = time_step.time_step_spec(observation_spec)
action_spec = BoundedTensorSpec([1], tf.float32, minimum=-100, maximum=100)

actor_network = ActorNetwork(
    input_tensor_spec=observation_spec,
    output_tensor_spec=action_spec,
    fc_layer_params=(100, 200, 100),
    name="ddpg_ActorNetwork",
)

critic_net_input_specs = (observation_spec, action_spec)
critic_network = CriticNetwork(
    input_tensor_spec=critic_net_input_specs,
    observation_fc_layer_params=(200, 100),
    joint_fc_layer_params=(100, 200),
    action_fc_layer_params=None,
    name="ddpg_CriticNetwork",
)

agent = ddpg_agent.DdpgAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    actor_network=actor_network,
    critic_network=critic_network,
    actor_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    critic_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)
This is what the trajectory looks like:
Trajectory(step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[0, 1]], dtype=int32)>, observation=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[280, 280]], dtype=int32)>, action=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[nan, 0.]])>, policy_info=(), next_step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[1, 1]], dtype=int32)>, reward=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[ -6.93147181, -12.14113521]])>, discount=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float32, numpy=array([[0.9, 0.9]], dtype=float32)>)
I should be able to call agent.train(trajectory), but instead I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-325-bf162a5dc8d7> in <module>
----> 1 agent.train(trajs[0])
~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in train(self, experience, weights)
213 "experience must be type Trajectory, saw type: %s" % type(experience))
214
--> 215 self._check_trajectory_dimensions(experience)
216
217 if self._enable_functions:
~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in _check_trajectory_dimensions(self, experience)
137 if not nest_utils.is_batched_nested_tensors(
138 experience, self.collect_data_spec,
--> 139 num_outer_dims=self._num_outer_dims):
140 debug_str_1 = tf.nest.map_structure(lambda tp: tp.shape, experience)
141 debug_str_2 = tf.nest.map_structure(lambda spec: spec.shape,
~/.local/lib/python3.7/site-packages/tf_agents/utils/nest_utils.py in is_batched_nested_tensors(tensors, specs, num_outer_dims)
142 'And spec_shapes:\n %s' %
143 (num_outer_dims, tf.nest.pack_sequence_as(tensors, tensor_shapes),
--> 144 tf.nest.pack_sequence_as(specs, spec_shapes)))
145
146
ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs. num_outer_dims: 2.
Saw tensor_shapes:
Trajectory(step_type=TensorShape([1, 2]), observation=TensorShape([1, 2]), action=TensorShape([1, 2]), policy_info=(), next_step_type=TensorShape([1, 2]), reward=TensorShape([1, 2]), discount=TensorShape([1, 2]))
And spec_shapes:
Trajectory(step_type=TensorShape([]), observation=TensorShape([1]), action=TensorShape([1]), policy_info=(), next_step_type=TensorShape([]), reward=TensorShape([]), discount=TensorShape([]))
This can be easily solved by using the environment. In TF-Agents the environment needs to follow the PyEnvironment class (which you then wrap in a TFPyEnvironment for parallel execution of multiple environments). If you have already defined your environment to match this class's specification, it should already provide you with the two methods env.time_step_spec() and env.action_spec(). Simply feed these two to your agent and you should be done.
It becomes a bit more complicated if you want to have multiple outputs from your environment that don't all go into your agent. In this case you will need to define an observation_and_action_constraint_splitter function to pass to your agent. For more details on how to get your TensorSpecs/ArraySpecs right, and for an example of something that works, see my answer here.
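As a minimal sketch of such a splitter (the dict keys 'state' and 'mask' are hypothetical, chosen for illustration), it simply splits the environment's observation into the part the agent's networks should see and the action constraint:

```python
def observation_and_action_constraint_splitter(observation):
    # The environment (hypothetically) emits a dict; only 'state' feeds the
    # agent's networks, while 'mask' marks which actions are currently valid.
    return observation['state'], observation['mask']

# Passed at agent construction, e.g. for a DQN-style agent:
# agent = dqn_agent.DqnAgent(
#     ...,
#     observation_and_action_constraint_splitter=observation_and_action_constraint_splitter,
# )
```

Note that this argument is supported by agents whose policies can apply an action mask (e.g. DqnAgent); check your agent's constructor signature before relying on it.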