How to define correct shape for tf-agents in batch learning
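I'm trying to use the tf_agents library to train a DDPG agent with batch learning. To do that, I need to define an observation spec and an action spec, i.e. the shapes of the state and action tensors the agent will receive. I have managed to create trajectories that I can feed data from, but the shapes of these trajectories don't match the shapes the agent expects.
I have tried changing the observation and action specs in the agent definition. This is my agent definition: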
import tensorflow as tf
from tf_agents.agents.ddpg import ddpg_agent
from tf_agents.agents.ddpg.actor_network import ActorNetwork
from tf_agents.agents.ddpg.critic_network import CriticNetwork
from tf_agents.specs.tensor_spec import BoundedTensorSpec, TensorSpec
from tf_agents.trajectories import time_step

# Observation: one float feature per step.
observation_spec = TensorSpec(shape=(1,), dtype=tf.float32)
time_step_spec = time_step.time_step_spec(observation_spec)
# Action: one float bounded to [-100, 100].
action_spec = BoundedTensorSpec([1], tf.float32, minimum=-100, maximum=100)

actor_network = ActorNetwork(
    input_tensor_spec=observation_spec,
    output_tensor_spec=action_spec,
    fc_layer_params=(100, 200, 100),
    name="ddpg_ActorNetwork",
)

# The critic takes (observation, action) pairs.
critic_net_input_specs = (observation_spec, action_spec)
critic_network = CriticNetwork(
    input_tensor_spec=critic_net_input_specs,
    observation_fc_layer_params=(200, 100),
    joint_fc_layer_params=(100, 200),
    action_fc_layer_params=None,
    name="ddpg_CriticNetwork",
)

agent = ddpg_agent.DdpgAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    actor_network=actor_network,
    critic_network=critic_network,
    actor_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    critic_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)
This is what the trajectory looks like:
Trajectory(step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[0, 1]], dtype=int32)>, observation=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[280, 280]], dtype=int32)>, action=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[nan, 0.]])>, policy_info=(), next_step_type=<tf.Variable 'Variable:0' shape=(1, 2) dtype=int32, numpy=array([[1, 1]], dtype=int32)>, reward=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float64, numpy=array([[ -6.93147181, -12.14113521]])>, discount=<tf.Variable 'Variable:0' shape=(1, 2) dtype=float32, numpy=array([[0.9, 0.9]], dtype=float32)>)
I expected to be able to call agent.train(trajectory) and have it work, but I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-325-bf162a5dc8d7> in <module>
----> 1 agent.train(trajs[0])
~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in train(self, experience, weights)
213 "experience must be type Trajectory, saw type: %s" % type(experience))
214
--> 215 self._check_trajectory_dimensions(experience)
216
217 if self._enable_functions:
~/.local/lib/python3.7/site-packages/tf_agents/agents/tf_agent.py in _check_trajectory_dimensions(self, experience)
137 if not nest_utils.is_batched_nested_tensors(
138 experience, self.collect_data_spec,
--> 139 num_outer_dims=self._num_outer_dims):
140 debug_str_1 = tf.nest.map_structure(lambda tp: tp.shape, experience)
141 debug_str_2 = tf.nest.map_structure(lambda spec: spec.shape,
~/.local/lib/python3.7/site-packages/tf_agents/utils/nest_utils.py in is_batched_nested_tensors(tensors, specs, num_outer_dims)
142 'And spec_shapes:\n %s' %
143 (num_outer_dims, tf.nest.pack_sequence_as(tensors, tensor_shapes),
--> 144 tf.nest.pack_sequence_as(specs, spec_shapes)))
145
146
ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs. num_outer_dims: 2.
Saw tensor_shapes:
Trajectory(step_type=TensorShape([1, 2]), observation=TensorShape([1, 2]), action=TensorShape([1, 2]), policy_info=(), next_step_type=TensorShape([1, 2]), reward=TensorShape([1, 2]), discount=TensorShape([1, 2]))
And spec_shapes:
Trajectory(step_type=TensorShape([]), observation=TensorShape([1]), action=TensorShape([1]), policy_info=(), next_step_type=TensorShape([]), reward=TensorShape([]), discount=TensorShape([]))
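The error can be read as follows: with num_outer_dims: 2, every field of the trajectory is expected to have shape [batch, time] + spec_shape. The sketch below is a simplified, pure-Python re-creation of that rule (not the library's actual nest_utils implementation), fed with the shapes printed in the error. It suggests that step_type, reward and discount at [1, 2] are fine because their specs are scalars, while observation and action would need a trailing feature axis, i.e. [1, 2, 1], because their specs have shape [1]. Note also that the trajectory's dtypes (int32 observation, float64 action and reward) differ from the float32 specs, so casting is likely needed as well.

```python
def matches_spec(tensor_shape, spec_shape, num_outer_dims=2):
    """Simplified batch check: shape must be num_outer_dims outer dims + spec shape."""
    tensor_shape, spec_shape = tuple(tensor_shape), tuple(spec_shape)
    if len(tensor_shape) != num_outer_dims + len(spec_shape):
        return False
    return tensor_shape[num_outer_dims:] == spec_shape


# Spec shapes copied from the error message ("And spec_shapes").
SPEC_SHAPES = {
    "step_type": (),
    "observation": (1,),
    "action": (1,),
    "next_step_type": (),
    "reward": (),
    "discount": (),
}

# Shapes of the question's trajectory ("Saw tensor_shapes"): batch 1, time 2.
seen = {name: (1, 2) for name in SPEC_SHAPES}


def mismatched_fields(shapes):
    """Return the (sorted) field names whose shape does not fit the spec."""
    return sorted(
        name for name, spec in SPEC_SHAPES.items()
        if not matches_spec(shapes[name], spec)
    )


print(mismatched_fields(seen))  # ['action', 'observation']

# Adding a trailing feature axis (e.g. with tf.expand_dims(x, -1)) to
# observation and action makes every field line up with its spec.
fixed = dict(seen, observation=(1, 2, 1), action=(1, 2, 1))
print(mismatched_fields(fixed))  # []
```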
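This can easily be solved by using an environment. In TF-Agents, an environment needs to follow the PyEnvironment class (and is then wrapped with TFPyEnvironment, optionally around a ParallelPyEnvironment if you want to run several envs in parallel). If you have defined your environment to match this class's specification, it already provides two methods, env.time_step_spec() and env.action_spec(). Just pass these two to your agent and you should be done.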
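It gets a bit more complicated if your environment produces several outputs and not all of them should go into your agent. In that case you need to define an observation_and_action_constraint_splitter function and pass it to your agent. For more details on how to set up TensorSpecs/ArraySpecs correctly, along with a working example, see my answer here.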