TF-agents - Replay buffer add trajectory to batch shape mismatch

I'm re-posting a question that another user asked and then deleted. I had the same question, and I found an answer. The original question:

I am currently trying to implement a categorical DQN following this tutorial: https://www.tensorflow.org/agents/tutorials/9_c51_tutorial

The following part is giving me a bit of a headache though:

random_policy = random_tf_policy.RandomTFPolicy(env.time_step_spec(),
                                                env.action_spec())

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=1,
    max_length=replay_buffer_capacity)  # this is 100

# ...

def collect_step(environment, policy):
  time_step = environment.current_time_step()
  action_step = policy.action(time_step)
  next_time_step = environment.step(action_step.action)
  traj = trajectory.from_transition(time_step, action_step, next_time_step)
  print(traj)

  # Add trajectory to the replay buffer
  replay_buffer.add_batch(traj)

for _ in range(initial_collect_steps):
  collect_step(env, random_policy)

For context: agent.collect_data_spec is of the following shape:

Trajectory(step_type=TensorSpec(shape=(), dtype=tf.int32, name='step_type'), observation=BoundedTensorSpec(shape=(4, 84, 84), dtype=tf.float32, name='screen', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)), action=BoundedTensorSpec(shape=(), dtype=tf.int32, name='play', minimum=array(0), maximum=array(6)), policy_info=(), next_step_type=TensorSpec(shape=(), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(), dtype=tf.float32, name='reward'), discount=BoundedTensorSpec(shape=(), dtype=tf.float32, name='discount', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)))

And here is what a sample traj looks like:

Trajectory(step_type=<tf.Tensor: shape=(), dtype=int32, numpy=0>, observation=<tf.Tensor: shape=(4, 84, 84), dtype=float32, numpy=array([tensor contents omitted], dtype=float32)>, action=<tf.Tensor: shape=(), dtype=int32, numpy=1>, policy_info=(), next_step_type=<tf.Tensor: shape=(), dtype=int32, numpy=1>, reward=<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, discount=<tf.Tensor: shape=(), dtype=float32, numpy=1.0>)

So, everything should check out, right? The environment outputs a tensor of shape [4, 84, 84], same as the replay buffer expects. Except I'm getting the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Must have updates.shape = indices.shape + params.shape[1:] or updates.shape = [], got updates.shape [4,84,84], indices.shape [1], params.shape [100,4,84,84] [Op:ResourceScatterUpdate]

Which suggests that it is actually expecting a tensor of shape [1, 4, 84, 84]. The thing is though, if I have my environment output a tensor of that shape, I then get another error telling me that the output shape doesn't match the spec shape (duh). And if I then adjust the spec shape to be [1, 4, 84, 84], suddenly the replay buffer expects a shape of [1, 1, 4, 84, 84], and so on...
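To make sense of the error itself: the scatter op writes a batch of rows into the buffer's backing storage, and it requires updates.shape = indices.shape + params.shape[1:]. With one write index (indices.shape [1]) and storage of shape [100, 4, 84, 84], the update must therefore be [1, 4, 84, 84]. Here is a minimal sketch of that rule; the variable and sizes just mirror the error message and are not the buffer's actual internals:

import tensorflow as tf

# Storage analogous to the buffer's backing variable: [max_length, 4, 84, 84].
params = tf.Variable(tf.zeros([100, 4, 84, 84]))
indices = tf.constant([0])      # one slot to write -> indices.shape [1]

ok = tf.zeros([1, 4, 84, 84])   # [1] + [4, 84, 84]: accepted
params.scatter_update(tf.IndexedSlices(ok, indices))

bad = tf.zeros([4, 84, 84])     # missing batch dim: same InvalidArgumentError
# params.scatter_update(tf.IndexedSlices(bad, indices))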

Finally, for completeness, here are the time_step_spec and action_spec of my environment respectively:

TimeStep(step_type=TensorSpec(shape=(), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(), dtype=tf.float32, name='reward'), discount=BoundedTensorSpec(shape=(), dtype=tf.float32, name='discount', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)), observation=BoundedTensorSpec(shape=(4, 84, 84), dtype=tf.float32, name='screen', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)))
---
BoundedTensorSpec(shape=(), dtype=tf.int32, name='play', minimum=array(0), maximum=array(6))

I've spent the better half of today trying to get the tensor to fit, but you can't simply reshape it since it's an attribute of the trajectory, so as a last-ditch effort I'm hoping some kind stranger out there can tell me what the heck is going on here.

Thank you in advance!

It seems that in the collect_step function, traj is a single trajectory, not a batch. Therefore you need to expand the dimensions into a batch before adding it. Note that you can't just call tf.expand_dims(traj, 0), since traj is a nested structure; tf.nest.map_structure is the helper for applying the expansion to every tensor in the structure.

def collect_step(environment, policy):
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    # Prepend a batch dimension to every tensor in the nested Trajectory.
    batch = tf.nest.map_structure(lambda t: tf.expand_dims(t, 0), traj)
    # Add the batched trajectory to the replay buffer.
    replay_buffer.add_batch(batch)
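As a side note (an assumption about your TF-Agents version, not something from the original answer): tf_agents.utils.nest_utils ships a helper, batch_nested_tensors, that performs the same expansion on a nested structure, so the map_structure line could be written as:

from tf_agents.utils import nest_utils

# Assumed equivalent: prepends a batch dimension to every tensor in the
# nested structure. Verify the helper exists in your installed version.
batch = nest_utils.batch_nested_tensors(traj)
replay_buffer.add_batch(batch)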
