How to fix a TypeError between policy_state and policy_state_spec in TF-Agents?
I'm working on a PPO agent that plays (well, should play) Doom using TF-Agents. As input to the agent, I am trying to give it a stack of 4 images. My complete code is in the following link: https://colab.research.google.com/drive/1chrlrLVR_rwAeIZhL01LYkpXsusyFyq_?usp=sharing

Unfortunately, my code does not run. It raises a TypeError at the line shown below (it is being run in Google Colaboratory).
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-d1571cbbda6b> in <module>()
8 t_step = tf_env.reset()
9 while (episode_steps <= max_steps_per_episode or (not t_step.is_last())):
---> 10 policy_step = agent.policy.action(t_step)
11 t_step = tf_env.step(policy_step.action)
12 episode_steps += 1
5 frames
/usr/local/lib/python3.7/dist-packages/tf_agents/utils/nest_utils.py in assert_same_structure(nest1, nest2, check_types, expand_composites, message)
112 str2 = tf.nest.map_structure(
113 lambda _: _DOT, nest2, expand_composites=expand_composites)
--> 114 raise exception('{}:\n {}\nvs.\n {}'.format(message, str1, str2))
115
116
TypeError: policy_state and policy_state_spec structures do not match:
()
vs.
{'actor_network_state': ListWrapper([., .])}
The thing about this error is that, from what I've read in the TF-Agents documentation, the user is not supposed to do anything regarding the policy_state, since it is generated automatically based on the agent's networks.

This is a similar error I found. It didn't seem to solve my problem, but it hinted at one of the solutions I tried: py_environment 'time_step' doesn't match 'time_step_spec'

After reading that question and its answer, I realized I was promising an observation_spec like this:
self._observation_spec = array_spec.BoundedArraySpec(shape=(4, 160, 260, 3), dtype=np.float32, minimum=0, maximum=1, name='screen_observation')
But what I was passing was a list of 4 np.arrays with shape = (160, 260, 3):
self._stacked_frames = []
for _ in range(4):
    new_frame = np.zeros((160, 260, 3), dtype=np.float32)
    self._stacked_frames.append(new_frame)
I did this because I thought the "shape" of my data wouldn't change, since the list always has the same number of elements as the first dimension of the observation_spec. Lists made it easier to delete past frames and add new ones, like this:
def stack_frames(self):
    # This gets the current frame of the game
    new_frame = self.preprocess_frame()

    if self._game.is_new_episode():
        for frame in range(4):
            self._stacked_frames.append(new_frame)
            # This pop was meant to clear an empty frame that was already in the list
            self._stacked_frames.pop(0)
    else:
        self._stacked_frames.append(new_frame)
        self._stacked_frames.pop(0)

    return self._stacked_frames
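(As an aside, not part of the original post: one likely source of the spec mismatch is that a Python list of four (160, 260, 3) arrays is a different nested structure than the single (4, 160, 260, 3) array the BoundedArraySpec promises. A minimal sketch of turning the list into a spec-shaped array with np.stack:)

```python
import numpy as np

# A list of four (160, 260, 3) frames, as in the question.
stacked_frames = [np.zeros((160, 260, 3), dtype=np.float32) for _ in range(4)]

# np.stack joins the list along a new leading axis, producing the single
# (4, 160, 260, 3) float32 array that the observation_spec describes.
observation = np.stack(stacked_frames, axis=0)
print(observation.shape)  # (4, 160, 260, 3)
```

Keeping the list for bookkeeping and stacking it right before returning the observation would satisfy the spec without giving up the easy append/pop handling.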
I was trying with only np.arrays before, but was not able to delete past frames and add new ones. Probably I was not doing it right, but I felt like self._stacked_frames was born with the same shape as the observation spec and could not simply delete or add new arrays.
self._stacked_frames = np.zeros((4, 160, 260, 3), dtype=np.float32)

def stack_frames(self):
    new_frame = self.preprocess_frame()

    if self._game.is_new_episode():
        for frame in range(4):
            # This delete was meant to clear an empty frame that was already in the array
            self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
            # I tried "np.concatenate((self._stacked_frames, new_frame))" as well
            self._stacked_frames = np.vstack((self._stacked_frames, new_frame))
    else:
        self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
        # I tried "np.concatenate((self._stacked_frames, new_frame))" as well
        self._stacked_frames = np.vstack((self._stacked_frames, new_frame))

    return self._stacked_frames
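(Editorial aside, not from the original post: a plausible reason the np.vstack/np.concatenate attempt fails is a rank mismatch. After np.delete the buffer is a rank-4 array, while new_frame is rank 3, and both functions require matching ranks. Giving the new frame a leading axis of length 1 makes the shapes compatible. A hedged sketch:)

```python
import numpy as np

stacked_frames = np.zeros((4, 160, 260, 3), dtype=np.float32)
new_frame = np.ones((160, 260, 3), dtype=np.float32)

# Dropping the oldest frame leaves a (3, 160, 260, 3) array...
stacked_frames = np.delete(stacked_frames, 0, 0)

# ...and stacking a rank-3 frame onto it fails: vstack/concatenate need
# all inputs to have the same number of dimensions.
try:
    np.vstack((stacked_frames, new_frame))
except ValueError as e:
    print("vstack failed:", e)

# Adding a leading axis, (1, 160, 260, 3), makes the ranks match.
stacked_frames = np.vstack((stacked_frames, new_frame[np.newaxis]))
print(stacked_frames.shape)  # (4, 160, 260, 3)
```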
This approach did not work either. Like I said, I was probably doing it wrong. I see three ways of solving this stalemate:
self._stacked_frames Slot: 1 | 2 | 3 | 4
Game image inside self._stacked_frames: A | B | C | D
New game image: E
New game image's positions (step 1): B | B | C | D
New game image's positions (step 2): B | C | C | D
New game image's positions (step 3): B | C | D | D
New game image's positions (step 4): B | C | D | E
New self._stacked_frames: B | C | D | E
This last one seemed like the most promising way to work around my problem, assuming I'm right about what the problem is. I tried it, but the TypeError persisted. I tried it like this:
self._stacked_frames = np.zeros((self._frame_stack_size, 160, 260, 3), dtype=np.float32)
and then:
def stack_frames(self):
    new_frame = self.preprocess_frame()

    if self._game.is_new_episode():
        for frame in range(self._frame_stack_size):
            self._stacked_frames[frame] = new_frame
    else:
        for frame in range(self._frame_stack_size - 1):
            self._stacked_frames[frame] = self._stacked_frames[frame + 1]
        self._stacked_frames[self._frame_stack_size - 1] = new_frame

    return self._stacked_frames
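(Editorial aside, not from the original post: the shift-left-and-overwrite pattern in the slot diagram can also be written in one step with np.roll, which avoids the element-by-element copy loop. A minimal sketch assuming the same (4, 160, 260, 3) buffer:)

```python
import numpy as np

frame_stack_size = 4
stacked_frames = np.zeros((frame_stack_size, 160, 260, 3), dtype=np.float32)

def push_frame(stacked, new_frame):
    """Shift every frame one slot toward index 0 and place the new frame last.

    Equivalent to the copy loop in the question: A|B|C|D with a new
    frame E becomes B|C|D|E, and the buffer keeps its spec shape.
    """
    stacked = np.roll(stacked, shift=-1, axis=0)
    stacked[-1] = new_frame
    return stacked

new_frame = np.ones((160, 260, 3), dtype=np.float32)
stacked_frames = push_frame(stacked_frames, new_frame)
print(stacked_frames.shape)       # (4, 160, 260, 3)
print(stacked_frames[-1].mean())  # 1.0 – newest frame is in the last slot
```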
Two questions then:
I had the same issue, and it occurred when calling policy.action(time_step). action takes an optional parameter policy_state, which is "()" by default.

I fixed the issue by calling:

policy.action(time_step, policy.get_initial_state(batch_size=BATCH_SIZE))

I'm just starting with TF-Agents, so I hope this doesn't have any undesired effects.
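(To make the fix concrete, here is a hedged sketch of the state-threading pattern, using a hypothetical stand-in class rather than a real TF-Agents policy, since the real one needs the full Colab setup. The idea is the same: seed the loop with policy.get_initial_state(batch_size=...) and feed each policy_step.state back into the next action() call, so the state always matches policy_state_spec.)

```python
from collections import namedtuple

# Hypothetical stand-in mimicking the slice of the TF-Agents policy
# interface used in the question's episode loop.
PolicyStep = namedtuple("PolicyStep", ["action", "state"])

class StatefulPolicy:
    def get_initial_state(self, batch_size):
        # The real call returns the actor network's zero state, matching
        # policy_state_spec; here it is just a step counter per batch item.
        return [0] * batch_size

    def action(self, time_step, policy_state=()):
        if policy_state == ():
            # This mismatch with policy_state_spec is what raised the
            # TypeError in the question.
            raise TypeError("policy_state and policy_state_spec do not match")
        return PolicyStep(action=0, state=[s + 1 for s in policy_state])

policy = StatefulPolicy()
BATCH_SIZE = 1

# Fixed loop: carry the state explicitly instead of the default "()".
policy_state = policy.get_initial_state(batch_size=BATCH_SIZE)
for _ in range(3):
    policy_step = policy.action(time_step=None, policy_state=policy_state)
    policy_state = policy_step.state

print(policy_state)  # [3]
```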