
How to fix a TypeError between policy_state and policy_state_spec in TF-Agents?

I'm working on a PPO agent that plays (well, should play) Doom using TF-Agents. As input to the agent, I am trying to give it a stack of 4 images. My complete code is in the following link: https://colab.research.google.com/drive/1chrlrLVR_rwAeIZhL01LYkpXsusyFyq_?usp=sharing

Unfortunately, my code does not run: it raises a TypeError at the line shown below (it is being run in Google Colaboratory).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-d1571cbbda6b> in <module>()
      8   t_step = tf_env.reset()
      9   while (episode_steps <= max_steps_per_episode or (not t_step.is_last())):
---> 10     policy_step = agent.policy.action(t_step)
     11     t_step = tf_env.step(policy_step.action)
     12     episode_steps += 1

5 frames
/usr/local/lib/python3.7/dist-packages/tf_agents/utils/nest_utils.py in assert_same_structure(nest1,     nest2, check_types, expand_composites, message)
    112     str2 = tf.nest.map_structure(
    113         lambda _: _DOT, nest2, expand_composites=expand_composites)
--> 114     raise exception('{}:\n  {}\nvs.\n  {}'.format(message, str1, str2))
    115 
    116 

TypeError: policy_state and policy_state_spec structures do not match:
  ()
vs.
  {'actor_network_state': ListWrapper([., .])}

The thing about this error is that, from what I've read in the TF-Agents documentation, the user is not supposed to do anything regarding the policy_state, since it is generated automatically based on the agent's networks.

This is a similar error I found, but it didn't seem to solve my problem, though it hinted at one of the solutions I tried: py_environment 'time_step' doesn't match 'time_step_spec'

After reading the question and the answer above, I realized I was promising an observation_spec like this:

self._observation_spec = array_spec.BoundedArraySpec(shape=(4, 160, 260, 3), dtype=np.float32, minimum=0, maximum=1, name='screen_observation')

But what I was passing was a list of 4 np.arrays with shape = (160, 260, 3):

self._stacked_frames = []
for _ in range(4):
  new_frame = np.zeros((160, 260, 3), dtype=np.float32)
  self._stacked_frames.append(new_frame)
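As an aside, a fixed-length frame buffer like this is often easier to manage with collections.deque(maxlen=4), which drops the oldest frame automatically when a new one is appended. A minimal sketch (the (160, 260, 3) frame shape is taken from the observation_spec above; new_frame here is just a stand-in for a real preprocessed frame):

```python
from collections import deque

import numpy as np

FRAME_SHAPE = (160, 260, 3)

# A deque with maxlen=4 keeps exactly the last 4 frames: appending a
# fifth frame silently discards the oldest one, so no pop(0) is needed.
stacked_frames = deque(
    [np.zeros(FRAME_SHAPE, dtype=np.float32) for _ in range(4)], maxlen=4
)

new_frame = np.ones(FRAME_SHAPE, dtype=np.float32)  # stand-in frame
stacked_frames.append(new_frame)  # oldest frame dropped automatically

# Convert to a single (4, 160, 260, 3) array before returning it as the
# observation, so its shape matches the BoundedArraySpec exactly.
observation = np.stack(stacked_frames, axis=0)
print(observation.shape)  # (4, 160, 260, 3)
```

The np.stack at the end is the important part: whatever container holds the frames, the environment should hand back one array whose shape equals the spec's shape.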

I did this because I thought the "shape" of my data wouldn't change, since the list always has the same number of elements as the first dimension of the observation_spec. Lists made it easier to delete past frames and add new ones, like this:

def stack_frames(self):
  #This gets the current frame of the game
  new_frame = self.preprocess_frame()

  if self._game.is_new_episode():
    for frame in range(4):
      self._stacked_frames.append(new_frame)
      #This pop removes the placeholder frame that was already in the list
      self._stacked_frames.pop(0)
  else:
    self._stacked_frames.append(new_frame)
    self._stacked_frames.pop(0)
  return self._stacked_frames

I tried using only np.arrays before, but was not able to delete past frames and add new ones. Probably I was not doing it right, but I felt like self._stacked_frames was born with the same shape as the observation_spec and could not simply have arrays deleted from or added to it.

self._stacked_frames = np.zeros((4, 160, 260, 3), dtype=np.float32)

def stack_frames(self):
  new_frame = self.preprocess_frame()
  
  if self._game.is_new_episode():
    for frame in range(4):
      #This delete removes the placeholder frame that was already in the array
      self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
      #I tried "np.concatenate((self._stacked_frames, new_frame))" as well
      self._stacked_frames = np.vstack((self._stacked_frames, new_frame))
  else:
    self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
    #I tried "np.concatenate((self._stacked_frames, new_frame))" as well
    self._stacked_frames = np.vstack((self._stacked_frames, new_frame))
  return self._stacked_frames
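One likely reason this version raises an error is a shape mismatch: after np.delete the buffer has shape (3, 160, 260, 3), while new_frame has shape (160, 260, 3), and stacking a 4-D array with a 3-D array fails. The new frame first needs a leading axis of length 1. A sketch of that fix (frame dimensions reduced here so it runs quickly; they stand in for 160, 260, 3):

```python
import numpy as np

H, W, C = 4, 5, 3  # stand-ins for the real 160, 260, 3

stacked = np.zeros((4, H, W, C), dtype=np.float32)
new_frame = np.ones((H, W, C), dtype=np.float32)

# Drop the oldest frame: (4, H, W, C) -> (3, H, W, C).
stacked = np.delete(stacked, 0, axis=0)

# Give the new frame a leading axis, (H, W, C) -> (1, H, W, C),
# so it can be concatenated along axis 0 back to (4, H, W, C).
stacked = np.concatenate((stacked, new_frame[np.newaxis]), axis=0)
print(stacked.shape)  # (4, 4, 5, 3)
```

With new_frame[np.newaxis] the np.concatenate (or np.vstack) call has two 4-D inputs and succeeds.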

This approach did not work either. Like I said, I was probably doing it wrong. I see three ways out of this stalemate:

  1. Declare the observation_spec as a list of four frames, each one an np.array of shape (160, 260, 3);
  2. Keep the observation_spec as declared, but delete and add frames to self._stacked_frames the right way (I'm not sure this is possible, since self._stacked_frames will be declared as np.array(4, 160, 260, 3), and I'm not sure it can become np.array(3, 160, 260, 3) or np.array(5, 160, 260, 3) before going back to being np.array(4, 160, 260, 3));
  3. Keep the observation_spec as declared, but neither delete nor add frames. Instead, make a loop that copies the second frame (the one that enters the stack_frames function in the second slot) into the first slot, the third frame into the second slot, the fourth frame into the third slot, and finally the new frame into the fourth slot. An illustration follows:
             self._stacked_frames Slot: 1 | 2 | 3 | 4
Game image inside self._stacked_frames: A | B | C | D
                        New game image: E
   New game image's positions (step 1): B | B | C | D
   New game image's positions (step 2): B | C | C | D
   New game image's positions (step 3): B | C | D | D
   New game image's positions (step 4): B | C | D | E
              New self._stacked_frames: B | C | D | E
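The slot-shifting illustrated above (A | B | C | D becoming B | C | D | E) can also be sketched in NumPy with slice assignment instead of an explicit loop. Shapes are reduced here so the sketch runs quickly; the values are arbitrary:

```python
import numpy as np

# A small stand-in for the (4, 160, 260, 3) frame stack.
stack = np.arange(4 * 2 * 2 * 3, dtype=np.float32).reshape(4, 2, 2, 3)
new_frame = np.full((2, 2, 3), -1.0, dtype=np.float32)  # frame "E"

old_second = stack[1].copy()  # frame "B", kept to verify the shift

stack[:-1] = stack[1:]  # slots 1..3 move into slots 0..2 (B | C | D | D)
stack[-1] = new_frame   # new frame goes into the last slot (B | C | D | E)

print(np.array_equal(stack[0], old_second))  # True
```

NumPy detects the overlap between `stack[:-1]` and `stack[1:]` and buffers the copy, so the assignment is safe.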

This last one seemed like the most certain way to work around my problem, assuming I'm right about what it is. I tried it, but the TypeError persisted. I tried it like this:

self._stacked_frames = np.zeros((self._frame_stack_size, 160, 260, 3), dtype=np.float32)

and then:

def stack_frames(self):
  new_frame = self.preprocess_frame()

  if self._game.is_new_episode():
    for frame in range(self._frame_stack_size):
      self._stacked_frames[frame] = new_frame
  else:
    for frame in range((self._frame_stack_size) - 1):
      self._stacked_frames[frame] = self._stacked_frames[frame + 1]
    self._stacked_frames[self._frame_stack_size - 1] = new_frame
  return self._stacked_frames

Two questions, then:

  1. Assuming I'm right about the TypeError presented, which of the three ways of fixing it is best? Is there anything wrong with the way I tried my solution for the 3rd possibility?
  2. In case I'm not right about the TypeError, what is this error actually about?

I had the same issue, and it happened when calling policy.action(time_step). action takes an optional parameter policy_state, which defaults to "()".

I fixed the issue by calling

policy.action(time_step, policy.get_initial_state(batch_size=BATCH_SIZE))

I'm just starting with TF-Agents, so I hope this doesn't have some undesired effects.
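The calling pattern this describes, getting an initial state and then threading the state returned by each action() call into the next one, can be sketched with plain-Python stand-ins. FakeStatefulPolicy and FakePolicyStep below are hypothetical stand-ins for illustration, not real TF-Agents classes; only the loop structure is the point:

```python
class FakePolicyStep:
    """Mimics PolicyStep's (action, state) fields."""
    def __init__(self, action, state):
        self.action = action
        self.state = state


class FakeStatefulPolicy:
    """Mimics the policy.action(time_step, policy_state) interface."""

    def get_initial_state(self, batch_size):
        # A real RNN policy would return zero-filled network state here.
        return {"actor_network_state": [0] * batch_size}

    def action(self, time_step, policy_state):
        # A real policy would run the actor network and return its
        # updated recurrent state inside the PolicyStep.
        new_state = {
            "actor_network_state":
                [s + 1 for s in policy_state["actor_network_state"]]
        }
        return FakePolicyStep(action=0, state=new_state)


policy = FakeStatefulPolicy()
policy_state = policy.get_initial_state(batch_size=1)  # instead of ()
for _ in range(3):
    policy_step = policy.action(time_step=None, policy_state=policy_state)
    policy_state = policy_step.state  # carry the state into the next call

print(policy_state["actor_network_state"])  # [3]
```

The key detail is the last line of the loop: for a stateful policy, the state is not tracked for you between calls, so each action() must receive the state from the previous PolicyStep.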
