
Agent repeats the same action loop non-stop, Q-learning

How can you prevent the agent from endlessly repeating the same action loop?

Presumably the answer somehow involves changes to the reward system. But are there general rules you could follow, or try to build into your code, to prevent this kind of problem?


To be more precise, my actual problem is this:

I'm trying to teach an ANN to play Doodle Jump using Q-learning. After only a few generations, the agent keeps jumping onto one and the same platform/stone over and over again, non-stop. Increasing the length of the random-exploration phase doesn't help.
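For reference, by "random-exploration phase" I mean a standard epsilon-greedy schedule along these lines (the function names are illustrative, not from my actual code):

```python
import random

def select_action(q_values, epsilon: float) -> int:
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay(epsilon: float, rate: float = 0.995, floor: float = 0.05) -> float:
    """Multiplicative epsilon decay with a lower bound."""
    return max(floor, epsilon * rate)
```

Stretching out this schedule (slower decay, higher floor) is what did not help in my case.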

My reward system is the following:

  • +1 while the agent is alive
  • +2 when the agent lands on a platform
  • -1000 when it dies
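In code, this reward scheme might look like the following sketch (the function and parameter names are hypothetical, not from my actual game code):

```python
def reward(died: bool, landed_on_platform: bool) -> float:
    """Per-step reward: -1000 on death, +2 on landing, +1 otherwise while alive."""
    if died:
        return -1000.0
    if landed_on_platform:
        return 2.0   # landing bonus
    return 1.0       # still alive
```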

One idea would be to give a negative reward, or at least 0, when the agent lands on the same platform as before. But to do so, I'd have to pass a lot of new input parameters to the ANN: the x, y coordinates of the agent and the x, y coordinates of the last visited platform.
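As a sketch, such a revisit penalty could also be tracked outside the network rather than fed in as extra inputs; `platform_id` here is an assumed identifier for the platform the agent just landed on:

```python
class RevisitPenalty:
    """Overrides the landing bonus when the same platform is hit twice in a row."""

    def __init__(self, penalty: float = 0.0):
        self.last_platform = None
        self.penalty = penalty

    def shaped_reward(self, base_reward: float, platform_id) -> float:
        if platform_id is not None:          # the agent landed this step
            if platform_id == self.last_platform:
                base_reward = self.penalty   # replace the +2 bonus
            self.last_platform = platform_id
        return base_reward
```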

Furthermore, the ANN would then also have to learn that a platform is 4 blocks thick, and so on.

Therefore, I'm sure the idea I just mentioned wouldn't solve the problem; on the contrary, I believe the ANN would generally stop learning well, because there would be too many useless and hard-to-interpret inputs.

This is not a direct answer to the question as generally asked.


I found a workaround for my particular Doodle Jump example; perhaps someone is doing something similar and needs help:

  • While training: make every platform the agent jumps on disappear afterwards, and spawn a new one somewhere else.

  • While testing/presenting: you can disable the new "disappear" feature (so the game behaves as before), and the agent will play well and won't hop onto one and the same platform all the time.
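A minimal sketch of this workaround, with `World` and `on_landing` as hypothetical stand-ins for the game's own classes:

```python
import random

class World:
    def __init__(self, platforms, training: bool):
        self.platforms = platforms   # list of (x, y) platform positions
        self.training = training

    def on_landing(self, index: int):
        """Called when the agent lands on platform `index`."""
        if self.training:
            # Remove the used platform and spawn a new one somewhere else,
            # so looping on a single platform becomes impossible.
            self.platforms.pop(index)
            self.platforms.append((random.uniform(0, 100), random.uniform(0, 100)))
        # While testing/presenting, platforms persist as in the normal game.
```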
