
Deep Q Learning Approach for the card game Schnapsen

Original post: 2023-01-23 13:04:13 · Tags: python, keras, deep-learning, q-learning

I have a DQN agent that plays the card game Schnapsen. I won't bore you with the details of the game, as they are not really related to my question. The only important point is that in every round of the game there is a specific set of valid moves a player can make. The DQN agent I have created sometimes outputs invalid moves, in the form of an integer. There are 28 possible moves in the entire game, so sometimes it outputs a move that cannot be played in the current state of the game, for example playing the Jack of Diamonds when that card is not in its hand. Is there any way for me to "map" the outputs of the neural network onto the most similar valid move in the case that it does not converge? Would that be the best approach to this problem, or do I have to tune the neural network better?

As of right now, whenever the DQN agent does not output a valid move, it falls back on another algorithm, a Bully Bot implementation that plays one of the possible valid moves. Here is the link to my GitHub repo with the code. To run the code where the DQN agent plays against a bully bot, just navigate into the executables folder and run: python cli.py bully-bot

1 Answer

One approach to mapping the outputs of your neural network to the most similar valid move is to apply "softmax" to the raw outputs of the network, which converts them into a probability distribution over the 28 possible moves. Then you select the move with the highest probability among the moves that are valid in the current state. Another approach is to use "argmax", which returns the index of the maximum value in the output. You then have to check whether the returned index corresponds to a valid move. If it does not, you select the index with the next-highest value that does correspond to a valid move. Since softmax preserves the ordering of the raw outputs, both approaches end up picking the same move.
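Both ideas can be sketched in a few lines of NumPy. This is only a minimal sketch, not code from the repo: it assumes the network output is a vector of 28 scores (one per move) and that the game can give you the indices of the currently valid moves. The names `select_valid_move`, `masked_softmax`, and `valid_moves` are made up for illustration.

```python
import numpy as np

NUM_MOVES = 28  # total number of moves in the game, as stated in the question


def select_valid_move(q_values, valid_moves):
    """Return the index of the valid move with the highest score.

    q_values:    hypothetical network output, one score per move (length 28)
    valid_moves: hypothetical iterable of move indices playable right now
    """
    q_values = np.asarray(q_values, dtype=float)
    if q_values.shape != (NUM_MOVES,):
        raise ValueError(f"expected {NUM_MOVES} scores, got shape {q_values.shape}")
    valid = list(valid_moves)
    if not valid:
        raise ValueError("no valid moves available")

    # Invalid moves get -inf so argmax can never pick them.
    masked = np.full(NUM_MOVES, -np.inf)
    masked[valid] = q_values[valid]
    return int(np.argmax(masked))


def masked_softmax(q_values, valid_moves):
    """Softmax over the valid moves only; invalid moves get probability 0."""
    q_values = np.asarray(q_values, dtype=float)
    valid = list(valid_moves)
    if not valid:
        raise ValueError("no valid moves available")

    probs = np.zeros_like(q_values)
    # Subtract the max before exponentiating for numerical stability.
    shifted = q_values[valid] - np.max(q_values[valid])
    exp = np.exp(shifted)
    probs[valid] = exp / exp.sum()
    return probs


# Example: move 5 has the highest raw score but is not playable,
# so the best playable move (index 3) is chosen instead.
scores = np.zeros(NUM_MOVES)
scores[5], scores[3], scores[7] = 10.0, 2.0, 1.0
playable = [3, 7]
best = select_valid_move(scores, playable)
probs = masked_softmax(scores, playable)
```

Masking like this means the agent never has to fall back on the Bully Bot just because of an illegal output. If you also want the agent to learn better, a common variant is to apply the same mask when computing the max over next-state values in the training target, but whether that helps here would need to be tested on your setup.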