
Q-learning for a Ludo game?

I am at the moment trying to implement an AI player that uses Q-learning to play against two different random players.

I am not sure Q-learning is applicable to a Ludo game, which is why I am a bit doubtful about it.

I have defined 11 states for the game. Each state is defined according to the positions of the other players.

I have 6 possible actions (constrained by the dice).

Theoretically I could have four different states (one for each Ludo token), each of which could perform the action chosen by the dice, but I would just choose to move the token which has the highest Q(s,a) and perform that action.
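As a concrete illustration of that selection step, here is a minimal Python sketch; the token list, the `encode_state` helper, and the Q-table layout are hypothetical placeholders rather than anything from the question:

```python
def choose_token(tokens, dice, Q, encode_state):
    # For each of the four tokens, look up Q(s, a), where s is the state
    # that token is currently in (encode_state is a hypothetical helper
    # mapping a token's situation to one of the 11 states) and a is the
    # action fixed by the dice roll. Move the token with the highest value.
    return max(tokens, key=lambda t: Q[encode_state(t)][dice - 1])
```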

What I don't get is what will happen in the update phase.

I understand that I update the previous value with the new value, right?

Based on the wiki, the update is given as:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
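In code, one sketch of that update (assuming the 11 states and 6 actions described above; `alpha` and `gamma` are free parameters to tune) could look like this:

```python
import numpy as np

N_STATES, N_ACTIONS = 11, 6
Q = np.zeros((N_STATES, N_ACTIONS))  # the Q-table, initialised to zero

def q_update(s, a, reward, s_next, alpha=0.1, gamma=0.9):
    # One tabular Q-learning step: nudge Q[s, a] toward the bootstrapped
    # target r + gamma * max_a' Q(s', a'). The difference in parentheses
    # is exactly the bracketed term in the update rule above.
    target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```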

What I don't get is how the reward value is different from the old value. How is it defined, and how does it differ from the values in the matrix?

The reward is the reward given for making a certain move, and the old Q-value is the value in the Q-table for the action that was chosen as the most attractive in the given state. The reward will update that entry, so that in the future the algorithm will know whether the move was beneficial or made the outcome worse.
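For concreteness, one possible reward scheme is sketched below; the event names and magnitudes are purely illustrative design choices, since Q-learning only requires some scalar signal per move:

```python
def reward_for_move(outcome):
    # Hypothetical reward shaping for Ludo; the outcomes and numbers
    # here are arbitrary assumptions, not prescribed by the algorithm.
    rewards = {
        "token_reached_goal": 1.0,     # finishing a token
        "knocked_opponent_home": 0.5,  # capturing an opponent's token
        "left_token_exposed": -0.2,    # ended where the token can be hit
    }
    return rewards.get(outcome, 0.0)   # any other move is neutral
```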
