
How to model UNO as a POMDP

I am trying to model the UNO card game as a Partially Observable Markov Decision Process (POMDP). After a bit of research, I concluded that the states will be the number of cards, and the actions will be either to play a card or to pick one from the unseen deck. I am having difficulty formulating the state transition and observation models. I think the observation model will depend on past actions and observations (the history), but that would require relaxing the Markov assumption. Is relaxing the Markov assumption the better choice? And how exactly should I form the state and observation models? Thanks in advance.
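For concreteness, here is a minimal sketch of how the pieces could be laid out. All of the names (`UnoState`, `UnoObservation`, the card encoding) are hypothetical and simplified; the point is only the split between the hidden full state and the visible observation:

```python
from dataclasses import dataclass
from typing import List, Tuple

Card = Tuple[str, str]  # (color, rank), e.g. ("red", "7"); wilds could use a dummy color

@dataclass
class UnoState:
    """Hidden POMDP state: the full card configuration ('ground truth')."""
    hands: List[List[Card]]    # one hand per player; opponents' hands are hidden
    draw_pile: List[Card]      # ordered face-down deck (hidden)
    discard_pile: List[Card]   # face-up pile, fully visible
    current_player: int

@dataclass
class UnoObservation:
    """What the agent actually sees: a function of the current state only."""
    own_hand: List[Card]
    top_of_discard: Card
    opponent_hand_sizes: List[int]
    current_player: int

# Actions: play a specific card from the hand, or draw from the deck.
Action = Tuple[str, int]  # ("play", index_into_hand) or ("draw", 0)
```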

I think in a POMDP the states should still be the "full truth" (the position of every card), and the transitions are simply the rules of the game (including, presumably, the strategies of the other players). The observations should certainly not depend on any history, only on the current state; otherwise you are violating the Markov assumption. The point of a POMDP is that the agent can gain information about the current state by analyzing the history. I am not sure how much this buys you in UNO, though: if you already know which cards have been played and in what order, can you still gain information by using the history? Probably not. So it may not make sense to think of this game as a POMDP, even if you use a solution method that was designed for POMDPs.
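To make the "observations depend only on the state" point concrete, here is a sketch of an observation function, reusing the hypothetical `UnoState` / `UnoObservation` classes from the sketch in the question. Note that it reads nothing but the current state:

```python
def observe(state: UnoState, agent: int) -> UnoObservation:
    """Observation model O(o | s): computed from the current state alone,
    never from the history, so the Markov assumption is preserved."""
    return UnoObservation(
        own_hand=list(state.hands[agent]),
        top_of_discard=state.discard_pile[-1],
        opponent_hand_sizes=[len(h) for i, h in enumerate(state.hands)
                             if i != agent],
        current_player=state.current_player,
    )
```

The history then enters only through the agent's belief b(s), which is updated by the usual Bayes filter, b'(s') ∝ O(o | s') Σ_s T(s' | s, a) b(s), so nothing in the POMDP formalism needs to be relaxed.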
