MCTS agent makes wrong decisions in tic-tac-toe

Question

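I have been working on an MCTS AI for a few days now. I tried to implement it for tic-tac-toe, the simplest game I could think of, but for some reason my AI keeps making bad decisions. I have tried changing the value of UCB1's exploration constant, the number of iterations per search, and even the points awarded for winning, losing, and drawing (trying to make a draw more rewarding, since this AI only plays second and aims for a draw, or a win otherwise). As of now, the code looks like this: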

import random
import math
import copy
class tree:
    def __init__(self, board):
        self.board = board
        self.visits = 0
        self.score = 0
        self.children = []
class mcts:
    def search(self, mx, player,):
        root = tree(mx)
        for i in range(1200):
            leaf = mcts.expand(self, root.board, player, root)
            result = mcts.rollout(self, leaf)
            mcts.backpropagate(self, leaf, root, result)
        return mcts.best_child(self, root).board

    def expand(self, mx, player, root):
        plays = mcts.generate_states(self, mx, player) #all possible plays
        if root.visits == 0:
            for j in plays:
                root.children.append(j) #create child_nodes in case they havent been created yet
        for j in root.children:
            if j.visits == 0:
                return j #first iterations of the loop
        for j in plays:
            if mcts.final(self, j.board, player):
                return j
        return mcts.best_child(self, root) #choose the one with most potential

    def rollout(self, leaf):
        mx = leaf.board
        aux = 1
        while mcts.final(self, mx, "O") != True:
            if aux == 1: # "X" playing
                possible_states = []
                possible_nodes = mcts.generate_states(self, mx, "X")
                for i in possible_nodes:
                    possible_states.append(i.board)
                if len(possible_states) == 1: mx =  possible_states[0]
                else:
                    choice = random.randrange(0, len(possible_states) - 1)
                    mx = possible_states[choice]
                if mcts.final(self, mx, "X"): #The play by "X" finished the game
                    break
            elif aux == 0: # "O" playing
                possible_states = []
                possible_nodes = mcts.generate_states(self, mx, "O")
                for i in possible_nodes:
                    possible_states.append(i.board)
                if len(possible_states) == 1: mx =  possible_states[0]
                else:
                    choice = random.randrange(0, len(possible_states) - 1)
                    mx = possible_states[choice]
            aux += 1
            aux = aux%2
        if mcts.final(self, mx, "X"):
            for i in range(len(mx)):
                for k in range(len(mx[i])):
                    if mx[i][k] == "-":
                        return -1 #loss
            return 0 #tie
        elif mcts.final(self, mx, "O"):
            for i in range(len(mx)):
                for k in range(len(mx[i])):
                    if mx[i][k] == "-":
                        return 1 #win


    def backpropagate(self, leaf, root, result): # updating our prospects stats
        leaf.score += result
        leaf.visits += 1
        root.visits += 1

    def generate_states(self, mx, player):
        possible_states = [] #generate child_nodes
        for i in range(len(mx)):
            for k in range(len(mx[i])):
                if mx[i][k] == "-":
                    option = copy.deepcopy(mx)
                    option[i][k] = player
                    child_node = tree(option)
                    possible_states.append(child_node)
        return possible_states

    def final(self,mx, player): #check if game is won
        possible_draw = True
        win = False
        for i in mx: #lines
            if i == [player, player, player]:
                win = True
                possible_draw = False
        if mx[0][0] == player: #diagonals
            if mx[1][1] == player:
                if mx[2][2] == player:
                    win = True
                    possible_draw = False
        if mx[0][2] == player:
            if mx[1][1] == player:
                if mx[2][0] == player:
                    win = True
                    possible_draw = False
        for i in range(3): #columns
            if mx[0][i] == player and mx[1][i] == player and mx[2][i] == player:
                win = True
                possible_draw = False
        for i in range(3):
            for k in range(3):
                if mx[i][k] == "-":
                    possible_draw = False
        if possible_draw:
            return possible_draw
        return win

    def calculate_score(self, score, child_visits, parent_visits, c): #UCB1
        return score / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)

    def best_child(self, root): #returns most promising node
        treshold = -1*10**6
        for j in root.children:
            potential = mcts.calculate_score(self, j.score, j.visits, root.visits, 2)
            if potential > treshold:
                win_choice = j
                treshold = potential
        return win_choice

#todo the AI takes too long for each play, optimize that by finding the optimal approach in the rollout phase

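First of all, the purpose of this AI is to return a modified matrix containing the best move it can make in the given situation. I find myself questioning whether the MCTS algorithm is the reason behind all these lost games, since there may be some errors in its implementation. That said, as far as I can tell, the code does the following: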

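Checks whether the root already has its children and, if so, selects the most promising one.
Runs a random simulation and saves the result.
Updates the leaf's score and visit count, and the root's visit count.
Repeats this for 1200 iterations in my example.
Returns the best possible move (matrix, child_node).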

Why doesn't it work? Why does it choose bad moves instead of the best ones? Is the algorithm implemented incorrectly?
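For comparison with the steps listed in the question, here is a minimal sketch of the four usual MCTS phases (selection, expansion, simulation, backpropagation). It is not the code from the question: the flat 9-cell tuple board, the `Node` class with a `parent` link, the 1 / 0.5 / 0 reward scale, the constant `c=1.4`, and the names `winner`, `other`, `rollout` and `search` are all assumptions made for illustration. It also assumes the position passed to `search` has at least one legal move.

```python
import math
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "-" and board[a] == board[b] == board[c]:
            return board[a]
    return None

def other(player):
    return "O" if player == "X" else "X"

class Node:
    def __init__(self, board, to_move, parent=None):
        self.board = board        # tuple of 9 cells: "X", "O" or "-"
        self.to_move = to_move    # player who moves next from this position
        self.parent = parent
        self.children = []
        self.visits = 0
        self.score = 0.0          # reward for the player who moved INTO this node
        done = winner(board) is not None or "-" not in board
        self.untried = [] if done else [i for i, c in enumerate(board) if c == "-"]

    def expand(self):
        """Create one child for a move that has not been tried yet."""
        i = self.untried.pop()
        board = self.board[:i] + (self.to_move,) + self.board[i + 1:]
        child = Node(board, other(self.to_move), parent=self)
        self.children.append(child)
        return child

    def best_ucb1(self, c=1.4):
        """Child with the highest UCB1 value (every child has visits >= 1 here)."""
        return max(self.children, key=lambda ch: ch.score / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(board, to_move):
    """Play uniformly random moves to the end; return the winner, or None for a draw."""
    board = list(board)
    while winner(board) is None and "-" in board:
        i = random.choice([i for i, c in enumerate(board) if c == "-"])
        board[i] = to_move
        to_move = other(to_move)
    return winner(board)

def search(board, player, iterations=2000):
    root = Node(tuple(board), player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = node.best_ucb1()
        # 2. Expansion: add one new child if a move is still untried.
        if node.untried:
            node = node.expand()
        # 3. Simulation: random playout from the new position.
        result = rollout(node.board, node.to_move)
        # 4. Backpropagation: update every node on the path back to the root.
        while node is not None:
            node.visits += 1
            mover = other(node.to_move)   # player who moved into this node
            if result == mover:
                node.score += 1
            elif result is None:
                node.score += 0.5
            node = node.parent
    # Final choice: the most visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).board
```

Two differences from the code in the question are worth noting. Statistics are updated on every node along the path rather than only on the leaf and the root, and the rollout draws moves with `random.choice`, because `random.randrange(0, n - 1)` can never return the last index `n - 1`.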

Answer 1

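My mistake was that, in the expansion phase, I was selecting the node with the most visits, when it should have been the node with the most potential according to the UCB1 formula. I also had some errors in a few of the if clauses, because not all of the losses were being counted.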
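The two fixes described above can be sketched as follows. This is an illustration under assumptions, not the corrected code from the answer: `ucb1` mirrors `calculate_score` from the question but adds a guard for unvisited children, while `has_line` and `rollout_result` are hypothetical helpers that check for a win or a loss before falling back to a draw. The board is the same 3x3 list of lists used in the question.

```python
import math

def ucb1(score, visits, parent_visits, c=2.0):
    """UCB1 value of a child; unvisited children get infinity so they are tried first."""
    if visits == 0:
        return math.inf
    return score / visits + c * math.sqrt(math.log(parent_visits) / visits)

def has_line(mx, player):
    """True if `player` owns a full row, column or diagonal of the 3x3 board."""
    lines = [[(r, 0), (r, 1), (r, 2)] for r in range(3)]           # rows
    lines += [[(0, k), (1, k), (2, k)] for k in range(3)]          # columns
    lines += [[(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)]]  # diagonals
    return any(all(mx[r][k] == player for r, k in line) for line in lines)

def rollout_result(mx, me="O", opponent="X"):
    """Score a finished board for `me`; wins and losses are checked before draws."""
    if has_line(mx, me):
        return 1
    if has_line(mx, opponent):
        return -1   # counted as a loss even when the board is full
    return 0        # no line for either side: draw
```

With a parent visited 110 times, a child with score 8 over 10 visits gets a higher UCB1 value than a child with score 30 over 100 visits, even though the second has far more visits. On the scoring side, a full board on which "X" holds a diagonal is reported as a tie (0) by the `rollout` in the question, since it only looks for empty cells after `final` returns true; `rollout_result` returns -1 for that board.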

Solution 1
0 Accepted 2021-02-08 17:22:37
