
Monte Carlo Tree Search not working

I am currently writing an AI for the board game Hex. I want to use Monte Carlo Tree Search to do so and have already tried to implement it. However, the AI makes incredibly stupid (random) moves, and I cannot figure out why it's not working.

import java.util.ArrayList;
import java.util.Random;

/**
 * Created by Robin on 18.03.2017.
 */
public class TreeNode {


    private static final Random random = new Random();
    private static final double epsion=10e-5;
    protected double nvisits;
    protected double totValue;
    protected int move=-1;

    private HexBoard board;
    protected ArrayList<TreeNode>children ;



    public TreeNode(HexBoard board){
        this.board =board;
    }


    //Copy-Constructor
    public TreeNode(TreeNode treeNode){
        this.nvisits=treeNode.nvisits;
        this.totValue=treeNode.totValue;
        this.move=treeNode.move;
        this.board = new HexBoard(treeNode.board);

    }

    public void update(double value){
        totValue+=value*board.color;
        nvisits++;
    }



    public void expand(){
        assert(children==null);
        children = new ArrayList<>(121-board.moveCount);
        for(int i=0;i<121;i++){
            if(board.board[i]!=HexBoard.EMPTY)
                continue;

                TreeNode newNode = new TreeNode(board);
                newNode.move =i;
                children.add(newNode);

        }
    }

    public void calculateIteration(){
        ArrayList<TreeNode>visited = new ArrayList<>();
        TreeNode current =this;
        visited.add(current);

        while(!current.isLeafNode()){
            current =current.select();
            board.makeMove(current.move);
            visited.add(current);
        }

        //Found a leaf node
        double value;
        if(current.board.getWinner()==0){
            current.expand();
            TreeNode newNode =current.select();
            value =playOut(newNode.board);
        }else{
            value =current.board.getWinner();
        }

        //update all the nodes

        for(int i=1;i<visited.size();i++){
            visited.get(i).update(value);
            board.undoMove(visited.get(i).move);
        }
        visited.get(0).update(value);
    }

    public static int playOut(HexBoard board){
        int winner=0;

        if(board.moveCount==121) {
            winner=board.getWinner();

            return winner;
        }

        //Checking-Movecount vs actual stones on the board


        final double left =121-board.moveCount;
        double probibility =1/left;
        double summe =0;
        double p =random.nextDouble();

        int randomMove =0;
        for(int i=0;i<121;i++){
            if(board.board[i]!=HexBoard.EMPTY)
                continue;

            summe+=probibility;

            if(p<=summe && probibility!=0) {
                randomMove = i;
                break;
            }
        }

        board.makeMove(randomMove);
        winner =playOut(board);
        board.undoMove(randomMove);

        return winner;
    }


    public TreeNode select(){

        TreeNode bestNode=null;
        double bestValue =-10000000;
        for(TreeNode node : children){

            double uctvalue =(node.nvisits==0)?100000:(node.totValue/(node.nvisits)+Math.sqrt((Math.log(this.nvisits))/(2*node.nvisits)));
            uctvalue+=epsion*random.nextDouble();

            if(uctvalue>bestValue){
                bestValue=uctvalue;
                bestNode =node;
            }
        }

        return bestNode;
        ///
    }

    public boolean isLeafNode(){
        return (children==null);
    }
}

Is my implementation inside the method calculateIteration() correct?

I know this might not be a very attractive problem to look at, but I would appreciate any help.

OP added extra information in comments after the question. The important part of that extra information is that the makeMove() method was implemented to check which player is to play next (to make sure updates to the board are correct).

Given that information, the implementation of select() in the OP is not correct, because it does not take into account which player is to move when computing the UCT score. The UCT score consists of an "exploitation" part (the first fraction, which computes the average score over all previous simulations) and an "exploration" part (the term under the square root, which increases for nodes that have been visited rarely relative to their parent). The exploitation part of this equation should be negated when the opponent is the one to move next. If this is not done, the AI will essentially assume that the opponent is willing to actively help the AI, instead of assuming that the opponent will try to win for themselves.
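As a rough illustration of the point above, here is a minimal sketch of a UCT score whose exploitation term is sign-flipped depending on who is to move. It is not the OP's TreeNode: the names NodeStats, UctSketch, uct, selectIndex and playerToMove are hypothetical. It assumes the two players are encoded as +1 and -1, and that every node's total value is accumulated from player +1's point of view (a playout won by +1 adds +1, one won by -1 adds -1). If the totals are already sign-adjusted when they are stored, as the OP's update() attempts with board.color, the sign must not be applied a second time here.

```java
import java.util.List;

/** Hypothetical stand-in for the statistics a search node keeps. */
final class NodeStats {
    final double visits;      // number of simulations that passed through this child
    final double totalValue;  // sum of playout results, ASSUMED to be from player +1's view

    NodeStats(double visits, double totalValue) {
        this.visits = visits;
        this.totalValue = totalValue;
    }
}

final class UctSketch {

    /**
     * UCT score of a child as seen by the player choosing at the parent.
     * playerToMove is +1 or -1 (assumed encoding); multiplying the average
     * by it negates the exploitation term when player -1 is choosing.
     */
    static double uct(NodeStats child, double parentVisits, int playerToMove) {
        if (child.visits == 0) {
            return Double.POSITIVE_INFINITY; // always try unvisited children first
        }
        double exploitation = playerToMove * (child.totalValue / child.visits);
        double exploration = Math.sqrt(Math.log(parentVisits) / (2 * child.visits));
        return exploitation + exploration;
    }

    /** Index of the child with the highest UCT score, or -1 if there are no children. */
    static int selectIndex(List<NodeStats> children, double parentVisits, int playerToMove) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < children.size(); i++) {
            double score = uct(children.get(i), parentVisits, playerToMove);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two equally visited children: the first is good for +1, the second for -1.
        List<NodeStats> children = List.of(new NodeStats(10, 6), new NodeStats(10, -6));
        System.out.println("player +1 picks child " + selectIndex(children, 20, 1));
        System.out.println("player -1 picks child " + selectIndex(children, 20, -1));
    }
}
```

Because both children in main have the same visit count, their exploration terms are equal and only the sign of the exploitation term decides the choice: player +1 prefers the first child and player -1 the second. Without the playerToMove factor, both players would pick the first child, which is the "opponent helps the AI" behaviour described above.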

