
MiniMax with Alpha Beta Pruning for Othello not working

I have the following implementation of alpha-beta minimax for an Othello (Reversi) game. Somehow, it never returns the proper action to take. It seems to return the default action I put in the function, (0, 0), and a secondary value of -32768, which means it got pruned in the MAX subroutine. Any tips on what I can improve and how I can fix this problem?

Note: I've verified that the successors are returned properly for the most part. The max depth for now is 8. The computer player's pn (player number) is 1 and the human player's is 0. The first stage, 0, is MINIMAX_MAX. Alpha and beta are initially set to INT_MIN and INT_MAX respectively.

mm_out minimax(Grid& G, int alpha, int beta, Action& A, uint pn, uint depth, bool stage) {
    if (G.check_terminal_state() || depth == MAX_DEPTH) {
#ifdef DEBUG
        cout << "best action: (" << A.get_x() << ", " << A.get_y() << ")\n";
#endif
        return mm_out(A, G.get_utility(pn));
    }

    // add end game score total here

#ifdef DEBUG
    if (stage == MINIMAX_MAX) {
        cout << "max " << alpha << " " << beta << "\n";
    }
    else {
        cout << "min " << alpha << " " << beta << "\n";
    }
#endif

    set<Action> succ_temp = G.get_successors(pn);
    for (Action a : succ_temp) {

#ifdef DEBUG
        cout << a.get_x() << " " << a.get_y() << '\n';
#endif

        Grid gt(G);
        a.evaluate(gt);
    }
    set<Action, action_greater> successors(succ_temp.begin(), succ_temp.end());

#ifdef DEBUG
    Player p(0, "minimaxtest");
    G.display(p);
    int test;
    cin >> test;
#endif

    // if no successor, that player passes
    if (successors.size()) {
        for (auto a = successors.begin(); a != successors.end(); ++a) {
            Grid gt(G);
            gt.do_move(pn, a->get_x(), a->get_y(), !PRINT_ERR);
            Action at = *a;
            mm_out mt = minimax(gt, alpha, beta, at, pn ^ 1, depth + 1, !stage);
            int temp = mt.val;
//          A = mt.best_move;

            if (stage == MINIMAX_MAX) {
                if (alpha < temp) {
                    alpha = temp;
                    A = *a;
#ifdef DEBUG
                    cout << "Current action: (" << A.get_x() << ", " << A.get_y() << ") alpha = " << alpha << "\n";
#endif
                }
                if (alpha >= beta) {
#ifdef DEBUG
                    cout << "pruned at max\n";
#endif
                    return mm_out(A, beta);
                }
            }
            else {
                if (beta > temp) {
                    beta = temp;
                    A = *a;
#ifdef DEBUG
                    cout << "Current action: (" << A.get_x() << ", " << A.get_y() << ") beta = " << beta << "\n";
#endif
                }
                if (alpha >= beta) {
#ifdef DEBUG
                    cout << "pruned at min\n";
#endif
                    return mm_out(A, alpha);
                }
            }
        }
        return mm_out(A, (stage == MINIMAX_MAX) ? alpha : beta);
    }
    else {
        cout << "no successor\n";
        return mm_out(A, (stage == MINIMAX_MAX) ? (std::numeric_limits<int>::max() - 1) : (std::numeric_limits<int>::min() + 1));
    }
}

Utility function:

int Grid::get_utility(uint pnum) const {
    if (pnum)
        return wcount - bcount;
    return bcount - wcount;
}

You should pass the alpha / beta parameters by value (not by reference):

mm_out minimax(Grid& G, int alpha, int beta, Action& A, uint pn, uint depth, bool stage)

Each node passes its alpha and beta values to its children. Each child then updates its own copy of alpha or beta, depending on whose turn it is, and returns the final evaluation of that node. The parent then uses that evaluation to update its own alpha or beta.
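
To make the difference concrete, here is a minimal sketch on a hypothetical toy tree rather than on the asker's Grid and Action classes. Every name in it (Node, leaf, inner, demo_tree, alphabeta, alphabeta_shared, run_demo) is made up for illustration. It assumes a root MAX node with two MIN children whose leaves are {3, 5} and {6, 9}, so the true minimax value is 6. The two search functions are identical except for how alpha and beta are passed.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

// Hypothetical toy game tree: a node without children is a leaf.
struct Node {
    int value;                   // only meaningful at leaves
    std::vector<Node> children;  // empty => leaf
};

Node leaf(int v) { return Node{v, {}}; }
Node inner(std::vector<Node> kids) { return Node{0, std::move(kids)}; }

// Alpha-beta with the window passed BY VALUE: each call tightens only
// its own copies, and the parent learns the result via the return value.
int alphabeta(const Node& n, int alpha, int beta, bool maximizing) {
    if (n.children.empty()) return n.value;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& child : n.children) {
        int v = alphabeta(child, alpha, beta, !maximizing);
        if (maximizing) {
            best = std::max(best, v);
            alpha = std::max(alpha, best);
        } else {
            best = std::min(best, v);
            beta = std::min(beta, best);
        }
        if (alpha >= beta) break;  // remaining siblings cannot matter
    }
    return best;
}

// Same logic, but the window is passed BY REFERENCE, so a child's
// tightening of beta (or alpha) leaks back into the parent's window.
int alphabeta_shared(const Node& n, int& alpha, int& beta, bool maximizing) {
    if (n.children.empty()) return n.value;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& child : n.children) {
        int v = alphabeta_shared(child, alpha, beta, !maximizing);
        if (maximizing) {
            best = std::max(best, v);
            alpha = std::max(alpha, best);
        } else {
            best = std::min(best, v);
            beta = std::min(beta, best);
        }
        if (alpha >= beta) break;  // fires too early when the window is shared
    }
    return best;
}

// Root is a MAX node; its two children are MIN nodes.
Node demo_tree() {
    return inner({inner({leaf(3), leaf(5)}), inner({leaf(6), leaf(9)})});
}

void run_demo() {
    const Node root = demo_tree();
    // By value: max(min(3, 5), min(6, 9)) = 6.
    assert(alphabeta(root, INT_MIN, INT_MAX, true) == 6);
    // By reference: the first MIN child lowers the shared beta to 3, so the
    // root sees alpha >= beta after one child and never searches the second.
    int alpha = INT_MIN, beta = INT_MAX;
    assert(alphabeta_shared(root, alpha, beta, true) == 3);
}
```

Calling run_demo() from a main function exercises both variants. In the shared-window version the root cuts off after its first child, which resembles the premature "pruned at max" behaviour described in the question, although the question's code returns alpha or beta rather than a separate best value, so the exact numbers it produces may differ.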

