Minimax algorithm bug

I've been trying to learn the minimax algorithm and I've stumbled upon a bug which I cannot figure out how to solve. Code:

private List<Integer> generatemoves(int[] evalFields) {
    List<Integer> nextMoves = new ArrayList<Integer>();
    for (int i = 0; i < evalFields.length; i++) {
        if (evalFields[i] == 0) {
            nextMoves.add(i);
        }
    }
    return nextMoves;
}

private int evaluateLine(int p1, int p2, int p3, int[] evalFields) {
    int score = 0;
    if (evalFields[p1] == 1) {
        score = 1;
    } else if (evalFields[p1] == 10) {
        score = -1;
    }

    if (evalFields[p2] == 1) {
        if (score == 1) {
            score = 10;
        } else if (score == -1) {
            return 0;
        } else {
            score = 1;
        }
    } else if (evalFields[p2] == 10) {
        if (score == -1) {
            score = -10;
        } else if (score == 1) {
            return 0;
        } else {
            score = -1;
        }
    }

    if (evalFields[p3] == 1) {
        if (score > 0) {
            score *= 10;
        } else if (score < 0) {
            return 0;
        } else {
            score = 1;
        }
    } else if (evalFields[p3] == 10) {
        if (score < 0) {
            score *= 10;
        } else if (score > 1) {
            return 0;
        } else {
            score = -1;
        }
    }
    return score;
}

private int evaluateBoard(int [] evalFields) {
    int score = 0;
    score += evaluateLine(0, 1, 2, evalFields);
    score += evaluateLine(3, 4, 5, evalFields);
    score += evaluateLine(6, 7, 8, evalFields);
    score += evaluateLine(0, 3, 6, evalFields);
    score += evaluateLine(1, 4, 7, evalFields);
    score += evaluateLine(2, 5, 8, evalFields);
    score += evaluateLine(0, 4, 8, evalFields);
    score += evaluateLine(2, 4, 6, evalFields);

    return score;
}

private int bestMove(int currentTurn, int[] board) {
    int move;
    int bestScore;
    if (currentTurn == 1) {
        bestScore = Integer.MIN_VALUE;
    } else {
        bestScore = Integer.MAX_VALUE;
    }
    List<Integer> nextMoves = generatemoves(board);
    List<Integer> bestScores = new ArrayList<Integer>();
    for (int i = 0; i < nextMoves.size(); i++) {
        int[] newBoards = new int[9];
        for (int j = 0; j < board.length; j++) {
            newBoards[j] = board[j];
        }
        newBoards[nextMoves.get(i)] = turn;
        bestScores.add(evaluateBoard(newBoards));
    }


    for (int scores : bestScores) {
        if (currentTurn == 1) {
            if (scores > bestScore) bestScore = scores;
        } else {
            if (scores < bestScore) bestScore = scores;
        }
    }
    move = nextMoves.get(bestScores.indexOf(bestScore));

    return move;
}

This is the most relevant part of the code. What I think it does is generate every possible move from the board (which is called fields) and then calculate a score for each move. It then makes the move that results in the highest or lowest score: X (1) tries to get the highest score and O (10) the lowest. The bug is that when the player starts and takes the middle field, the AI responds normally, but after the player's second turn the AI starts to act strangely:

[ ][ ][ ]    [O][ ][ ]    [O][ ][O]
[ ][x][ ] => [ ][x][ ] => [x][x][ ]
[ ][ ][ ]    [ ][ ][ ]    [ ][ ][ ]

If the player chooses this:

[O][ ][ ]    [O][ ][ ]
[ ][x][x] => [O][x][x]
[ ][ ][ ]    [ ][ ][ ]

Then the AI acts normally. I don't know what is wrong, or even whether I've understood the minimax algorithm correctly.

Edit: I added this code but still have the same problem.

private int[] evaluateMove(int [] board, int currentTurn) {
    int bestScore;
    int currentScore;
    int bestMove = -1;
    if (currentTurn == 1) {
        bestScore = Integer.MIN_VALUE;
    } else {
        bestScore = Integer.MAX_VALUE;
    }

    List<Integer> nextMoves = generatemoves(board);
    if (nextMoves.isEmpty()) {
        bestScore = evaluateTheBoard(board);
    } else {
        for (int move : nextMoves) {
            int[] nextBoard = new int[9];
            for (int i = 0; i < nextBoard.length; i ++) {
                nextBoard[i] = board[i];
            }
            nextBoard[move] = currentTurn;
            currentScore = evaluateMove(nextBoard, nextTurn())[0];
            if (currentTurn == 1) {
                if (currentScore > bestScore) {
                    bestScore = currentScore;
                    bestMove = move;
                }
            } else {
                if (currentScore < bestScore) {
                    bestScore = currentScore;
                    bestMove = move;
                }
            }
        }
    }
    return new int[] {bestScore, bestMove};
}

I think you are misunderstanding how to look ahead in a game like this. Do not 'total' the values returned by evaluateLine.

Here is pseudocode for the minimax score of a tic-tac-toe board (what evaluateBoard should return). Note that evaluateBoard needs a notion of currentTurn.

function evaluateBoard(board, currentTurn)

    // Check whether the game has already ended.
    // WhiteHasWon is true if white has at least one complete 3-in-a-row line
    // (you will have to scan all 8 possible lines for white pieces).
    // BlackHasWon is the same test for black.
    if WhiteHasWon then return (-10, none)
    if BlackHasWon then return (+10, none)

    if no legal moves then return (0, none)   // draw

    // The game isn't over yet, so look ahead:
    bestMove = notset
    resultScore = notset
    for each legal move i for currentTurn
        nextBoard = copy of board
        apply move i to nextBoard
        score = evaluateBoard(nextBoard, NOT currentTurn).score
        if resultScore is notset, or score is <better for currentTurn> than resultScore, then
            resultScore = score
            bestMove = move i
    return (resultScore, bestMove)
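
For reference, here is a rough Java sketch of the pseudocode above, using the question's board encoding (an int[9] with 0 for empty, 1 for X and 10 for O). The class name MinimaxSketch and the helpers winner and minimax are names I made up for illustration, and I am assuming X is the side that wants +10 and O the side that wants -10. Treat it as one possible way to write it, not the only one.

```java
// Sketch only: MinimaxSketch, winner and minimax are illustrative names,
// not taken from the original post.
class MinimaxSketch {
    static final int X = 1;   // maximizing player (assumed to want +10)
    static final int O = 10;  // minimizing player (assumed to want -10)

    // The 8 possible three-in-a-row lines on a board stored as int[9].
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    // Returns X or O if that player owns a complete line, otherwise 0.
    static int winner(int[] board) {
        for (int[] line : LINES) {
            int v = board[line[0]];
            if (v != 0 && v == board[line[1]] && v == board[line[2]]) {
                return v;
            }
        }
        return 0;
    }

    // Returns {score, move}. The move is -1 when the game is already over.
    static int[] minimax(int[] board, int currentTurn) {
        int w = winner(board);
        if (w == X) return new int[] {10, -1};
        if (w == O) return new int[] {-10, -1};

        int bestMove = -1;
        int bestScore = (currentTurn == X) ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        int nextTurn = (currentTurn == X) ? O : X;

        for (int i = 0; i < board.length; i++) {
            if (board[i] != 0) continue;               // field is taken
            board[i] = currentTurn;                    // apply the move
            int score = minimax(board, nextTurn)[0];   // look ahead recursively
            board[i] = 0;                              // undo the move
            boolean better = (currentTurn == X) ? score > bestScore
                                                : score < bestScore;
            if (better) {
                bestScore = score;
                bestMove = i;
            }
        }

        if (bestMove == -1) return new int[] {0, -1};  // board full, no winner: draw
        return new int[] {bestScore, bestMove};
    }

    public static void main(String[] args) {
        // Position from the question: O in the corner, X in the middle row, O to move.
        int[] board = {O, 0, 0, X, X, 0, 0, 0, 0};
        int[] result = minimax(board, O);
        System.out.println("score=" + result[0] + " move=" + result[1]);
    }
}
```

On the position from the question this picks field 5 (the block) with a score of 0, because every other move lets X complete the middle row. Note that the sketch applies and undoes each move on the same array instead of copying the board; copying, as the pseudocode does, works just as well.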

One key difference between your version and mine is that mine is recursive. Yours only goes one level deep. Mine calls evaluateBoard from inside evaluateBoard, which would be an infinite loop if we weren't careful (once the board fills up it can't go any deeper, so it isn't actually infinite).

Another difference is that yours totals stuff when it shouldn't. The score of a tic-tac-toe game is -10, 0, or 10, and it is known only once you've looked ahead to the end of the game. At each step you should pick the best move available to the player whose turn it is and ignore all other possibilities completely, because you only care about the "best" line of play. The score of a position is equal to the result of optimal play from that position.

Expanding <better for currentTurn> is messy in minimax, which is why negamax is cleaner. White prefers low scores and black prefers high scores, so you need some if statements to choose the appropriate preferred score. You already have this part (at the end of your bestMove code), but it needs to be evaluated inside the recursion instead of just at the end.
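
To show what I mean by negamax being cleaner, here is a hedged Java sketch of the same search in that style. The idea is to always score a position from the point of view of the player about to move, so the same "higher is better" comparison works for both sides and the opponent's score is simply negated. NegamaxSketch, winner and negamax are hypothetical names, and the board encoding (0 empty, 1 for X, 10 for O) is borrowed from the question.

```java
// Sketch only: NegamaxSketch, winner and negamax are illustrative names.
class NegamaxSketch {
    static final int X = 1;
    static final int O = 10;

    // The 8 possible three-in-a-row lines on a board stored as int[9].
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    // Returns X or O if that player owns a complete line, otherwise 0.
    static int winner(int[] board) {
        for (int[] line : LINES) {
            int v = board[line[0]];
            if (v != 0 && v == board[line[1]] && v == board[line[2]]) {
                return v;
            }
        }
        return 0;
    }

    // Returns {score, move}, where score is from the point of view of `player`:
    // +10 if player wins with best play, 0 for a draw, -10 if player loses.
    static int[] negamax(int[] board, int player) {
        int opponent = (player == X) ? O : X;
        int w = winner(board);
        if (w == player) return new int[] {10, -1};
        if (w == opponent) return new int[] {-10, -1};

        int bestScore = Integer.MIN_VALUE;
        int bestMove = -1;
        for (int i = 0; i < board.length; i++) {
            if (board[i] != 0) continue;                   // field is taken
            board[i] = player;                             // apply the move
            int score = -negamax(board, opponent)[0];      // opponent's gain is our loss
            board[i] = 0;                                  // undo the move
            if (score > bestScore) {                       // one comparison for both sides
                bestScore = score;
                bestMove = i;
            }
        }

        if (bestMove == -1) return new int[] {0, -1};      // board full, no winner: draw
        return new int[] {bestScore, bestMove};
    }

    public static void main(String[] args) {
        int[] board = {O, 0, 0, X, X, 0, 0, 0, 0};
        int[] result = negamax(board, O);
        System.out.println("score=" + result[0] + " move=" + result[1]);
    }
}
```

The if/else on currentTurn disappears entirely; the price is that a returned score only makes sense relative to the player who was to move, so a score of +10 here means "good for whoever's turn it is", not "good for X".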
