简体   繁体   中英

cs50 python tictactoe minimax algorithm

I am currently doing this problem in cs50 AI where we need to make a minimax algorithm for playing tictactoe. My algorithm doesn't work at all (it is really easy to beat the computer) and I was wondering what I was doing wrong. I am also pretty sure that all my other functions are correct and that only the minimax function is incorrect. Would really appreciate any help, thank you all!



import math, copy

X = "X"
O = "O"
EMPTY = None


def initial_state():
    """
    Returns starting state of the board.
    """
    return [[EMPTY, EMPTY, EMPTY],
            [EMPTY, EMPTY, EMPTY],
            [EMPTY, EMPTY, EMPTY]]


def player(board):
    """
    Returns player who has the next turn on a board.
    """
    xplayer = 0
    yplayer = 0
    for row in board:
        for column in row:
            if column == 'X':
                xplayer += 1
            elif column == 'O':
                yplayer += 1

    if xplayer == yplayer:
        return X
    else:
        return O


def actions(board):
    """
    Returns set of all possible actions (i, j) available on the board.
    """
    ans = set()
    rownum = 0
    colnum = 0
    for row in board:
        colnum = 0
        for column in row:
            if not column:
                ans.add((rownum, colnum))
            colnum += 1
        rownum += 1

    return ans

def result(board, action):
    """
    Returns the board that results from making move (i, j) on the board.
    """
    if board[action[0]][action[1]] != None :
        raise BoardError("Tried to place on full square")
    move = player(board)
    newboard = copy.deepcopy(board)
    newboard[action[0]][action[1]] = move
    return newboard


def winner(board):
    """
    Returns the winner of the game, if there is one.
    """


    for i in range(3):
        sum = 0
        for j in range(3):
            if board[i][j] == 'X':
                sum += 1
            elif board[i][j] == 'O':
                sum -= 1
        if sum == 3:
            return X
        elif sum == -3:
            return O

    for j in range(3):
        sum = 0
        for i in range(3):
            if board[i][j] == 'X':
                sum += 1
            elif board[i][j] == 'O':
                sum -= 1
        if sum == 3:
            return X
        elif sum == -3:
            return O
    if board[0][0] == board[1][1] == board[2][2]:
        return board[0][0]
    if board[2][0] == board[1][1] == board[0][2]:
        return board[2][0]
    return None


def terminal(board):
    """
    Returns True if game is over, False otherwise.
    """
    if winner(board):
        return True
    if not actions(board):
        return True
    return False



def utility(board):
    """
    Returns 1 if X has won the game, -1 if O has won, 0 otherwise.
    """
    if winner(board) == X:
        return 1
    elif winner(board) == O:
        return -1
    else:
        return 0

def minimax(board):
    """
    Returns the optimal action for the current player on the board.
    """
    if player(board) == X:
        aim = 1
    elif player(board) == O:
        aim = -1
    if terminal(board):
        return None


    possiblemoves = actions(board)
    for move in possiblemoves:
        newboard = result(board,move)

        #if move leads to the aimed score, return move
        if utility(newboard) == aim:
            return move

        #if nodes down the chain return a value cos they have reached the aim, return this current move
        if minimax(newboard):
            return move


    aim = 0

    #change the aim to be a draw since winning is no longer possible
    for move in possiblemoves:
        newboard = result(board,move)


        if utility(newboard) == aim:
            return move

        if minimax(newboard):
            return move

    #all the moves will result in a loss, so i just return the first move
    return possiblemoves[0]

Basically X aims to maximise and O aims to minimise. What I have done for the algorithm is depending on the player, first look for moves that will result in either 1 or -1 depending on the player. Then if that doesn't happen, look for moves that result in 0(a tie). Then afterwards just simply return any move since that means the player will lose.

You seem to have lots of unnecessary functions and your minimax code looks way too complicated than it needs to be. Basically you need 4 main functions for your game:

  • Minimax itself
  • Get all possible moves from a position (unless you want to do the loop inside minimax)
  • Determine if a player has won
  • Determine if board is full

Also, have you looked at the pseudo code for minimax from eg Wikipedia? :

function minimax(node, depth, maximizingPlayer) is
    if depth = 0 or node is a terminal node then
        return the heuristic value of node
    if maximizingPlayer then
        value := −∞
        for each child of node do
            value := max(value, minimax(child, depth − 1, FALSE))
        return value
    else (* minimizing player *)
        value := +∞
        for each child of node do
            value := min(value, minimax(child, depth − 1, TRUE))
        return value

Here is the general idea:

  1. Check if the current board is a win, draw or loss and return value based on the information, typically return -1, 0, or 1. This is the first if statement from the pseudo code.
  2. Then it looks at if it is its turn or the opponent from the if-else statement.
  3. Inside that you set the wort possible score as starting value. If you only return -1, 0 or 1 then it is enough to set -2 and 2 respectively as max/min value.
  4. You now run through all possible moves in that position.
  5. You set the new value to the max/min of the returned value from another minimax call where the player turn has changed.
  6. After this recursive loop you return the value which will give the best possible route from the given board.

Depth is not necessary since tic-tac-toe is a simple game where we can always reach an end state. You can use depth to ensure that the computer will pick the shortest way possible to victory by adding depth to the heuristics return value. But I suggest you first get it to work without complicating things:)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM