
How can I return the next move in the alpha-beta algorithm?

I'm implementing an AI to play Tic Tac Toe, using the alpha-beta algorithm to search for the best move. Below is the code I have so far. I managed to make the algorithm work -- the values of the states seem to be correct -- but I'm not able to return the right next move/board.

When I execute the goal Board = ['-','-','-','o','-','-','-','-','-'], alpha_beta(max, Board, V, NB), this is the output:

?- Board = ['-','-','-','o','-','-','-','-','-'], alpha_beta(max, Board, V, NB).

Board = [-, -, -, o, -, -, -, -, -],
V = 0,
NB = [-, -, -, o, -, -, -, -, x].

The value V is correct (it indicates that the match will end in a draw), but NB, which represents the next move of the 'x' player, is not.

While testing, I beat the AI, which shouldn't have happened. In the image below I executed the goal several times, simulating a Tic Tac Toe match; the AI plays with the 'x' symbol. In the output, the first board is the current board (the board provided as input) and the second one is the NextBoard, the move made by the AI:

[Image: testing the AI]

I've tried a few things: I used guitracer and read other implementations, but I can't find a solution for my implementation. Could anyone tell me what I'm doing wrong?

alpha_beta(max,Board,Value, NextBoard):-
    ab_minimax(max,Board,-inf,inf,Value, NextBoard).
    
% ab_minimax(Player, Board, Alpha, Beta, Value, NextBoard):
% Value is the minimax value of Board with Player to move; terminal
% positions have no move to make, so NextBoard stays unbound there.
ab_minimax(max,Board,_,_,-1, _NextBoard):-
    is_winning_state_o(Board), !.
ab_minimax(min,Board,_,_,1, _NextBoard):-
    is_winning_state_x(Board), !.
ab_minimax(_,Board,_,_,0, _NextBoard):-
    is_a_draw(Board), !.
ab_minimax(max,Board,Alfa,Beta,Value, NextBoard):-
    children(Board, max, Children),
    ab_max_children(Children,Alfa,Beta,-inf,Value, _NB, NextBoard).
ab_minimax(min,Board,Alfa,Beta,Value, NextBoard):-
    children(Board, min, Children),
    ab_min_children(Children,Alfa,Beta,inf,Value, _NB, NextBoard).

% ab_max_children(Children, Alpha, Beta, MaxSoFar, Max, BestSoFar, NextBoard)
ab_max_children([],_,_,Max,Max, NextBoard, NextBoard).
ab_max_children([H|T],Alfa,Beta,Max1,Max, NB, NextBoard):-
    ab_minimax(min,H,Alfa,Beta,Value, _NextBoardX),
    ( 
        Value > Beta -> % Beta cut
            Max = Beta,
            NextBoard = H
        ; (
            max(Value,Alfa,Alfa1), % updates Alpha
            max(Value,Max1,Max2),
            (Max2 == Value -> NB1 = H; NB1 = NB),
            ab_max_children(T, Alfa1, Beta, Max2, Max, NB1, NextBoard)
        )
    ).

% ab_min_children(Children, Alpha, Beta, MinSoFar, Min, BestSoFar, NextBoard)
ab_min_children([],_,_,Min,Min, NextBoard, NextBoard).
ab_min_children([H|T],Alfa,Beta,Min1,Min, NB, NextBoard):-
    ab_minimax(max,H,Alfa,Beta,Value, _NextBoardX),
    (
        Alfa > Value -> % Alpha cut
            Min = Alfa,
            NextBoard = H
        ; (
            min(Value,Beta,Beta1), % updates Beta
            min(Value,Min1,Min2),
            (Min2 == Value -> NB1 = H; NB1 = NB),
            ab_min_children(T, Alfa, Beta1, Min2, Min, NB1, NextBoard)
        )
    ).

is_winning_state_x(S) :-
    winning_state_x(S), !.

winning_state_x(['x','x','x',_,_,_,_,_,_]). % [1,2,3]
winning_state_x([_,_,_,'x','x','x',_,_,_]). % [4,5,6]
winning_state_x([_,_,_,_,_,_,'x','x','x']). % [7,8,9]
winning_state_x(['x',_,_,'x',_,_,'x',_,_]). % [1,4,7]
winning_state_x([_,'x',_,_,'x',_,_,'x',_]). % [2,5,8]
winning_state_x([_,_,'x',_,_,'x',_,_,'x']). % [3,6,9]
winning_state_x(['x',_,_,_,'x',_,_,_,'x']). % [1,5,9]
winning_state_x([_,_,'x',_,'x',_,'x',_,_]). % [3,5,7]

is_winning_state_o(S) :-
    winning_state_o(S), !.

winning_state_o(['o','o','o',_,_,_,_,_,_]). % [1,2,3]
winning_state_o([_,_,_,'o','o','o',_,_,_]). % [4,5,6]
winning_state_o([_,_,_,_,_,_,'o','o','o']). % [7,8,9]
winning_state_o(['o',_,_,'o',_,_,'o',_,_]). % [1,4,7]
winning_state_o([_,'o',_,_,'o',_,_,'o',_]). % [2,5,8]
winning_state_o([_,_,'o',_,_,'o',_,_,'o']). % [3,6,9]
winning_state_o(['o',_,_,_,'o',_,_,_,'o']). % [1,5,9]
winning_state_o([_,_,'o',_,'o',_,'o',_,_]). % [3,5,7]

has_empty_position(['-'|_]) :- !.
has_empty_position([_|T]) :- has_empty_position(T).

is_a_draw(S) :-
    \+ has_empty_position(S).

children(Board, Player, Children) :-
    findall(NewBoard, make_move(Player, Board, NewBoard), Children).

make_move(max, ['-'|T], ['x'|T]).
make_move(min, ['-'|T], ['o'|T]).
make_move(Player, [H|T1], [H|T2]) :- make_move(Player, T1, T2).
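
% Not shown in the question: max/3 and min/3 are assumed to be plain
% arithmetic helpers along these lines.
max(X, Y, Z) :- Z is max(X, Y).
min(X, Y, Z) :- Z is min(X, Y).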

The first question is what the "right" next move even is, since there is often more than one optimal move.

What you are doing is evaluating your board's children. The table below shows each child with its Value, the current Alpha, and the current Beta:

  [[x,-,-,o,-,-,-,-,-],0,-inf,inf]
  [[-,x,-,o,-,-,-,-,-],0,0,inf]
  [[-,-,x,o,-,-,-,-,-],0,0,inf]
  [[-,-,-,o,x,-,-,-,-],0,0,inf]
  [[-,-,-,o,-,x,-,-,-],0,0,inf]
  [[-,-,-,o,-,-,x,-,-],0,0,inf]
  [[-,-,-,o,-,-,-,x,-],0,0,inf]
  [[-,-,-,o,-,-,-,-,x],0,0,inf]

Since all children have the same value, every child in turn overwrites the stored board, so the next board ends up being the last child.

There are two problems:

  1. You implement it in such a way that for each Board you get at most one NextBoard, whereas there can be more than one optimal next board (see the sketch after this list).

  2. I think that you evaluate the children incorrectly. When I run good old minimax, I get:

[x,-,-,o,-,-,-,-,-]  0
[-,x,-,o,-,-,-,-,-] -1
[-,-,x,o,-,-,-,-,-] -1
[-,-,-,o,x,-,-,-,-] 0
[-,-,-,o,-,x,-,-,-] 0
[-,-,-,o,-,-,x,-,-] 0
[-,-,-,o,-,-,-,x,-] -1
[-,-,-,o,-,-,-,-,x] -1
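
To address point 1, here is a minimal sketch of plain minimax that returns every optimal next board rather than just one. The names minimax/3 and best_moves/3 are mine, not part of your program; the sketch reuses your children/3, is_winning_state_x/1, is_winning_state_o/1 and is_a_draw/1, together with SWI-Prolog's maplist/3, max_list/2, min_list/2 and pairs_keys_values/3:

minimax(max, Board, -1) :- is_winning_state_o(Board), !.  % o has just won
minimax(min, Board,  1) :- is_winning_state_x(Board), !.  % x has just won
minimax(_,   Board,  0) :- is_a_draw(Board), !.
minimax(max, Board, Value) :-            % x to move: maximise over children
    children(Board, max, Children),
    maplist(minimax(min), Children, Values),
    max_list(Values, Value).
minimax(min, Board, Value) :-            % o to move: minimise over children
    children(Board, min, Children),
    maplist(minimax(max), Children, Values),
    min_list(Values, Value).

% All optimal next boards for x on Board, not just one of them.
best_moves(Board, Value, Best) :-
    children(Board, max, Children),
    maplist(minimax(min), Children, Values),
    max_list(Values, Value),
    pairs_keys_values(Pairs, Values, Children),
    findall(B, member(Value-B, Pairs), Best).

With the values from the table above, best_moves/3 succeeds with Value = 0 and Best containing the four children that score 0.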

As for the alpha-beta search itself: shouldn't pruning only cut off the evaluation of certain alternatives, making the search faster, while still yielding the same values as plain minimax?
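
If you want to pin down where the two evaluations diverge, a small diagnostic helps. compare_children/1 is a made-up helper; it assumes the minimax/3 sketch above and calls your ab_minimax/6 with a full window on each child:

% Print, for every child of Board, the plain-minimax value next to the
% value computed by the alpha-beta code above.
compare_children(Board) :-
    children(Board, max, Children),
    forall(member(Child, Children),
           ( minimax(min, Child, M),
             ab_minimax(min, Child, -inf, inf, A, _),
             format("~w  minimax=~w  alpha-beta=~w~n", [Child, M, A]) )).

Any row where the two numbers differ is a child on which the pruning logic goes wrong.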
