I'm implementing an AI to play Tic Tac Toe, and I'm using the alpha-beta algorithm to search for the best move. Below is the code I have so far. I managed to make the algorithm work (the values of the states seem to be correct), but I can't get it to return the right next move/board.
When I execute the goal `Board = ['-','-','-','o','-','-','-','-','-'], alpha_beta(max, Board, V, NB).`, this is the output:
?- Board = ['-','-','-','o','-','-','-','-','-'], alpha_beta(max, Board, V, NB).
Board = [-, -, -, o, -, -, -, -, -],
V = 0,
NB = [-, -, -, o, -, -, -, -, x].
The value V is correct (it indicates that the match will end in a draw), but NB, which represents the next move of the 'x' player, is not.
While testing, I beat the AI, which shouldn't have happened. In the image, I executed the goal several times to simulate a Tic Tac Toe match. The AI plays the 'x' symbol. In the output, the first board is the current board (the one given as input) and the second one is NextBoard, the move made by the AI.
I've tried a few things: I used guitracer and I read other implementations, but I can't find a solution for my implementation. Could anyone tell me what I'm doing wrong?
```prolog
alpha_beta(max, Board, Value, NextBoard) :-
    ab_minimax(max, Board, -inf, inf, Value, NextBoard).

% Terminal states: o has won, x has won, or the board is full.
ab_minimax(max, Board, _, _, -1, NextBoard) :-
    is_winning_state_o(Board), !.
ab_minimax(min, Board, _, _, 1, NextBoard) :-
    is_winning_state_x(Board), !.
ab_minimax(_, Board, _, _, 0, NextBoard) :-
    is_a_draw(Board), !.
% Non-terminal states: generate the children and search them.
ab_minimax(max, Board, Alfa, Beta, Value, NextBoard) :-
    children(Board, max, Children),
    ab_max_children(Children, Alfa, Beta, -inf, Value, NB, NextBoard).
ab_minimax(min, Board, Alfa, Beta, Value, NextBoard) :-
    children(Board, min, Children),
    ab_min_children(Children, Alfa, Beta, inf, Value, NB, NextBoard).

ab_max_children([], _, _, Max, Max, NextBoard, NextBoard).
ab_max_children([H|T], Alfa, Beta, Max1, Max, NB, NextBoard) :-
    ab_minimax(min, H, Alfa, Beta, Value, NextBoardX),
    (   Value > Beta                    % Beta cut
    ->  Max = Beta,
        NextBoard = H
    ;   max(Value, Alfa, Alfa1),        % updates Alpha
        max(Value, Max1, Max2),
        ( Max2 == Value -> NB1 = H ; NB1 = NB ),
        ab_max_children(T, Alfa1, Beta, Max2, Max, NB1, NextBoard)
    ).

ab_min_children([], _, _, Min, Min, NextBoard, NextBoard).
ab_min_children([H|T], Alfa, Beta, Min1, Min, NB, NextBoard) :-
    ab_minimax(max, H, Alfa, Beta, Value, NextBoardX),
    (   Alfa > Value                    % Alpha cut
    ->  Min = Alfa,
        NextBoard = H
    ;   min(Value, Beta, Beta1),        % updates Beta
        min(Value, Min1, Min2),
        ( Min2 == Value -> NB1 = H ; NB1 = NB ),
        ab_min_children(T, Alfa, Beta1, Min2, Min, NB1, NextBoard)
    ).

% Win detection: one clause per row, column and diagonal.
is_winning_state_x(S) :-
    winning_state_x(S), !.

winning_state_x(['x','x','x',_,_,_,_,_,_]). % [1,2,3]
winning_state_x([_,_,_,'x','x','x',_,_,_]). % [4,5,6]
winning_state_x([_,_,_,_,_,_,'x','x','x']). % [7,8,9]
winning_state_x(['x',_,_,'x',_,_,'x',_,_]). % [1,4,7]
winning_state_x([_,'x',_,_,'x',_,_,'x',_]). % [2,5,8]
winning_state_x([_,_,'x',_,_,'x',_,_,'x']). % [3,6,9]
winning_state_x(['x',_,_,_,'x',_,_,_,'x']). % [1,5,9]
winning_state_x([_,_,'x',_,'x',_,'x',_,_]). % [3,5,7]

is_winning_state_o(S) :-
    winning_state_o(S), !.

winning_state_o(['o','o','o',_,_,_,_,_,_]). % [1,2,3]
winning_state_o([_,_,_,'o','o','o',_,_,_]). % [4,5,6]
winning_state_o([_,_,_,_,_,_,'o','o','o']). % [7,8,9]
winning_state_o(['o',_,_,'o',_,_,'o',_,_]). % [1,4,7]
winning_state_o([_,'o',_,_,'o',_,_,'o',_]). % [2,5,8]
winning_state_o([_,_,'o',_,_,'o',_,_,'o']). % [3,6,9]
winning_state_o(['o',_,_,_,'o',_,_,_,'o']). % [1,5,9]
winning_state_o([_,_,'o',_,'o',_,'o',_,_]). % [3,5,7]

% is_a_draw/1 succeeds when the board has no empty square left.
has_empty_position(['-'|_]) :- !.
has_empty_position([_|T]) :- has_empty_position(T).

is_a_draw(S) :-
    not(has_empty_position(S)).

% Move generation: every board reachable by one move of Player.
children(Board, Player, Children) :-
    findall(NewBoard, make_move(Player, Board, NewBoard), Children).

make_move(max, ['-'|T], ['x'|T]).
make_move(min, ['-'|T], ['o'|T]).
make_move(Player, [H|T1], [H|T2]) :- make_move(Player, T1, T2).
```
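The code above calls max/3 and min/3, which are not shown in the question. A minimal sketch of what they are assumed to look like (these definitions are hypothetical, and they assume SWI-Prolog, which evaluates the atoms inf and -inf as float infinities in arithmetic comparisons):

```prolog
% Assumed helpers (not shown in the question): Z is the larger / smaller
% of X and Y. The comparisons are arithmetic, so inf and -inf work as bounds
% under SWI-Prolog.
max(X, Y, Z) :- ( X >= Y -> Z = X ; Z = Y ).
min(X, Y, Z) :- ( X =< Y -> Z = X ; Z = Y ).
```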
The question is what the "right" next move is, since there is often more than one optimal move.
What you are doing is evaluating your board's children. The table below shows each child with its Value, the current Alpha and the current Beta:
[[x,-,-,o,-,-,-,-,-],0,-inf,inf]
[[-,x,-,o,-,-,-,-,-],0,0,inf]
[[-,-,x,o,-,-,-,-,-],0,0,inf]
[[-,-,-,o,x,-,-,-,-],0,0,inf]
[[-,-,-,o,-,x,-,-,-],0,0,inf]
[[-,-,-,o,-,-,x,-,-],0,0,inf]
[[-,-,-,o,-,-,-,x,-],0,0,inf]
[[-,-,-,o,-,-,-,-,x],0,0,inf]
Since all children have the same value, and your update `(Max2 == Value -> NB1 = H; NB1 = NB)` also fires on ties, the next board is simply the last child.
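To illustrate the tie-breaking point, here is a small sketch over a hypothetical list of Value-Board pairs for the max player. best_last_tie/2 mimics the question's update rule, which replaces the current best whenever a later child's value is greater than or equal to it; best_strict/2 only replaces it on a strict improvement. Both predicate names are made up for this illustration, and both assume a non-empty list.

```prolog
% best_last_tie(+Pairs, -Best): like the question's rule, a later child
% with an equal value replaces the current best, so ties go to the last one.
best_last_tie([V-B|T], Best) :- best_last_tie(T, V, B, Best).

best_last_tie([], _, B, B).
best_last_tie([V-B|T], V0, B0, Best) :-
    (   V >= V0
    ->  best_last_tie(T, V, B, Best)
    ;   best_last_tie(T, V0, B0, Best)
    ).

% best_strict(+Pairs, -Best): only a strictly better value replaces the
% current best, so the first of several equally good children is kept.
best_strict([V-B|T], Best) :- best_strict(T, V, B, Best).

best_strict([], _, B, B).
best_strict([V-B|T], V0, B0, Best) :-
    (   V > V0
    ->  best_strict(T, V, B, Best)
    ;   best_strict(T, V0, B0, Best)
    ).
```

For example, best_last_tie([0-a, 0-b, 0-c], B) gives B = c, while best_strict([0-a, 0-b, 0-c], B) gives B = a.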
There are two problems:
1. You implemented it so that each Board has at most one NextBoard, whereas there can be more than one optimal next board.
2. I think you evaluate the children incorrectly. When I run good old minimax, I get:
[x,-,-,o,-,-,-,-,-] 0
[-,x,-,o,-,-,-,-,-] -1
[-,-,x,o,-,-,-,-,-] -1
[-,-,-,o,x,-,-,-,-] 0
[-,-,-,o,-,x,-,-,-] 0
[-,-,-,o,-,-,x,-,-] 0
[-,-,-,o,-,-,-,x,-] -1
[-,-,-,o,-,-,-,-,x] -1
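A plain minimax that produces this kind of table, and that also collects every optimal move instead of only one, might look like the sketch below. It assumes the question's encoding ('x' is max and a win for x scores 1, 'o' is min and a win for o scores -1, '-' is an empty square). The helper names (line/3, wins/2, move/3, terminal/2, minimax/3, child_values/3, optimal_moves/3) are hypothetical, and the block is meant to be loaded on its own.

```prolog
:- use_module(library(lists)).   % nth1/3, max_list/2, min_list/2
:- use_module(library(pairs)).   % pairs_keys/2

% The eight winning lines, as 1-based square indices.
line(1,2,3). line(4,5,6). line(7,8,9).
line(1,4,7). line(2,5,8). line(3,6,9).
line(1,5,9). line(3,5,7).

% wins(+Board, +Mark): Mark occupies all three squares of some line.
wins(Board, Mark) :-
    line(A, B, C),
    nth1(A, Board, Mark), nth1(B, Board, Mark), nth1(C, Board, Mark).

mark(max, x).
mark(min, o).
other(max, min).
other(min, max).

% move(+Player, +Board, -NewBoard): Player marks one empty square.
move(Player, ['-'|T], [M|T]) :- mark(Player, M).
move(Player, [H|T1], [H|T2]) :- move(Player, T1, T2).

% terminal(+Board, -Value): fails if the game is not over yet.
terminal(Board, Value) :-
    (   wins(Board, x) -> Value = 1
    ;   wins(Board, o) -> Value = -1
    ;   \+ memberchk('-', Board) -> Value = 0
    ).

% minimax(+Player, +Board, -Value): exact game value with Player to move.
minimax(Player, Board, Value) :-
    (   terminal(Board, V0)
    ->  Value = V0
    ;   other(Player, Next),
        findall(V, (move(Player, Board, C), minimax(Next, C, V)), Vs),
        best_value(Player, Vs, Value)
    ).

best_value(max, Vs, V) :- max_list(Vs, V).
best_value(min, Vs, V) :- min_list(Vs, V).

% child_values(+Player, +Board, -Pairs): Value-Child for every legal move.
child_values(Player, Board, Pairs) :-
    other(Player, Next),
    findall(V-C, (move(Player, Board, C), minimax(Next, C, V)), Pairs).

% optimal_moves(+Player, +Board, -Moves): every child with the best value.
optimal_moves(Player, Board, Moves) :-
    child_values(Player, Board, Pairs),
    pairs_keys(Pairs, Vs),
    best_value(Player, Vs, Best),
    findall(C, member(Best-C, Pairs), Moves).
```

For example, `?- child_values(max, ['-','-','-',o,'-','-','-','-','-'], Pairs).` pairs each child with its exact value, and optimal_moves/3 on the same board should return the four drawing moves from the table (x in squares 1, 5, 6 and 7).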
Shouldn't alpha-beta pruning only skip certain alternatives, making the search faster, while yielding the same results?
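Alpha-beta does yield the same value as minimax at the root, but not necessarily for the individual children: once a child's subtree is cut off, the number it returns is only a bound (for example "at most Alpha"), not its exact value. That is consistent with the first table, where children whose true value is -1 are reported as 0. One way to make the move choice robust is to keep only the first child that strictly improves the best value found so far. Below is a minimal fail-soft alpha-beta sketch of that idea, assuming the same encoding as the question and using -2 and 2 as stand-ins for -inf and inf (every real value lies in [-1, 1]). The predicate names are hypothetical, and the block is meant to be loaded on its own.

```prolog
:- use_module(library(lists)).   % nth1/3

% Board helpers, same encoding as the question: x = max, o = min.
line(1,2,3). line(4,5,6). line(7,8,9).
line(1,4,7). line(2,5,8). line(3,6,9).
line(1,5,9). line(3,5,7).

wins(Board, Mark) :-
    line(A, B, C),
    nth1(A, Board, Mark), nth1(B, Board, Mark), nth1(C, Board, Mark).

mark(max, x).
mark(min, o).

move(Player, ['-'|T], [M|T]) :- mark(Player, M).
move(Player, [H|T1], [H|T2]) :- move(Player, T1, T2).

terminal(Board, Value) :-
    (   wins(Board, x) -> Value = 1
    ;   wins(Board, o) -> Value = -1
    ;   \+ memberchk('-', Board) -> Value = 0
    ).

% alpha_beta_move(+Player, +Board, -Value, -NextBoard)
% -2 and 2 play the role of -inf and inf.
alpha_beta_move(Player, Board, Value, NextBoard) :-
    ab(Player, Board, -2, 2, Value, NextBoard).

% ab(+Player, +Board, +Alpha, +Beta, -Value, -Best); Best = none at the end.
ab(Player, Board, Alpha, Beta, Value, Best) :-
    (   terminal(Board, V0)
    ->  Value = V0, Best = none
    ;   findall(C, move(Player, Board, C), Children),
        (   Player == max
        ->  ab_max(Children, Alpha, Beta, -2, none, Value, Best)
        ;   ab_min(Children, Alpha, Beta, 2, none, Value, Best)
        )
    ).

% ab_max(+Children, +Alpha, +Beta, +BestV, +BestC, -Value, -Best)
ab_max([], _, _, V, B, V, B).
ab_max([C|Cs], Alpha, Beta, V0, B0, V, B) :-
    ab(min, C, Alpha, Beta, CV, _),
    (   CV > V0 -> V1 = CV, B1 = C      % strict: a bound never replaces
    ;   V1 = V0, B1 = B0                % an equally good earlier child
    ),
    (   V1 >= Beta                      % beta cutoff: min avoids this node
    ->  V = V1, B = B1
    ;   Alpha1 is max(Alpha, V1),
        ab_max(Cs, Alpha1, Beta, V1, B1, V, B)
    ).

ab_min([], _, _, V, B, V, B).
ab_min([C|Cs], Alpha, Beta, V0, B0, V, B) :-
    ab(max, C, Alpha, Beta, CV, _),
    (   CV < V0 -> V1 = CV, B1 = C
    ;   V1 = V0, B1 = B0
    ),
    (   V1 =< Alpha                     % alpha cutoff: max avoids this node
    ->  V = V1, B = B1
    ;   Beta1 is min(Beta, V1),
        ab_min(Cs, Alpha, Beta1, V1, B1, V, B)
    ).
```

On the board from the question, this sketch should return Value = 0 with x in the top-left corner (the first drawing move in generation order) rather than the last child. Because the root is searched with the full window, the first child's value is exact, and later children can only replace it by returning a strictly greater, and therefore exact, value.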