
How can I return the next move in the alpha-beta algorithm?

I'm implementing an AI to play Tic Tac Toe, using the alpha-beta algorithm to search for the best move. Below is the code I have so far. I managed to make the algorithm work -- the values of the states seem to be correct, but I'm not able to return the right next move/board.

When I execute the goal Board = ['-','-','-','o','-','-','-','-','-'], alpha_beta(max, Board, V, NB), this is the output:

?- Board = ['-','-','-','o','-','-','-','-','-'], alpha_beta(max, Board, V, NB).

Board = [-, -, -, o, -, -, -, -, -],
V = 0,
NB = [-, -, -, o, -, -, -, -, x].

The value V is correct (it indicates that the result of the match will be a draw), but NB, which represents the next move of the 'x' player, is not.

While testing, I beat the AI, which shouldn't have happened. In the image below I executed the goal several times, simulating a Tic Tac Toe match; the AI plays the 'x' symbol. About the output: the first board is the current board, provided as input, and the second one is the NextBoard, the move made by the AI:

[image: a sequence of queries simulating a Tic Tac Toe match against the AI]

I've tried a few things: I used guitracer and read other implementations, but I can't find a solution for my implementation. Could anyone tell me what I'm doing wrong?

alpha_beta(max,Board,Value, NextBoard):-
    ab_minimax(max,Board,-inf,inf,Value, NextBoard).
    
% Terminal positions, scored from x's (the maximizer's) point of view.
% Note that NextBoard is left unbound in these clauses.
ab_minimax(max,Board,_,_,-1, NextBoard):-
    is_winning_state_o(Board), !.
ab_minimax(min,Board,_,_,1, NextBoard):-
    is_winning_state_x(Board), !.
ab_minimax(_,Board,_,_,0, NextBoard):-
    is_a_draw(Board), !.
ab_minimax(max,Board,Alfa,Beta,Value, NextBoard):-
    children(Board, max, Children),
    ab_max_children(Children,Alfa,Beta,-inf,Value, NB, NextBoard).
ab_minimax(min,Board,Alfa,Beta,Value, NextBoard):-
    children(Board, min, Children),
    ab_min_children(Children,Alfa,Beta,inf,Value, NB, NextBoard).

% Folds over the children; Max1 and NB are the running best value and board.
ab_max_children([],_,_,Max,Max, NextBoard, NextBoard).
ab_max_children([H|T],Alfa,Beta,Max1,Max, NB, NextBoard):-
    ab_minimax(min,H,Alfa,Beta,Value, NextBoardX),
    ( 
        Value > Beta -> % Beta cut
            Max = Beta,
            NextBoard = H
        ; (
            max(Value,Alfa,Alfa1), % updates Alpha
            max(Value,Max1,Max2),
            (Max2 == Value -> NB1 = H; NB1 = NB),
            ab_max_children(T, Alfa1, Beta, Max2, Max, NB1, NextBoard)
        )
    ).

% Symmetric fold for the minimizing player.
ab_min_children([],_,_,Min,Min, NextBoard, NextBoard).
ab_min_children([H|T],Alfa,Beta,Min1,Min, NB, NextBoard):-
    ab_minimax(max,H,Alfa,Beta,Value, NextBoardX),
    (
        Alfa > Value -> % Alpha cut
            Min = Alfa,
            NextBoard = H
        ; (
            min(Value,Beta,Beta1), % updates Beta
            min(Value,Min1,Min2),
            (Min2 == Value -> NB1 = H; NB1 = NB),
            ab_min_children(T, Alfa, Beta1, Min2, Min, NB1, NextBoard)
        )
    ).

is_winning_state_x(S) :-
    winning_state_x(S), !.

winning_state_x(['x','x','x',_,_,_,_,_,_]). % [1,2,3]
winning_state_x([_,_,_,'x','x','x',_,_,_]). % [4,5,6]
winning_state_x([_,_,_,_,_,_,'x','x','x']). % [7,8,9]
winning_state_x(['x',_,_,'x',_,_,'x',_,_]). % [1,4,7]
winning_state_x([_,'x',_,_,'x',_,_,'x',_]). % [2,5,8]
winning_state_x([_,_,'x',_,_,'x',_,_,'x']). % [3,6,9]
winning_state_x(['x',_,_,_,'x',_,_,_,'x']). % [1,5,9]
winning_state_x([_,_,'x',_,'x',_,'x',_,_]). % [3,5,7]

is_winning_state_o(S) :-
    winning_state_o(S), !.

winning_state_o(['o','o','o',_,_,_,_,_,_]). % [1,2,3]
winning_state_o([_,_,_,'o','o','o',_,_,_]). % [4,5,6]
winning_state_o([_,_,_,_,_,_,'o','o','o']). % [7,8,9]
winning_state_o(['o',_,_,'o',_,_,'o',_,_]). % [1,4,7]
winning_state_o([_,'o',_,_,'o',_,_,'o',_]). % [2,5,8]
winning_state_o([_,_,'o',_,_,'o',_,_,'o']). % [3,6,9]
winning_state_o(['o',_,_,_,'o',_,_,_,'o']). % [1,5,9]
winning_state_o([_,_,'o',_,'o',_,'o',_,_]). % [3,5,7]

has_empty_position(['-'|_]) :- !.
has_empty_position([_|T]) :- has_empty_position(T).

is_a_draw(S) :-
    not(has_empty_position(S)).

% All boards reachable from Board in one move by Player.
children(Board, Player, Children) :-
    findall(NewBoard, make_move(Player, Board, NewBoard), Children).

% On backtracking, make_move/3 puts the player's mark in each empty cell in turn.
make_move(max, ['-'|T], ['x'|T]).
make_move(min, ['-'|T], ['o'|T]).
make_move(Player, [H|T1], [H|T2]) :- make_move(Player, T1, T2).
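
As a quick sanity check of the move generator (this query is only an illustration): children/3 produces one successor board per empty cell via backtracking over make_move/3, so it should bind N to 8 on the board from the question:

?- children(['-','-','-','o','-','-','-','-','-'], max, Children), length(Children, N).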

The question is what the "right" next move is, since there is often more than one optimal move.

What you are doing is evaluating your board's children. The table below shows each child with its Value, the current Alpha and the current Beta:

  [[x,-,-,o,-,-,-,-,-],0,-inf,inf]
  [[-,x,-,o,-,-,-,-,-],0,0,inf]
  [[-,-,x,o,-,-,-,-,-],0,0,inf]
  [[-,-,-,o,x,-,-,-,-],0,0,inf]
  [[-,-,-,o,-,x,-,-,-],0,0,inf]
  [[-,-,-,o,-,-,x,-,-],0,0,inf]
  [[-,-,-,o,-,-,-,x,-],0,0,inf]
  [[-,-,-,o,-,-,-,-,x],0,0,inf]

Since all children have the same value, the next board is of course the last child.

There are two problems:

  1. You implement it in such a way that, for each Board, you have at most one NextBoard, whereas there could be more than one optimal next board.

  2. I think that you evaluate the children incorrectly. When I run good old minimax, I get the values below (a minimax sketch that reproduces them follows this list):

[x,-,-,o,-,-,-,-,-]  0
[-,x,-,o,-,-,-,-,-] -1
[-,-,x,o,-,-,-,-,-] -1
[-,-,-,o,x,-,-,-,-] 0
[-,-,-,o,-,x,-,-,-] 0
[-,-,-,o,-,-,x,-,-] 0
[-,-,-,o,-,-,-,x,-] -1
[-,-,-,o,-,-,-,-,x] -1
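
For comparison, here is a minimal plain-minimax sketch (no pruning) that reproduces these values. It reuses the question's is_winning_state_x/1, is_winning_state_o/1, is_a_draw/1 and children/3, and assumes SWI-Prolog's max_list/2 and min_list/2; the predicate name minimax/3 is my own:

% minimax(+Player, +Board, -Value): exact game value with Player to move.
minimax(_, Board, -1) :- is_winning_state_o(Board), !.   % o has won
minimax(_, Board,  1) :- is_winning_state_x(Board), !.   % x has won
minimax(_, Board,  0) :- is_a_draw(Board), !.            % board is full
minimax(max, Board, Value) :-
    children(Board, max, Children),
    findall(V, (member(C, Children), minimax(min, C, V)), Vs),
    max_list(Vs, Value).
minimax(min, Board, Value) :-
    children(Board, min, Children),
    findall(V, (member(C, Children), minimax(max, C, V)), Vs),
    min_list(Vs, Value).

The values above can then be reproduced with:

?- Board = ['-','-','-','o','-','-','-','-','-'],
   children(Board, max, Cs),
   member(C, Cs),
   minimax(min, C, V),
   format("~w ~w~n", [C, V]),
   fail ; true.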

Shouldn't alpha-beta pruning only cut off the consideration of certain alternatives, making the search faster while yielding the same results?
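
It should: a correct alpha-beta returns the same root value as minimax, it just visits fewer nodes. As a hedged sketch (not the poster's code; the predicate names ab/6, ab_max/7 and ab_min/7 are mine), here is a fail-soft formulation that threads the running best value and best child board together, reusing the question's helper predicates:

% ab(+Player, +Board, +Alpha, +Beta, -Value, -BestChild)
ab(_, Board, _, _, -1, _) :- is_winning_state_o(Board), !.
ab(_, Board, _, _,  1, _) :- is_winning_state_x(Board), !.
ab(_, Board, _, _,  0, _) :- is_a_draw(Board), !.
ab(max, Board, Alpha, Beta, Value, Best) :-
    children(Board, max, Children),
    ab_max(Children, Alpha, Beta, -inf, none, Value, Best).
ab(min, Board, Alpha, Beta, Value, Best) :-
    children(Board, min, Children),
    ab_min(Children, Alpha, Beta, inf, none, Value, Best).

% V0/B0 accumulate the best value and board seen so far ('none' is only a
% placeholder; Children is never empty for a non-terminal board).
ab_max([], _, _, V, B, V, B).
ab_max([C|Cs], Alpha, Beta, V0, B0, V, B) :-
    ab(min, C, Alpha, Beta, VC, _),
    ( VC > V0 -> V1 = VC, B1 = C ; V1 = V0, B1 = B0 ),
    ( V1 >= Beta ->                        % beta cutoff
        V = V1, B = B1
    ;   Alpha1 is max(Alpha, V1),
        ab_max(Cs, Alpha1, Beta, V1, B1, V, B)
    ).

ab_min([], _, _, V, B, V, B).
ab_min([C|Cs], Alpha, Beta, V0, B0, V, B) :-
    ab(max, C, Alpha, Beta, VC, _),
    ( VC < V0 -> V1 = VC, B1 = C ; V1 = V0, B1 = B0 ),
    ( V1 =< Alpha ->                       % alpha cutoff
        V = V1, B = B1
    ;   Beta1 is min(Beta, V1),
        ab_min(Cs, Alpha, Beta1, V1, B1, V, B)
    ).

With this, the query ?- ab(max, ['-','-','-','o','-','-','-','-','-'], -inf, inf, V, NB). should give V = 0 with NB bound to the first optimal child, agreeing with the minimax values above.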
