I'm trying to make an AI that plays a game that looks a lot like checkers; the logic is pretty much the same. I want to use the Monte Carlo Tree Search method, but I have no idea how to implement the tree structure. If I'm not wrong, the root of my tree should be the initial state of the board, and the nodes should be all the possible plays. I know I have to create a function to calculate the weight of each node and select the best possible play. My problem, as I said before, is that I have no clue how to implement such a tree in Python.
So far I have my board and two functions that return a list of the legal moves you can make. The board is a 10x10 nested list, and to find the possible moves I have two functions that receive the X and Y coordinates of the piece I want to move and check all the available options. The reason I have two move functions is that one handles basic movements, i.e. when the space right next to you is free, while the other checks for "hops", i.e. when the space right next to you is occupied but the space just beyond it is free.
I'll add my code here just in case it makes it easier for you guys to understand what I'm trying to do.
    import numpy as np

    # 10x10 board: 0 = empty square, 1 and 2 = the two players' pieces
    matrix = [[1,1,1,1,1,0,0,0,0,0],
              [1,1,1,1,0,0,0,0,0,0],
              [1,1,1,0,0,0,0,0,0,0],
              [1,1,0,0,0,0,0,0,0,0],
              [1,0,0,0,0,0,0,0,0,0],
              [0,0,0,0,0,0,0,0,0,2],
              [0,0,0,0,0,0,0,0,2,2],
              [0,0,0,0,0,0,0,2,2,2],
              [0,0,0,0,0,0,2,2,2,2],
              [0,0,0,0,0,2,2,2,2,2]]
    #new_matrix = np.fliplr(np.flipud(matrix))
    #new_matrix = new_matrix.tolist()
    print("\n".join(" ".join(str(el) for el in row) for row in matrix))
    #print("\n")
    #print("\n".join(" ".join(str(el) for el in row) for row in new_matrix))

    def basicMove(x, y):
        """Return the empty squares directly adjacent to (x, y)."""
        listMoves = []
        if x > 0 and matrix[x-1][y] == 0: #left
            listMoves.append([x-1,y])
        if x < 9 and matrix[x+1][y] == 0: #right
            listMoves.append([x+1,y])
        if y < 9: #up
            if matrix[x][y+1] == 0:
                listMoves.append([x,y+1])
            if x > 0 and matrix[x-1][y+1] == 0: #up left
                listMoves.append([x-1,y+1])
            if x < 9 and matrix[x+1][y+1] == 0: #up right
                listMoves.append([x+1,y+1])
        if y > 0: #down
            if matrix[x][y-1] == 0:
                listMoves.append([x,y-1])
            if x > 0 and matrix[x-1][y-1] == 0: #down left
                listMoves.append([x-1,y-1])
            if x < 9 and matrix[x+1][y-1] == 0: #down right
                listMoves.append([x+1,y-1])
        return listMoves

    def hopper(x, y):
        """Return the basic moves plus the single hops available from (x, y)."""
        listHops = []
        # extend (not append) so the basic moves are added one by one
        # instead of as a single nested list
        listHops.extend(basicMove(x,y))
        if x > 1 and matrix[x-1][y] != 0 and matrix[x-2][y] == 0: #left
            listHops.append([x-2,y])
        if x < 8 and matrix[x+1][y] != 0 and matrix[x+2][y] == 0: #right
            listHops.append([x+2,y])
        if y > 1:
            if matrix[x][y-1] != 0 and matrix[x][y-2] == 0: #down
                listHops.append([x,y-2])
            if x > 1 and matrix[x-1][y-1] != 0 and matrix[x-2][y-2] == 0: #down left
                listHops.append([x-2,y-2])
            # the hopped-over square is [x+1][y-1] (the original checked [x+1][y+1])
            if x < 8 and matrix[x+1][y-1] != 0 and matrix[x+2][y-2] == 0: #down right
                listHops.append([x+2,y-2])
        if y < 8:
            if matrix[x][y+1] != 0 and matrix[x][y+2] == 0: #up
                listHops.append([x,y+2])
            if x > 1 and matrix[x-1][y+1] != 0 and matrix[x-2][y+2] == 0: #up left
                listHops.append([x-2,y+2])
            if x < 8 and matrix[x+1][y+1] != 0 and matrix[x+2][y+2] == 0: #up right
                listHops.append([x+2,y+2])
        return listHops

    print(hopper(2,1)) #Testing the function
One last question: will using Object Oriented Programming make things much easier or more efficient for me? I've been checking some examples of people who implement MCTS for games such as Tic-tac-toe and Reversi in Python, and they all seem to use OOP. Thanks for your help.
Firstly, yes. The root of the tree will be the initial state of the board.
But you don't need a function to calculate the weight (an evaluation function) in Monte Carlo Tree Search. A similar task is done by a function usually called "simulate", which plays the game randomly from a given state (or node) until the end, i.e. until an outcome is reached (win/draw/lose), and returns that outcome (+1/0/-1). The MCTS algorithm calls simulate many times to get a rough estimate of how good or bad the move under consideration is. It then explores deeper (expands) the best-looking move by running more simulations through that move, to get a clearer estimate. Concretely, the move with the highest value in the cumulative results of the random plays through it is selected for further expansion. The algorithm also keeps track of how much each move has already been explored, so that it doesn't just keep digging into one move while neglecting a better one. With the usual UCB1 selection rule, the regret grows only as O(log n) in the number of simulations n, which is optimal.
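The four steps described above (selection, expansion, simulation, backpropagation) can be sketched as follows. This is only a minimal illustration, not a definitive implementation: to keep it self-contained it uses a hypothetical stand-in game, `NimState` (take 1 to 3 stones, whoever takes the last stone wins), instead of the checkers-like board. The names `NimState`, `Node`, `mcts`, `legal_moves`, `play` and `winner`, and the exploration constant `c=1.4`, are all assumptions made for this example.

```python
import math
import random


class NimState(object):
    """Hypothetical stand-in game: take 1-3 stones; last stone taken wins."""

    def __init__(self, stones, player=1):
        self.stones = stones
        self.player = player          # player about to move: 1 or 2

    def legal_moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def play(self, move):
        return NimState(self.stones - move, 3 - self.player)

    def winner(self):
        # With no stones left, the player who just moved took the last one.
        return 3 - self.player if self.stones == 0 else None


class Node(object):
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move              # move that led from parent to this node
        self.children = []
        self.untried = state.legal_moves()
        self.visits = 0
        self.wins = 0.0               # wins for the player who made self.move

    def best_child(self, c=1.4):
        # UCB1: average result plus a bonus for rarely visited children.
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))


def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.best_child()
        # 2. Expansion: add one child for a move not tried yet.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.state.play(move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves until the game ends.
        state = node.state
        while state.winner() is None:
            state = state.play(random.choice(state.legal_moves()))
        winner = state.winner()
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            # The player who moved into a node is the one NOT about to move.
            if winner == 3 - node.state.player:
                node.wins += 1
            node = node.parent
    # Recommend the most visited move from the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

To adapt the sketch to the board game in the question, the state class would wrap the 10x10 board and build `legal_moves` from functions such as `basicMove` and `hopper`. With enough iterations, a call like `mcts(NimState(3))` should settle on taking all three stones, since that move wins every simulation.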
As for Object Oriented Programming: it can help if you are comfortable with it, but MCTS can also be done without it (try representing each node as a list, and use its elements to store the features you want to keep for that node).
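A rough sketch of the list-based alternative, assuming each node is a plain list stored in one flat `tree` list and referred to by its index. The field layout and the helper names `add_node` and `backpropagate` are made up for this illustration, and the result is added unchanged at every level, whereas a two-player version would credit it per player.

```python
# Each node is a plain list: [state, parent_index, move, child_indices, visits, wins]
STATE, PARENT, MOVE, CHILDREN, VISITS, WINS = range(6)

tree = []   # the whole tree is a flat list; a node's id is its index


def add_node(state, parent=None, move=None):
    """Append a node, register it with its parent, and return its index."""
    tree.append([state, parent, move, [], 0, 0.0])
    index = len(tree) - 1
    if parent is not None:
        tree[parent][CHILDREN].append(index)
    return index


def backpropagate(index, result):
    """Add one visit and `result` to the node and all of its ancestors."""
    while index is not None:
        tree[index][VISITS] += 1
        tree[index][WINS] += result
        index = tree[index][PARENT]


# Placeholder strings stand in for real board states.
root = add_node("initial board")
child = add_node("board after a move", parent=root, move=[2, 1])
backpropagate(child, 1.0)
```

Storing parent and child indices instead of object references keeps everything in basic Python types, at the cost of having to remember which position holds which field.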