Output of Artificial Neural Network in Othello

I'm implementing Othello using an artificial neural network. While reading the document (here, page 19), I ran into some points I don't understand. They calculate the output: [image]. If the output is calculated like that, how does my AI know which moves are legal, so that it can choose the best legal move? That output is only a single float (I think), so how can I use it?

The good news

It's quite simple: the neural network is a value network rather than a policy network. A value network takes a board state as input and calculates a score describing how good that position is. It is the basic building block of all MinMax-based game AIs, where it is usually called the evaluation function. (A policy network would instead output a probability distribution over all possible moves.) The network itself knows nothing about legal moves: your game code generates them, and the network only scores the positions they lead to.

So the NN gives you this score, and you combine it with a search algorithm of your choice. The most common are MinMax (used by nearly all chess AIs) and MCTS (used by AlphaGo).
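As a minimal sketch of how a single float becomes a move choice: generate the legal moves yourself, apply each one, score the resulting position, and keep the best. The names `pick_move`, `toy_legal_moves`, `toy_apply` and `toy_evaluate` are hypothetical, and the toy "game" (count up to 10) only stands in for real Othello rules and a trained network.

```python
def pick_move(state, legal_moves, apply_move, evaluate):
    """Return the legal move whose resulting position gets the highest score.

    evaluate() stands in for the value network; it is assumed to score a
    position from the point of view of the player who just moved.
    """
    best_move, best_score = None, float("-inf")
    for move in legal_moves(state):
        score = evaluate(apply_move(state, move))
        if score > best_score:
            best_move, best_score = move, score
    return best_move


# Toy stand-in for a game: the state is a counter, a move adds 1, 2 or 3,
# and positions closer to 10 are "better". Not Othello, just an illustration.
def toy_legal_moves(n):
    return [m for m in (1, 2, 3) if n + m <= 10]


def toy_apply(n, move):
    return n + move


def toy_evaluate(n):
    return -abs(10 - n)  # hypothetical placeholder for the network's output


chosen = pick_move(7, toy_legal_moves, toy_apply, toy_evaluate)
print(chosen)
```

This is a one-ply lookahead; the search algorithms below look further ahead but use the score in the same way.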

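Basic idea of MinMax: you play a move, the opponent plays a move, and so on; then you evaluate the resulting position with your NN. Do this for all possible move combinations and propagate the scores back up with the MinMax rule (you take the maximum on your turns, the opponent takes the minimum on theirs). Only a few plies (half-moves) of lookahead will be feasible with this NN, but it will be very powerful for Othello and is easy to implement.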
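A depth-limited MinMax can be sketched roughly like this. To keep it short and runnable, it uses a tiny take-away game (take 1 to 3 stones, whoever takes the last stone wins) instead of Othello, and `evaluate` is a hypothetical placeholder for the NN that returns 0.0 for any unfinished position. All function names are assumptions for illustration. One simplification to be aware of: here "no legal moves" means the game is over, whereas in Othello a player without a legal move has to pass.

```python
def legal_moves(state):
    stones, _our_turn = state
    return [take for take in (1, 2, 3) if take <= stones]


def apply_move(state, take):
    stones, our_turn = state
    return (stones - take, not our_turn)


def evaluate(state):
    """Score from OUR point of view; placeholder for the value network."""
    stones, our_turn = state
    if stones == 0:
        # Whoever is to move now did NOT take the last stone, so they lost.
        return -1.0 if our_turn else 1.0
    return 0.0  # a trained network would return a real estimate here


def minimax(state, depth):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)
    _stones, our_turn = state
    scores = [minimax(apply_move(state, m), depth - 1) for m in moves]
    # We maximise on our turns; the opponent minimises on theirs.
    return max(scores) if our_turn else min(scores)


def best_move(state, depth):
    return max(legal_moves(state),
               key=lambda m: minimax(apply_move(state, m), depth - 1))


print(best_move((5, True), depth=5))  # take 1, leaving the opponent with 4
```

In practice you would add alpha-beta pruning so that more plies fit into the same time budget.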

Basic idea of MCTS: play a random move, then another random move, and so on until there is a winner, and record the result in a win statistic. Repeat this many times, then compare the average scores of all possible "first" moves and pick the best. It is harder to incorporate the NN as a heuristic here.
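The random-playout idea can be sketched as below. Note that this is only flat Monte Carlo evaluation of the first moves, i.e. the simplified picture described above; full MCTS additionally grows a search tree and balances exploration against exploitation when selecting moves. The game is again a toy take-away game (take 1 to 3 stones, last stone wins), and `random_playout`, `monte_carlo_move`, `playouts` and `seed` are hypothetical names chosen for this example.

```python
import random


def legal_moves(stones):
    return [take for take in (1, 2, 3) if take <= stones]


def random_playout(stones, our_turn, rng):
    """Play uniformly random moves to the end; 1.0 if we took the last stone."""
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        our_turn = not our_turn
    # After the final flip, our_turn is False exactly when we moved last.
    return 0.0 if our_turn else 1.0


def monte_carlo_move(stones, playouts=2000, seed=0):
    rng = random.Random(seed)
    win_rate = {}
    for first in legal_moves(stones):
        # After our first move it is the opponent's turn.
        wins = sum(random_playout(stones - first, False, rng)
                   for _ in range(playouts))
        win_rate[first] = wins / playouts
    best = max(win_rate, key=win_rate.get)
    return best, win_rate


move, rates = monte_carlo_move(5)
print(move, rates)
```

With 5 stones, taking 1 wins about two thirds of the random playouts, while the other two moves win about half, so the statistic singles out the same move a full search would.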

The calculation you mentioned is just the classic rule in neural networks that defines a dense layer: each unit takes a weighted sum of its inputs and passes it through an activation function.
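For illustration, here is roughly what that rule looks like in plain Python. The activation (`tanh`), the layer sizes, and all weights are assumptions made up for this example; the paper's actual formula and parameters may differ.

```python
import math


def dense_layer(inputs, weights, biases, activation=math.tanh):
    """One dense layer: activation(weighted sum of inputs + bias) per unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical 4-cell "board": +1 own disc, -1 opponent disc, 0 empty.
board = [1.0, -1.0, 0.0, 1.0]

# Two hidden units, then a single output unit: the position's score.
hidden = dense_layer(board,
                     weights=[[0.5, -0.5, 0.0, 0.25], [0.1, 0.2, 0.3, 0.4]],
                     biases=[0.0, -0.5])
value = dense_layer(hidden, weights=[[1.0, -1.0]], biases=[0.0])[0]
print(value)  # a single float between -1 and 1
```

A real Othello network would take all 64 squares as input, but the final output is still one number of this kind.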

The bad news

I didn't read the paper, but the hard part is training and preparing your NN. You need to provide some data. Maybe the training will be supervised (if you have historical games; easier), maybe it will be reinforcement learning without labeled data (Q-learning and co.). This will be very hard to do without experience.

I do think I know all the theory needed, but I still failed to do this for some other (stochastic) games, because there are many issues with autocorrelation and the like. A lot of hyperparameter tuning is needed as well.

Conclusion

This project is fairly complicated and there are many pitfalls. Please make sure you understand the techniques you want to try. It looks like you are missing some of the basics: game theory (MinMax), AI/learning theory (MCTS, Markov decision processes, Q-learning, ...), and NNs (the basic internals of a neural network).
