Getting the state of a TicTacToe board in Q-Learning

Question

I'm just getting into reinforcement learning and Q-learning, and I wanted to try to create a Tic-Tac-Toe AI. To use a Q-table, I need to find the "state" of the board, and I'm having trouble finding a way to do this.

To clarify, a state is a number that represents the current board, including the value of each of the nine squares.

A board that looks like:

[[0, 0, 0],
 [0, 0, 0],
 [0, 0, 0]]

would be state 0, as it is the first board. Beyond this, I am not sure how to calculate the state of the board from the array.

[EDIT] I'm coming here because I honestly don't know where to start; I can't find anything on the web, and if you dislike my question you could at least tell me why.

Answer 1

I think you need something like this: treat the nine cells as the digits of a number written in base max_number.

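import numpy as np

# Base of the encoding; it must be larger than any value a cell can hold.
max_number = 10
L = [[1, 0, 0],
     [0, 0, 0],
     [0, 5, 0]]

# Flatten the 3x3 board into a list of 9 cells.
L_1d = sum(L, [])
print(L_1d)
# [1, 0, 0, 0, 0, 0, 0, 5, 0]
# Weight of each cell: max_number raised to the cell's position.
degrees = max_number ** np.arange(len(L_1d))
print(degrees)
# [        1        10       100      1000     10000    100000   1000000   10000000 100000000]
# The state is the dot product of the cells and their weights.
state = L_1d @ degrees
print(state)
# 50000001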
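
For Tic-Tac-Toe specifically, a more compact variant of the same idea might look like the sketch below. It assumes each cell holds 0 (empty), 1 (first player) or 2 (second player), which the question does not actually specify, so the base can drop from 10 to 3. The function names board_to_state and state_to_board are made up for this illustration.

```python
import numpy as np

def board_to_state(board, base=3):
    """Encode a board as one integer by reading its cells as base-`base` digits."""
    cells = np.asarray(board, dtype=np.int64).ravel()
    if cells.min() < 0 or cells.max() >= base:
        raise ValueError("every cell value must be in range(base)")
    # Cell i contributes cells[i] * base**i, as in the answer above.
    weights = base ** np.arange(cells.size, dtype=np.int64)
    return int(cells @ weights)

def state_to_board(state, base=3, shape=(3, 3)):
    """Invert board_to_state by peeling off one digit per cell."""
    cells = []
    for _ in range(shape[0] * shape[1]):
        state, digit = divmod(state, base)
        cells.append(digit)
    return np.array(cells).reshape(shape)

# Assumed cell values: 0 = empty, 1 = first player, 2 = second player.
empty = [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]
board = [[1, 0, 0],
         [0, 0, 0],
         [0, 2, 0]]

print(board_to_state(empty))   # 0
print(board_to_state(board))   # 4375, i.e. 1 * 3**0 + 2 * 3**7
print(state_to_board(4375))
```

With base 3 every state falls in the range 0 to 3**9 - 1 = 19682, so the empty board is still state 0 and the numbers stay small enough to use directly as row indices.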

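Once the board is a single integer, it can serve as the row index of a Q-table. The sketch below is only one possible layout, assuming cell values 0/1/2, a base-3 encoding, and one action per square; the names encode, greedy_action, N_STATES and N_ACTIONS are hypothetical.

```python
import numpy as np

N_STATES = 3 ** 9    # upper bound on base-3 board codes (many are unreachable in play)
N_ACTIONS = 9        # one action per square, numbered row by row
q_table = np.zeros((N_STATES, N_ACTIONS))

def encode(board):
    """Base-3 state number of a 3x3 board whose cells are 0, 1 or 2."""
    cells = np.asarray(board, dtype=np.int64).ravel()
    return int(cells @ (3 ** np.arange(9, dtype=np.int64)))

def greedy_action(q_table, board):
    """Index of the empty square with the highest Q-value in this state."""
    cells = np.asarray(board).ravel()
    # Occupied squares are masked out so they can never be selected.
    q_values = np.where(cells == 0, q_table[encode(board)], -np.inf)
    return int(np.argmax(q_values))

board = [[1, 0, 0],
         [0, 0, 0],
         [0, 2, 0]]
q_table[encode(board), 4] = 1.0      # pretend the centre square was learned to be good
print(greedy_action(q_table, board))  # 4
```

A dictionary keyed by state would also work and avoids allocating rows for boards that can never occur; the dense array is used here only because it is the simplest to index.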
Answer 1
1 accepted 2020-06-11 14:26:02