Why do I get IndexError: tuple index out of range?
I've been trying to create a tic-tac-toe agent, and when I ran the training loop I got an error pointing to the (.0) in the function that checks whether someone has won (check_game_status). If you need the whole environment, let me know.
Here is how the board was created:
board = [0] * 9
Here 0 means an empty spot, 1 means O and 2 means X.
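With a flat 9-element board, cell (row, col) of the 3x3 grid sits at index 3 * row + col, so every winning line is a triple of indices between 0 and 8; check_game_status hard-codes exactly these triples. A small sketch that derives them (WIN_LINES, idx and winner are illustrative names, not part of the question's environment):

```python
# Board encoding from the question: 0 = empty, 1 = O, 2 = X.
EMPTY, O, X = 0, 1, 2


def idx(row, col):
    """Index of cell (row, col) in the flat 9-element board."""
    return 3 * row + col


# All eight winning lines as index triples (illustrative helper).
WIN_LINES = (
    [tuple(idx(r, c) for c in range(3)) for r in range(3)]      # rows
    + [tuple(idx(r, c) for r in range(3)) for c in range(3)]    # columns
    + [(0, 4, 8), (2, 4, 6)]                                    # diagonals
)


def winner(board):
    """Return 1 or 2 if that mark owns a full line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None


board = [EMPTY] * 9
board[0] = board[4] = board[8] = X
print(winner(board))                          # 2
print(max(max(line) for line in WIN_LINES))   # 8
```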
Here is the after_action_state function:
def after_action_state(state, action):
    """Execute an action and return the resulting state.
    Args:
        state (tuple): Board status + mark
        action (int): Action to run
    Returns:
        tuple: New state
    """
    board, mark = state
    nboard = list(board[:])
    nboard[action] = tocode(mark)
    nboard = tuple(nboard)
    return nboard, next_mark(mark)
The tocode function basically converts 'X' to 2 and 'O' to 1.
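Note that after_action_state returns the pair (board, next mark), not a board, so anything expecting a 9-cell board has to be given the first element of its result. A quick check of the return shape; tocode and next_mark are hypothetical stand-ins, since the environment's own versions are not shown (next_mark is assumed to simply alternate between 'O' and 'X'):

```python
# Hypothetical stand-ins for helpers the question does not show.
def tocode(mark):
    return 1 if mark == 'O' else 2      # 'O' -> 1, 'X' -> 2, as described


def next_mark(mark):
    return 'X' if mark == 'O' else 'O'  # assumption: marks simply alternate


def after_action_state(state, action):
    """Copied from the question: returns a new STATE, i.e. (board, mark)."""
    board, mark = state
    nboard = list(board[:])
    nboard[action] = tocode(mark)
    nboard = tuple(nboard)
    return nboard, next_mark(mark)


state = ((0,) * 9, 'X')
next_state = after_action_state(state, 4)

print(len(next_state))     # 2  -> (board, mark), not a board
print(len(next_state[0]))  # 9  -> the actual board
print(next_state)          # ((0, 0, 0, 0, 2, 0, 0, 0, 0), 'O')
```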
This is the function the error pointed at:
def check_game_status(board):
    """Return game status by current board status.
    Args:
        board (list): Current board state
    Returns:
        int:
            -1: game in progress
            0: draw game,
            1 or 2 for finished game(winner mark code).
    """
    for t in [1, 2]:
        for j in range(0, 9, 3):
            if [t] * 3 == [board[i] for i in range(j, j+3)]:
                return t
        for j in range(0, 3):
            if board[j] == t and board[j+3] == t and board[j+6] == t:
                return t
        if board[0] == t and board[4] == t and board[8] == t:
            return t
        if board[2] == t and board[4] == t and board[6] == t:
            return t

    for i in range(9):
        if board[i] == 0:
            # still playing
            return -1

    # draw game
    return 0
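On its own, the function behaves correctly whenever it is given a 9-element board. A few sanity checks on made-up sample boards (0 = empty, 1 = O, 2 = X, written row by row) reproduce that:

```python
def check_game_status(board):
    """Copy of the question's function: expects a flat 9-element board."""
    for t in [1, 2]:
        for j in range(0, 9, 3):
            if [t] * 3 == [board[i] for i in range(j, j + 3)]:
                return t
        for j in range(0, 3):
            if board[j] == t and board[j + 3] == t and board[j + 6] == t:
                return t
        if board[0] == t and board[4] == t and board[8] == t:
            return t
        if board[2] == t and board[4] == t and board[6] == t:
            return t
    for i in range(9):
        if board[i] == 0:
            return -1  # still playing
    return 0  # draw


# Made-up sample boards with the status each one should produce.
cases = {
    "X wins top row": ((2, 2, 2, 0, 1, 1, 0, 0, 0), 2),
    "O wins left column": ((1, 2, 0, 1, 2, 0, 1, 0, 0), 1),
    "X wins main diagonal": ((2, 1, 0, 1, 2, 0, 0, 0, 2), 2),
    "empty board": ((0,) * 9, -1),
    "full board, draw": ((2, 1, 2, 2, 1, 1, 1, 2, 2), 0),
}
for name, (board, expected) in cases.items():
    print(f"{name}: {check_game_status(board)} (expected {expected})")
```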
This is the play_one_step function, which chooses an action with an epsilon-greedy policy and stores the resulting experience in a replay buffer.
def play_one_step(self, env ,state, available_actions, agents_model ,epsilon= 0.2):
    action = self.greedy_policy(state, available_actions, agents_model ,epsilon= epsilon)
    next_state, reward, done, info = env.step(action)
    self.replay_buffer.append((state, action,reward, next_state, done))
    return next_state, reward, done, info
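The question only shows replay_buffer.append(...). A common setup, assumed here rather than taken from the question, is a bounded deque that is sampled uniformly for each training step; the maxlen of 2000 and the sample_experiences helper are hypothetical. Each stored state is the environment's (board, mark) pair, so the board still has to be pulled out of it before it goes into a model:

```python
import random
from collections import deque

# Hypothetical replay buffer: bounded, so old transitions drop off the left end.
replay_buffer = deque(maxlen=2000)


def sample_experiences(buffer, batch_size, rng=random):
    """Draw batch_size transitions uniformly and regroup them field by field."""
    batch = rng.sample(list(buffer), batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    return states, actions, rewards, next_states, dones


# Dummy transitions shaped like (state, action, reward, next_state, done),
# where each state is a (board, mark) pair as in the question.
for step in range(30):
    state = ((0,) * 9, 'X')
    replay_buffer.append((state, step % 9, 0.0, state, False))

states, actions, rewards, next_states, dones = sample_experiences(replay_buffer, 15)
boards = [board for board, _mark in states]  # extract boards before feeding a model
print(len(actions), len(boards[0]))          # 15 9
```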
And this is the epsilon-greedy policy:
def greedy_policy(self, state ,available_actions, agents_model ,epsilon = 0.2):
    if np.random.rand() < epsilon:
        return np.random.choice(available_actions)
    else:
        for next_action in available_actions: # checking if the next possible action wins the game and if it does then it returns it
            next_state = after_action_state(state, next_action)
            game_status = check_game_status(next_state)
            if game_status > 0 & tomark(game_status)==self.mark:
                return next_action
        Q_values = agents_model.predict(state[np.newaxis])
        return np.argmax(Q_values[0])
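A standalone sketch of the same policy that avoids the pitfalls visible above, under stated assumptions: it unpacks (board, mark) once and passes only the board to the win check, uses `and` (the bitwise `&` binds tighter than `>`), builds a NumPy array from the board instead of indexing the state tuple with np.newaxis, and restricts the argmax to legal moves. q_fn is a hypothetical callable standing in for agents_model.predict, winner is a compact stand-in for check_game_status, and the mark stored in state replaces self.mark (in the training loop the agent is picked by that mark, so they coincide):

```python
import numpy as np

# Compact stand-in for check_game_status (assumption): winning code or 0.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def tocode(mark):
    return 1 if mark == 'O' else 2


def greedy_policy(state, available_actions, q_fn, epsilon=0.2, rng=None):
    """Epsilon-greedy move with an immediate-win shortcut.

    state is the environment's (board, mark) pair; q_fn is a hypothetical
    callable mapping a (1, 9) array to (1, 9) Q-values.
    """
    rng = rng if rng is not None else np.random.default_rng()
    board, mark = state                      # unpack once; never index the pair as a board
    if rng.random() < epsilon:
        return int(rng.choice(available_actions))
    for action in available_actions:
        nboard = list(board)
        nboard[action] = tocode(mark)
        status = winner(nboard)              # pass the board, not (board, mark)
        if status > 0 and status == tocode(mark):  # 'and', not bitwise '&'
            return action
    q_values = q_fn(np.asarray(board, dtype=np.float32)[np.newaxis])[0]
    # Argmax over legal moves only, so an occupied cell is never chosen.
    return max(available_actions, key=lambda a: q_values[a])


def fake_q(x):
    """Dummy Q-function for the demo: prefers higher cell indices."""
    return np.arange(9, dtype=np.float32)[np.newaxis]


state = ((2, 2, 0, 1, 1, 0, 0, 0, 0), 'X')
available = [i for i, v in enumerate(state[0]) if v == 0]
print(greedy_policy(state, available, fake_q, epsilon=0.0))  # 2 (X completes the top row)
```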
This is the training loop:
agent_1 = Agent('X', model_1)
agent_2 = Agent('O', model_2)
agent_1_rewards = []
agent_2_rewards = []
agents = [agent_1, agent_2]
n_episodes = 600
start_mark = 'O'
batch_size = 15
for episode in range(n_episodes):
    env = TicTacToeEnv()
    env.set_start_mark(start_mark)
    state = env.reset()
    while not env.done:
        _ , mark = state
        available_actions = env.available_actions()
        epsilon = max( 1 - episode/500 , 0.01)
        agent = agent_by_mark(agents, mark)
        agents_model = agent.agents_model
        next_state, reward, done, _ = agent.play_one_step(env, state, available_actions, agents_model ,epsilon)
        state = next_state
        if agent is agent_1:
            agent_1_rewards.append(reward)
        else:
            agent_2_rewards.append(reward)
    env.render()
    if episode > 60:
        agent.training_step(15)
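The schedule epsilon = max(1 - episode/500, 0.01) decays linearly from 1.0 and bottoms out at 0.01 from about episode 495 on. Since np.random.rand() returns values in [0, 1), at episode 0 (epsilon = 1.0) the random branch is always taken and greedy_policy never reaches check_game_status; the greedy branch, and with it the crash, can only occur once epsilon drops below 1. A quick tabulation (epsilon_at is just an illustrative name):

```python
def epsilon_at(episode):
    """Epsilon schedule from the training loop: linear decay, floor 0.01."""
    return max(1 - episode / 500, 0.01)


for episode in (0, 60, 250, 495, 599):
    print(episode, round(epsilon_at(episode), 3))
# 0 1.0
# 60 0.88
# 250 0.5
# 495 0.01
# 599 0.01
```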
This is the error trace:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17792/1466320926.py in <module>
26
27
---> 28 next_state, reward, done, _ = agent.play_one_step(env, state, available_actions, agents_model ,epsilon)
29 state = next_state
30 if agent is agent_1:
~\AppData\Local\Temp/ipykernel_17792/3137506088.py in play_one_step(self, env, state, available_actions, agents_model, epsilon)
60
61 def play_one_step(self, env ,state, available_actions, agents_model ,epsilon= 0.2):
---> 62 action = self.greedy_policy(state, available_actions, agents_model ,epsilon= epsilon)
63 next_state, reward, done, info = env.step(action)
64 self.replay_buffer.append((state, action,reward, next_state, done))
~\AppData\Local\Temp/ipykernel_17792/3137506088.py in greedy_policy(self, state, available_actions, agents_model, epsilon)
35 for next_action in available_actions: # checking if the next possible action wins the game and if it does then it returns it
36 next_state = after_action_state(state, next_action)
---> 37 game_status = check_game_status(next_state)
38 if game_status > 0 & tomark(game_status)==self.mark:
39 return next_action
g:\Code\TicTacToe\gym-tictactoe\gym_tictactoe\env.py in check_game_status(board)
66 for t in [1, 2]:
67 for j in range(0, 9, 3):
---> 68 if [t] * 3 == [board[i] for i in range(j, j+3)]:
69 return t
70 for j in range(0, 3):
g:\Code\TicTacToe\gym-tictactoe\gym_tictactoe\env.py in <listcomp>(.0)
66 for t in [1, 2]:
67 for j in range(0, 9, 3):
---> 68 if [t] * 3 == [board[i] for i in range(j, j+3)]:
69 return t
70 for j in range(0, 3):
IndexError: tuple index out of range
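The (.0) in the traceback is not an index. In CPython before 3.12, a list comprehension runs as a hidden nested function whose single argument, the iterator it consumes, is named .0, so <listcomp>(.0) only names that frame (PEP 709 inlined comprehensions in 3.12, so newer versions report the enclosing function instead). The failing access is board[i] itself. A minimal reproduction, where check_rows is a hypothetical reduction of check_game_status to its first loop:

```python
import traceback


def check_rows(board):
    """First loop of check_game_status, reduced to the failing comprehension."""
    for t in [1, 2]:
        for j in range(0, 9, 3):
            if [t] * 3 == [board[i] for i in range(j, j + 3)]:
                return t
    return None


board = (0,) * 9
state = (board, 'O')          # the shape after_action_state returns

print(check_rows(board))      # None: a 9-cell board is fine
try:
    check_rows(state)         # a (board, mark) pair has only 2 items, so board[2] fails
except IndexError as exc:
    last = traceback.extract_tb(exc.__traceback__)[-1]
    # last.name is '<listcomp>' before Python 3.12 and 'check_rows' from 3.12 on.
    print(type(exc).__name__, exc, "| raised in", last.name)
```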
Does anyone know what causes this issue? I already tried rewriting just that function and giving it a few test cases, and it worked fine with no index problems. If you need more of the code, let me know. Thanks in advance!
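It is not a fix, but it explains the cause of the bug. In the snippet below, board[i] raises IndexError as soon as i is past the end of board. For a 9-cell board that cannot happen: j only takes the values 0, 3 and 6, so i never exceeds 8, which is why the function works when you test it on its own. In greedy_policy, however, check_game_status(next_state) receives the whole value returned by after_action_state, which is the pair (board, mark) of length 2, so board[2] is already out of range.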
for j in range(0, 9, 3):
    if [t] * 3 == [board[i] for i in range(j, j+3)]:
        return t
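A sketch that checks the index range of this loop and applies the fix in greedy_policy, which is to pass only the board, e.g. game_status = check_game_status(next_state[0]). The require_board helper is hypothetical, not part of gym_tictactoe; it just turns this kind of mix-up into a clearer error:

```python
# Values of j and i that the row loop in check_game_status visits.
js = list(range(0, 9, 3))
indices = [i for j in js for i in range(j, j + 3)]
print(js)            # [0, 3, 6]
print(max(indices))  # 8


def require_board(board):
    """Hypothetical guard: fail fast with a clear message on a non-board."""
    if len(board) != 9:
        raise ValueError(f"expected a 9-cell board, got {len(board)} items: {board!r}")
    return board


next_state = ((0, 0, 0, 0, 2, 0, 0, 0, 0), 'O')  # shape of after_action_state's result
next_board, _ = next_state                       # the fix: unpack before checking
print(len(require_board(next_board)))            # 9
```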