
Why do I get "IndexError: tuple index out of range"?

I've been trying to create a tic-tac-toe agent, and when I ran the training loop I got an error pointing to (.0) in the function that checks whether someone has won (check_game_status). If you need the whole environment, let me know.

Here is how the board is created:

board = [0] * 9

Here, 0 means an empty cell, 1 means O, and 2 means X.
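With this encoding the flat list stores the 3x3 grid row by row, so cell (row, col) lives at index 3 * row + col. That layout is what the index arithmetic in check_game_status relies on (rows are range(j, j+3) for j = 0, 3, 6; columns are j, j+3, j+6). A small sketch to visualise it; the `render` helper and `SYMBOLS` mapping are hypothetical and not part of the poster's environment:

```python
# Hypothetical mapping from the integer codes on the board to symbols.
SYMBOLS = {0: ".", 1: "O", 2: "X"}

def render(board):
    """Return the 9-cell board as three text rows (row-major layout)."""
    rows = []
    for row in range(3):
        # Cell (row, col) is stored at flat index 3 * row + col.
        cells = [SYMBOLS[board[3 * row + col]] for col in range(3)]
        rows.append(" ".join(cells))
    return "\n".join(rows)

board = [0] * 9
board[4] = 2  # X takes the centre (row 1, col 1 -> index 4)
board[0] = 1  # O takes the top-left corner (row 0, col 0 -> index 0)
print(render(board))
```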

Here is the after_action_state function:

def after_action_state(state, action):
    """Execute an action and returns resulted state.

    Args:
        state (tuple): Board status + mark
        action (int): Action to run

    Returns:
        tuple: New state
    """

    board, mark = state
    nboard = list(board[:])
    nboard[action] = tocode(mark)
    nboard = tuple(nboard)
    return nboard, next_mark(mark)

The tocode function converts 'X' to 2 and 'O' to 1.
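Note that after_action_state returns a pair, (new_board, next_mark), not just the board. The sketch below fills in tocode and next_mark with assumed implementations (the poster's versions were not shown) so the function can be run on its own and the shape of its result checked:

```python
def tocode(mark):
    """Assumed helper: 'O' -> 1, 'X' -> 2 (0 is an empty cell)."""
    return 1 if mark == "O" else 2

def next_mark(mark):
    """Assumed helper: alternate between 'O' and 'X'."""
    return "X" if mark == "O" else "O"

def after_action_state(state, action):
    """Execute an action and return the resulting (board, mark) state."""
    board, mark = state
    nboard = list(board)  # copy the board so the input state is untouched
    nboard[action] = tocode(mark)
    return tuple(nboard), next_mark(mark)

state = ((0,) * 9, "O")
result = after_action_state(state, 4)
print(len(result))     # 2: a (board, mark) pair
print(len(result[0]))  # 9: the board itself
```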

This is the function the error points at:

def check_game_status(board):
    """Return game status by current board status.

    Args:
        board (list): Current board state

    Returns:
        int:
            -1: game in progress
            0: draw game,
            1 or 2 for finished game(winner mark code).
    """
    for t in [1, 2]:
        for j in range(0, 9, 3):
            if [t] * 3 == [board[i] for i in range(j, j+3)]:
                return t
        for j in range(0, 3):
            if board[j] == t and board[j+3] == t and board[j+6] == t:
                return t
        if board[0] == t and board[4] == t and board[8] == t:
            return t
        if board[2] == t and board[4] == t and board[6] == t:
            return t

    for i in range(9):
        if board[i] == 0:
            # still playing
            return -1

    # draw game
    return 0
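The same rules can be written against an explicit table of the eight winning lines. The sketch below is an equivalent rewrite, not the environment's own code; it also validates its input, so passing something that is not a 9-cell board fails with a descriptive message instead of a bare IndexError:

```python
# The eight winning lines of a row-major 3x3 board.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def check_game_status(board):
    """Return -1 (in progress), 0 (draw) or the winner's code (1 or 2)."""
    if len(board) != 9:
        raise ValueError(f"expected a 9-cell board, got {len(board)} items: {board!r}")
    for t in (1, 2):  # same order as the original: player 1 is checked first
        for a, b, c in WIN_LINES:
            if board[a] == board[b] == board[c] == t:
                return t
    if 0 in board:
        return -1  # at least one empty cell: still playing
    return 0  # board full and no winner: draw
```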

This is the play_one_step function, which picks an action with an epsilon-greedy policy, takes a step in the environment, and stores the experience in a replay buffer:

def play_one_step(self, env, state, available_actions, agents_model, epsilon=0.2):
    action = self.greedy_policy(state, available_actions, agents_model, epsilon=epsilon)
    next_state, reward, done, info = env.step(action)
    self.replay_buffer.append((state, action, reward, next_state, done))
    return next_state, reward, done, info
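play_one_step only appends to self.replay_buffer; how that buffer is built is not shown. A common choice, assumed here, is a collections.deque with a maximum length plus uniform random sampling for each training batch. The `ReplayBuffer` class below is hypothetical:

```python
import random
from collections import deque

class ReplayBuffer:
    """Hypothetical fixed-size buffer of (state, action, reward, next_state, done)."""

    def __init__(self, maxlen=2000):
        # Once full, appending silently drops the oldest experience.
        self.buffer = deque(maxlen=maxlen)

    def append(self, experience):
        self.buffer.append(experience)

    def __len__(self):
        return len(self.buffer)

    def sample(self, batch_size):
        """Draw batch_size experiences uniformly without replacement and
        regroup them into one tuple per field."""
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

buf = ReplayBuffer(maxlen=3)
for i in range(5):
    buf.append((((0,) * 9, "X"), i, 0.0, ((0,) * 9, "O"), False))
print(len(buf))  # 3: only the last three experiences are kept
```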

And this is the epsilon-greedy policy:

def greedy_policy(self, state, available_actions, agents_model, epsilon=0.2):
    if np.random.rand() < epsilon:
        return np.random.choice(available_actions)

    else:

        for next_action in available_actions: # check whether the next possible action wins the game; if it does, return it
            next_state = after_action_state(state, next_action)
            game_status = check_game_status(next_state)
            if game_status > 0 & tomark(game_status)==self.mark:
                return next_action

        Q_values = agents_model.predict(state[np.newaxis])
        return np.argmax(Q_values[0])
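Two other lines in greedy_policy would fail when reached. In Python, & binds more tightly than > and ==, so the win check parses as the chained comparison game_status > (0 & tomark(game_status)) == self.mark; assuming tomark returns the strings 'O' and 'X', 0 & 'X' raises TypeError. The boolean operator is `and`. Also, state is a (board, mark) tuple, and indexing a tuple with np.newaxis (which is None) raises TypeError, so the board has to be converted to an array first. A sketch of both fixes follows; tomark is an assumed implementation, FakeModel is a hypothetical stand-in for the Keras model (assuming an input shape of (1, 9)), and masking occupied cells before argmax is a suggested addition rather than part of the original code:

```python
import numpy as np

def tomark(code):
    """Assumed helper (not shown in the question): 1 -> 'O', 2 -> 'X'."""
    return "O" if code == 1 else "X"

# The original condition: `&` is evaluated before `>` and `==`, so Python
# computes 0 & tomark(game_status), i.e. 0 & 'X', which raises TypeError.
try:
    game_status = 2
    game_status > 0 & tomark(game_status) == "X"
except TypeError as exc:
    print("original condition fails:", exc)

def is_winning_move(game_status, my_mark):
    """`and` has lower precedence than the comparisons and short-circuits."""
    return game_status > 0 and tomark(game_status) == my_mark

class FakeModel:
    """Hypothetical stand-in for the Keras model: Q-values are 0..8,
    so higher indices look better."""
    def predict(self, x):
        return np.arange(x.shape[1], dtype=np.float32)[np.newaxis]

def best_available_action(model, state, available_actions):
    """Return the legal action with the highest predicted Q-value."""
    board, _ = state  # state is a (board, mark) pair, so unpack it first
    # Convert to an array, then add a batch dimension: (9,) -> (1, 9).
    q_values = model.predict(np.asarray(board, dtype=np.float32)[np.newaxis])[0]
    # Suggested addition: hide occupied cells so argmax returns a legal move.
    masked = np.full_like(q_values, -np.inf)
    masked[available_actions] = q_values[available_actions]
    return int(np.argmax(masked))

state = ((1, 0, 2, 0, 0, 0, 0, 0, 2), "X")
print(best_available_action(FakeModel(), state, [1, 3, 4, 5, 6, 7]))  # 7
```

Without the mask, np.argmax over all nine Q-values can pick an occupied cell (here it would pick 8).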

This is the training loop:

agent_1 = Agent('X', model_1)
agent_2 = Agent('O', model_2)
agent_1_rewards = []
agent_2_rewards = []
agents = [agent_1, agent_2]


n_episodes = 600
start_mark = 'O'
batch_size = 15



for episode in range(n_episodes):
    env = TicTacToeEnv()
    env.set_start_mark(start_mark)  
    state = env.reset()
    while not env.done:
        _ , mark = state
        available_actions = env.available_actions()
        epsilon = max( 1 - episode/500 , 0.01)
        agent = agent_by_mark(agents, mark)
        agents_model = agent.agents_model
        
        
        next_state, reward, done, _ = agent.play_one_step(env, state, available_actions, agents_model  ,epsilon)
        state = next_state
        if agent is agent_1:
            agent_1_rewards.append(reward)
        else:
            agent_2_rewards.append(reward)
            
            
        env.render()
        
    
    if episode > 60:
        agent.training_step(15)
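The exploration rate in the loop decays linearly from 1.0, reaches the 0.01 floor at about episode 495, and stays there for the rest of the 600 episodes. A quick check of the schedule, using the same expression as the loop (the `epsilon_for` name is just for illustration):

```python
def epsilon_for(episode):
    """Linear decay used in the training loop, floored at 0.01."""
    return max(1 - episode / 500, 0.01)

for episode in (0, 100, 250, 495, 500, 599):
    print(episode, round(epsilon_for(episode), 3))
```

Separately, after the while loop ends, agent refers to whichever agent made the last move, so only that agent's training_step runs in a given episode.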

This is the error trace:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17792/1466320926.py in <module>
     26 
     27 
---> 28         next_state, reward, done, _ = agent.play_one_step(env, state, available_actions, agents_model  ,epsilon)
     29         state = next_state
     30         if agent is agent_1:

~\AppData\Local\Temp/ipykernel_17792/3137506088.py in play_one_step(self, env, state, available_actions, agents_model, epsilon)
     60 
     61     def play_one_step(self, env ,state, available_actions, agents_model ,epsilon= 0.2):
---> 62         action = self.greedy_policy(state, available_actions, agents_model ,epsilon= epsilon)
     63         next_state, reward, done, info = env.step(action)
     64         self.replay_buffer.append((state, action,reward, next_state, done))

~\AppData\Local\Temp/ipykernel_17792/3137506088.py in greedy_policy(self, state, available_actions, agents_model, epsilon)
     35                 for next_action in available_actions: # checking if the next possible action wins the game and if it does then it returns it
     36                     next_state = after_action_state(state, next_action)
---> 37                     game_status = check_game_status(next_state)
     38                     if game_status > 0 & tomark(game_status)==self.mark:
     39                         return next_action

g:\Code\TicTacToe\gym-tictactoe\gym_tictactoe\env.py in check_game_status(board)
     66     for t in [1, 2]:
     67         for j in range(0, 9, 3):
---> 68             if [t] * 3 == [board[i] for i in range(j, j+3)]:
     69                 return t
     70         for j in range(0, 3):

g:\Code\TicTacToe\gym-tictactoe\gym_tictactoe\env.py in <listcomp>(.0)
     66     for t in [1, 2]:
     67         for j in range(0, 9, 3):
---> 68             if [t] * 3 == [board[i] for i in range(j, j+3)]:
     69                 return t
     70         for j in range(0, 3):

IndexError: tuple index out of range

Does anyone know what causes this issue? I already tried rewriting just that function and testing it on a few cases, and it worked fine with no index problem. If you need more of the code, let me know. Thanks in advance!

It is not a fix, but it explains the cause of the bug. In the snippet below, board[i] is read for i from 0 to 8: range(0, 9, 3) yields j = 0, 3, 6, and each inner range covers j to j+2. That only works if board has at least 9 items, so the IndexError means the object passed in as board is shorter than that.

for j in range(0, 9, 3):
    if [t] * 3 == [board[i] for i in range(j, j+3)]:
        return t
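The IndexError can be reproduced without the rest of the environment. after_action_state returns a (board, mark) pair, and greedy_policy passes that whole pair to check_game_status. Inside the list comprehension, board then has only two items, so board[2] fails on the very first row check. The function worked in isolation because it was given a real 9-cell board there. The sketch below uses a trimmed copy of the row check (the name `first_row_winner` is hypothetical):

```python
def first_row_winner(board):
    """Trimmed copy of the row check from check_game_status."""
    for t in [1, 2]:
        for j in range(0, 9, 3):  # j = 0, 3, 6, so i never exceeds 8
            if [t] * 3 == [board[i] for i in range(j, j + 3)]:
                return t
    return None

# Same shape as the value after_action_state returns: (board, mark).
next_state = ((2, 2, 2, 0, 0, 0, 0, 0, 0), "O")

try:
    first_row_winner(next_state)  # passes the whole (board, mark) pair
except IndexError as exc:
    print("IndexError:", exc)  # tuple index out of range

print(first_row_winner(next_state[0]))  # passes only the board -> 2
```

In greedy_policy, the corresponding fix is to unpack the result, for example next_board, _ = after_action_state(state, next_action), and then call check_game_status(next_board).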
