Enhancement of Agent Training Q Learning Taxi V3

import random

import gym
import numpy as np

# Setup (not shown in the original snippet) -- typical values assumed
env = gym.make("Taxi-v3")
q_table = np.zeros([env.observation_space.n, env.action_space.n])

alpha = 0.1    # learning rate (assumed)
gamma = 0.9    # discount factor (assumed)
epsilon = 0.1  # exploration rate (assumed)

reward_list = []
dropout_list = []

episode_number = 10000

for i in range(1, episode_number):

    state = env.reset()

    reward_count = 0
    dropouts = 0

    while True:

        # epsilon-greedy action selection: explore or exploit
        if random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(q_table[state])

        next_state, reward, done, _ = env.step(action)

        # Q-learning update: Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))
        old_value = q_table[state, action]
        next_max = np.max(q_table[next_state])
        q_table[state, action] = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)

        state = next_state

        # a reward of -10 means an illegal pickup/dropoff in Taxi-v3
        if reward == -10:
            dropouts += 1

        # accumulate the reward BEFORE checking for termination,
        # otherwise the final step's reward is never counted
        reward_count += reward

        if done:
            break

    if i % 10 == 0:
        dropout_list.append(dropouts)
        reward_list.append(reward_count)
        print("Episode: {}, reward {}, wrong dropout {}".format(i, reward_count, dropouts))

I was required to enhance this code to showcase a comparison of rewards and penalties. What I have to do is enhance the code so that it displays a comparison of the rewards earned before training the agent and after training the agent. The plotted graphs must overlap to show the comparison, but I could not find a way. I have been trying for days but could not find the solution I am looking for. I hope someone can help me with this.

If there is a need to create new code or a separate script and then compare the results, please do let me know. Thank you.

One remark on the assignment of next_value: the incremental form of the Q-learning update is next_value = old_value + alpha * (reward + gamma * next_max - old_value), which is algebraically identical to the form (1 - alpha) * old_value + alpha * (reward + gamma * next_max) used in the code, so no term is actually missing from the update.
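A quick numerical check of that equivalence (all values here are arbitrary examples):

import numpy as np

alpha, gamma = 0.1, 0.9                      # arbitrary example values
old_value, reward, next_max = 2.0, -1.0, 3.5

form_a = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
form_b = old_value + alpha * (reward + gamma * next_max - old_value)

print(form_a, form_b)  # both print 2.015: the two forms are identical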

Regarding the plots you want to make, you can plot the rewards earned by an agent taking random actions on the same axes as the rewards earned by your agent after reinforcement learning.
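A minimal sketch of that comparison, assuming the env and trained q_table from the question, the old 4-tuple gym step API, and matplotlib for plotting (the helper name run_episodes and the episode count of 100 are made up for illustration):

import matplotlib.pyplot as plt
import numpy as np

def run_episodes(env, n, policy):
    """Run n episodes with the given policy and return each episode's total reward."""
    totals = []
    for _ in range(n):
        state = env.reset()
        total, done = 0, False
        while not done:
            state, reward, done, _ = env.step(policy(state))
            total += reward
        totals.append(total)
    return totals

random_rewards = run_episodes(env, 100, lambda s: env.action_space.sample())
trained_rewards = run_episodes(env, 100, lambda s: np.argmax(q_table[s]))

# overlapping curves on the same axes give the before/after comparison
plt.plot(random_rewards, label="before training (random actions)")
plt.plot(trained_rewards, label="after training (greedy on q_table)")
plt.xlabel("episode")
plt.ylabel("total reward")
plt.legend()
plt.show()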

It does not seem to be well understood, but the code you are showing is only the learning phase of the agent.

After you run it, q_table contains the quality of each action with respect to each state.

The algorithm for running the trained agent is then (see the Python sketch after the pseudocode):

initialize environment
done := false
while not done
    s := current state
    a := argmax(q_table[s])
    update s and done by taking action a
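In Python, with the env and q_table from above (a sketch, assuming the old gym step API used in the question):

state = env.reset()
done = False
total_reward = 0

while not done:
    action = np.argmax(q_table[state])         # greedy action from the learned table
    state, reward, done, _ = env.step(action)  # no exploration during evaluation
    total_reward += reward

print("total reward of the trained agent:", total_reward)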

I suggest you check this tutorial, which I think covers all of your questions:

https://www.learnpythonwithrune.org/capstone-project-reinforcement-learning-from-scratch-with-python/

Feel free to check the comment section of the post for the concerns regarding the plots.

I hope I have been helpful.

Good luck in your work!
