
Learning Curve in Q-learning

I wrote the Q-learning algorithm in C++ with an epsilon-greedy policy, and now I have to plot the learning curve for the Q-values. What exactly should I plot? I have an 11x5 Q matrix, so should I take a single Q-value and plot how it is learned, or should I use the whole matrix for the learning curve? Could you guide me with this? Thank you.

Learning curves in RL are typically plots of returns over time, not of Q-values, Q-losses, or anything like that. So you should run your agent in the environment, compute the total reward (aka the return), and plot it against the corresponding time (for example, the episode number).
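As a rough illustration of that advice, here is a minimal C++ sketch of the bookkeeping involved. It is not taken from the asker's code: the helper names (`episodeReturn`, `movingAverage`, `writeLearningCurve`), the CSV layout, and the choice of a trailing moving average for smoothing are all assumptions made for this example. It also assumes you can collect the rewards of each episode into a vector while your Q-learning loop runs.

```cpp
#include <cstddef>
#include <fstream>
#include <numeric>
#include <string>
#include <vector>

// Undiscounted return of one episode: the sum of all rewards it collected.
// (Hypothetical helper; call it once per episode in your training loop.)
double episodeReturn(const std::vector<double>& rewards) {
    return std::accumulate(rewards.begin(), rewards.end(), 0.0);
}

// Trailing moving average over at most `window` episodes.
// Smoothing is optional; it only makes a noisy epsilon-greedy curve easier to read.
std::vector<double> movingAverage(const std::vector<double>& returns,
                                  std::size_t window) {
    if (window == 0) window = 1;  // treat 0 as "no smoothing"
    std::vector<double> smoothed;
    smoothed.reserve(returns.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < returns.size(); ++i) {
        sum += returns[i];
        if (i >= window) sum -= returns[i - window];  // drop the value leaving the window
        const std::size_t count = (i + 1 < window) ? i + 1 : window;
        smoothed.push_back(sum / static_cast<double>(count));
    }
    return smoothed;
}

// Write one CSV row per episode so the curve can be plotted with any tool.
// The column names are an assumption of this sketch.
bool writeLearningCurve(const std::string& path,
                        const std::vector<double>& returns,
                        std::size_t window) {
    std::ofstream out(path);
    if (!out) return false;
    const std::vector<double> smoothed = movingAverage(returns, window);
    out << "episode,return,smoothed_return\n";
    for (std::size_t i = 0; i < returns.size(); ++i) {
        out << i << ',' << returns[i] << ',' << smoothed[i] << '\n';
    }
    return static_cast<bool>(out);
}
```

In a training loop you would push `episodeReturn(rewards)` onto a `std::vector<double>` at the end of every episode, then call something like `writeLearningCurve("returns.csv", returns, 50)` and plot the `episode` column against `return` or `smoothed_return`. If the agent is learning, the curve should generally trend upward and then flatten out.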


 