
Is maxQ' the sum of all possible rewards or the highest possible reward?

Original post: 2019-07-01 15:53:08 · tags: reinforcement-learning, q-learning

I'm coding a simple Q-learning example, and to update the Q-values you need a maxQ'.

I'm not sure whether maxQ' refers to the sum of all possible rewards or to the highest possible reward.

1 Answer

That is the maximum Q-value among all possible actions for the state s'. Basically, you need to take the max of Q(s',a') over all valid actions a' in state s'. It is neither a sum of rewards nor a reward itself: it is the largest of the current Q-value estimates for the next state.
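To make this concrete, here is a minimal sketch of where maxQ' enters the standard tabular Q-learning update, Q(s,a) ← Q(s,a) + α·(r + γ·maxQ' − Q(s,a)). The table layout (`q_table[state][action]`), the helper names `max_q` and `q_update`, and the values of `alpha` and `gamma` are assumptions chosen for illustration, not taken from the question.

```python
def max_q(q_table, state, valid_actions):
    """Return the largest Q-value over the valid actions in `state`."""
    return max(q_table[state][a] for a in valid_actions)


def q_update(q_table, s, a, reward, s_next, valid_next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to q_table[s][a] and return it."""
    # maxQ' is a max over the actions available in s', not a sum.
    # If s' has no valid actions (terminal state), it contributes 0.
    if valid_next_actions:
        best_next = max_q(q_table, s_next, valid_next_actions)
    else:
        best_next = 0.0
    td_target = reward + gamma * best_next
    q_table[s][a] += alpha * (td_target - q_table[s][a])
    return q_table[s][a]


# Hypothetical table with 2 states and 3 actions (illustrative values only).
q_table = [
    [0.0, 0.0, 0.0],  # Q(s=0, a) for a = 0, 1, 2
    [1.0, 5.0, 2.0],  # Q(s=1, a) for a = 0, 1, 2
]

best = max_q(q_table, 1, [0, 1, 2])   # picks 5.0; the sum would be 8.0
new_q = q_update(q_table, s=0, a=1, reward=1.0, s_next=1,
                 valid_next_actions=[0, 1, 2], alpha=0.5, gamma=0.9)
print(best, new_q)
```

With these made-up numbers, the update uses 5.0 (the best next-state value) rather than 8.0 (the sum), so the target is 1.0 + 0.9 × 5.0.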




