Reinforcement learning - number of actions

Original post 2020-03-14 11:20:51 9 1 python/ reinforcement-learning

Reading https://towardsdatascience.com/reinforcement-learning-temporal-difference-sarsa-q-learning-expected-sarsa-on-python-9fecfda7467e, epsilon_greedy is defined as:

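import numpy as np

def epsilon_greedy(Q, epsilon, n_actions, s, train=False):
    """
    @param Q Q values state x action -> value
    @param epsilon exploration parameter; as written, the greedy
           action is taken when a uniform draw falls below epsilon
    @param n_actions number of actions available to the agent
    @param s index of the current state (selects row Q[s, :])
    @param train if true then no random actions selected
    """
    if train or np.random.rand() < epsilon:
        # Greedy choice: index of the highest Q value in state s
        action = np.argmax(Q[s, :])
    else:
        # Random choice: uniform action index in [0, n_actions)
        action = np.random.randint(0, n_actions)
    return action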

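Is the parameter n_actions the number of actions available to the agent? So if the agent is learning to play football and the available actions are {kick, don't kick}, then n_actions = 2?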

1 answer

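You are right. Typically, you define a dictionary that maps integers to each action your agent can perform. As you can see in the function, n_actions is used precisely to sample a random action index when the best action index is not selected.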
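To make the mapping concrete, here is a minimal sketch using the football example from the question. The names ACTIONS, n_states, and sample_random_action are hypothetical and chosen only for illustration, and the Q values are made-up numbers rather than anything learned.

```python
import numpy as np

# Hypothetical action set taken from the football example in the question:
# each integer index maps to one action the agent can perform.
ACTIONS = {0: "kick", 1: "don't kick"}
n_actions = len(ACTIONS)  # two actions, so n_actions is 2

# Assumed toy problem size; the Q table has one column per action.
n_states = 3
Q = np.zeros((n_states, n_actions))
Q[0] = [0.2, 0.7]  # made-up values for state 0


def sample_random_action(rng, n_actions):
    """Return a uniformly random action index in [0, n_actions)."""
    return int(rng.integers(0, n_actions))


rng = np.random.default_rng(0)

# Random choice: any valid index into ACTIONS.
idx = sample_random_action(rng, n_actions)
print("random action:", idx, ACTIONS[idx])

# Greedy choice in state 0: the column with the largest Q value.
greedy = int(np.argmax(Q[0, :]))
print("greedy action:", greedy, ACTIONS[greedy])
```

Because n_actions is derived from the dictionary, adding a third action (say, a pass) would only require a new dictionary entry; the Q table width and the random sampling range would follow from it.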

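Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.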
