
Reinforcement learning actor predicting same actions during initial training

I have a reinforcement learning Actor-Critic model with an LSTM. During initial training it produces the same action value for every state.

Can someone with expertise in AI/RL tell me whether this is normal behavior during training? Also, what would be a sensible size for the LSTM and linear layers, given state_dimension = 50 and action_dimension = 3?

Thanks in advance.

This can be caused by a number of things:

1 - Check the weight initialization.

2 - Check the interface through which the model makes its inference, and make sure that nothing other than the activation of the corresponding output neuron is constraining the action choice.

3 - Check your reward function. Avoid very large negative rewards. Also, make sure that repeating the same action is not an obvious way to avoid negative rewards.
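Regarding point 1, a minimal sketch (assuming PyTorch) of an LSTM actor with orthogonal weight initialization, using the asker's state_dimension = 50 and action_dimension = 3; the hidden size of 128 is illustrative, not a recommendation:

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, state_dim=50, action_dim=3, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, action_dim)
        self._init_weights()

    def _init_weights(self):
        # Orthogonal init keeps activations and gradients well-scaled
        # early in training, which helps avoid a collapsed (constant) policy.
        for name, param in self.lstm.named_parameters():
            if "weight" in name:
                nn.init.orthogonal_(param)
            elif "bias" in name:
                nn.init.zeros_(param)
        # Small gain on the policy head keeps initial logits near zero,
        # so the starting policy is close to uniform over actions.
        nn.init.orthogonal_(self.policy_head.weight, gain=0.01)
        nn.init.zeros_(self.policy_head.bias)

    def forward(self, states):
        # states: (batch, seq_len, state_dim)
        out, _ = self.lstm(states)
        logits = self.policy_head(out[:, -1])  # use the last timestep
        return torch.distributions.Categorical(logits=logits)

actor = LSTMActor()
dist = actor(torch.randn(4, 10, 50))
print(dist.probs.shape)  # → torch.Size([4, 3])
```

With this setup you can sanity-check the symptom directly: if the action distribution is already near-deterministic at step 0, the problem is in the initialization or the head's scale rather than in the reward signal.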
