
Reinforcement learning actor predicting same actions during initial training

I have a reinforcement learning Actor-Critic model with an LSTM. During initial training it produces the same action value for every state.

Can someone with expertise in AI/RL tell me whether this is normal behavior during training? Also, what would be a sensible size for the LSTM and linear layers, given state_dimension = 50 and action_dimension = 3?

Thanks in advance.

This can be caused by a number of things:

1 - Check the weight initialization.

2 - Check the interface through which the model makes its inference, and make sure that nothing other than the activation of the corresponding output neuron is constraining the action choice.

3 - Check your reward function. Avoid very large negative rewards. Also, make sure that repeating the same action is not an obvious way to avoid negative rewards.
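Regarding point 1, a minimal sketch (assuming PyTorch) of an LSTM actor with orthogonal weight initialization, using the asker's state_dimension = 50 and action_dimension = 3; the hidden size of 128 is illustrative, not a recommendation:

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, state_dim=50, action_dim=3, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, action_dim)
        self._init_weights()

    def _init_weights(self):
        # Orthogonal init keeps activations and gradients well-scaled
        # early in training, which helps avoid a collapsed (constant) policy.
        for name, param in self.lstm.named_parameters():
            if "weight" in name:
                nn.init.orthogonal_(param)
            elif "bias" in name:
                nn.init.zeros_(param)
        # Small gain on the policy head keeps initial logits near zero,
        # so the starting policy is close to uniform over actions.
        nn.init.orthogonal_(self.policy_head.weight, gain=0.01)
        nn.init.zeros_(self.policy_head.bias)

    def forward(self, states):
        # states: (batch, seq_len, state_dim)
        out, _ = self.lstm(states)
        logits = self.policy_head(out[:, -1])  # use the last timestep
        return torch.distributions.Categorical(logits=logits)

actor = LSTMActor()
dist = actor(torch.randn(4, 10, 50))
print(dist.probs.shape)  # → torch.Size([4, 3])
```

With this setup you can sanity-check the symptom directly: if the action distribution is already near-deterministic at step 0, the problem is in the initialization or the head's scale rather than in the reward signal.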
