
Reinforcement learning actor predicting same actions during initial training

Original post: 2020-07-28 16:55:06. Tags: tensorflow, pytorch, artificial-intelligence, actor, reinforcement-learning

I have a reinforcement learning Actor-Critic model with an LSTM. During initial training, it outputs the same action values for all states.

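Could an AI/RL expert tell me whether this is normal behavior during training? Also, given state_dimension = 50 and action_dimension = 3, could you help me work out what the ideal sizes of the LSTM and linear layers should be?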

Thanks in advance.
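For readers who want to reproduce the symptom, here is a minimal PyTorch sketch of the kind of model described above. Only state_dimension = 50 and action_dimension = 3 come from the question; the class name `LSTMActorCritic`, the hidden size of 64, the shared-LSTM layout, and the random input states are all assumptions made for illustration, not the asker's actual code and not a recommended layer size. It measures how varied the outputs are across a batch of states on an untrained network.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 50, 3   # values given in the question
HIDDEN_DIM = 64                 # assumption: placeholder, not a recommended size


class LSTMActorCritic(nn.Module):
    """Hypothetical actor-critic: shared LSTM body with two linear heads."""

    def __init__(self, state_dim=STATE_DIM, action_dim=ACTION_DIM,
                 hidden_dim=HIDDEN_DIM):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.actor_head = nn.Linear(hidden_dim, action_dim)   # action logits
        self.critic_head = nn.Linear(hidden_dim, 1)           # state value

    def forward(self, states, hidden=None):
        # states has shape (batch, seq_len, state_dim)
        features, hidden = self.lstm(states, hidden)
        last = features[:, -1]                  # features at the final time step
        probs = torch.softmax(self.actor_head(last), dim=-1)
        value = self.critic_head(last).squeeze(-1)
        return probs, value, hidden


torch.manual_seed(0)
model = LSTMActorCritic()
states = torch.randn(256, 1, STATE_DIM)         # 256 random single-step states
with torch.no_grad():
    probs, value, _ = model(states)

# How often is each action the greedy choice, and how much do the
# action probabilities actually vary from one state to another?
greedy = probs.argmax(dim=-1)
counts = torch.bincount(greedy, minlength=ACTION_DIM)
print("greedy action counts:", counts.tolist())
print("std of each action probability across states:", probs.std(dim=0).tolist())
```

If the per-action standard deviation is close to zero, the network is effectively ignoring its input, which is the behavior the question describes.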

1 Answer

This can be caused by a number of things:

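1 - Check the weight initialization.

2 - Check the interface through which your model performs inference, and make sure nothing else is preventing it from choosing any action other than the one that activates that particular neuron.

3 - Check your reward function. Avoid overly large negative rewards. Also check whether taking the same action every time is simply an obvious way to avoid negative rewards.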
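The three checks above could be sketched roughly as follows in PyTorch. This is an illustration under stated assumptions, not the answerer's own method: the hidden size of 64, the orthogonal initialization with a small gain on the policy head, the helper `standardize`, and the reward values are all hypothetical choices made for the example.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

torch.manual_seed(0)
STATE_DIM, ACTION_DIM, HIDDEN_DIM = 50, 3, 64   # HIDDEN_DIM is an assumed placeholder

lstm = nn.LSTM(STATE_DIM, HIDDEN_DIM, batch_first=True)
actor_head = nn.Linear(HIDDEN_DIM, ACTION_DIM)

# 1 - Weight initialization. One common choice (an assumption here) is
#     orthogonal weights and zero biases, with a small gain on the policy
#     head so that the initial logits stay close to zero.
for name, param in lstm.named_parameters():
    if "weight" in name:
        nn.init.orthogonal_(param)
    else:
        nn.init.zeros_(param)
nn.init.orthogonal_(actor_head.weight, gain=0.01)
nn.init.zeros_(actor_head.bias)

# 2 - Inference interface. Compare sampling from the policy with always
#     taking the argmax over a batch of (random, made-up) states.
states = torch.randn(256, 1, STATE_DIM)
with torch.no_grad():
    features, _ = lstm(states)
    logits = actor_head(features[:, -1])
dist = Categorical(logits=logits)
sampled = dist.sample()
greedy = logits.argmax(dim=-1)
mean_entropy = dist.entropy().mean().item()
print("distinct sampled actions:", sampled.unique().numel())
print("distinct greedy actions:", greedy.unique().numel())
print("mean policy entropy:", mean_entropy)


# 3 - Reward function. Inspect the reward scale; standardizing a batch of
#     rewards or returns is one common heuristic for taming large negatives.
def standardize(rewards, eps=1e-8):
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


raw_rewards = [-100.0, -100.0, 1.0, 0.5, -100.0, 2.0]   # made-up values
scaled = standardize(raw_rewards)
print("raw min/max:", min(raw_rewards), max(raw_rewards))
print("scaled min/max:", scaled.min().item(), scaled.max().item())
```

A policy whose entropy collapses toward zero very early in training, or whose sampled actions never differ from the greedy ones, points toward the initialization or the action-selection code. A reward range dominated by large negative values points toward the reward function.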



