
How to evaluate the Q-value network of a SAC agent in Stable Baselines (on a state-action pair)?

Original post: 2022-07-16 15:34:58 | Tags: machine-learning, reinforcement-learning, stable-baselines

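I am implementing a SAC agent in Stable Baselines and need to evaluate its Q-value network in my custom environment. I tried to get the Q-values from the SAC class object but failed. Any method or function similar to the one PPO offers (.value) would be very helpful.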

1 Answer

We do not evaluate the value function; we evaluate the policy.
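One way to connect the two views: the Q-value of a state-action pair is the expected discounted return from taking that action once and then following the policy, so it can be estimated by rolling the policy out. The sketch below is only an illustration of that idea, not the Stable Baselines API. The toy chain environment, `step`, `mc_q_estimate`, and `always_right` are all hypothetical names made up for this example.

```python
from typing import Callable, Tuple

GOAL = 4  # hypothetical toy chain: positions 0..4, the episode ends at GOAL


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy deterministic dynamics: action is -1 or +1, reward 1.0 on reaching GOAL."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def mc_q_estimate(
    state: int,
    action: int,
    policy: Callable[[int], int],
    gamma: float = 0.99,
    n_rollouts: int = 10,
    max_steps: int = 50,
) -> float:
    """Monte Carlo estimate of Q(state, action) under `policy`.

    Takes `action` first, then follows `policy`, and averages the
    discounted return over `n_rollouts` rollouts.
    """
    total = 0.0
    for _ in range(n_rollouts):
        s, a = state, action
        ret, discount = 0.0, 1.0
        for _ in range(max_steps):
            s, r, done = step(s, a)
            ret += discount * r
            discount *= gamma
            if done:
                break
            a = policy(s)  # after the first step, act according to the policy
        total += ret
    return total / n_rollouts


def always_right(state: int) -> int:
    """Hypothetical fixed policy that always moves toward GOAL."""
    return 1


q_right = mc_q_estimate(2, +1, always_right, gamma=0.9)
q_left = mc_q_estimate(2, -1, always_right, gamma=0.9)
print(q_right, q_left)
```

In this toy setup, moving toward the goal from position 2 yields a higher estimate than first stepping away from it. If the library in question is stable-baselines3 (an assumption; the post only says "stable baselines"), the SAC model exposes its critic as `model.critic`, which takes a batch of observation and action tensors and returns one Q-value tensor per critic network; that output could be compared against a rollout estimate of this kind.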

