How to get a probability vector for all actions in tf-agents?

Question

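I am using LinearUCBAgent and LinearThompsonSamplingAgent to solve a multi-armed bandit problem, but both return a single action for each observation. What I need is the probability of every action, which I can use for ranking.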

Answer 1

You need to add the emit_policy_info argument when defining the agent. The specific value (wrapped in a tuple) depends on the agent: predicted_rewards_sampled for LinearThompsonSamplingAgent and predicted_rewards_optimistic for LinearUCBAgent.

For example:

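from tf_agents.bandits.agents.linear_thompson_sampling_agent import LinearThompsonSamplingAgent

# time_step_spec and action_spec are assumed to be defined for your environment.
agent = LinearThompsonSamplingAgent(
        time_step_spec=time_step_spec,
        action_spec=action_spec,
        # The trailing comma makes this a one-element tuple rather than a plain string.
        emit_policy_info=("predicted_rewards_sampled",)
    )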

Then, during inference, you need to access these fields and normalize them (via softmax):

action_step = agent.collect_policy.action(observation_step)
scores = tf.nn.softmax(action_step.info.predicted_rewards_sampled)

Here tf comes from import tensorflow as tf, and observation_step is your observation array wrapped in a TimeStep (from tf_agents.trajectories.time_step import TimeStep).

Note: these are not probabilities; they are normalized scores, similar to the normalized output of a fully connected layer.
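
To make the normalization and ranking step concrete, here is a minimal sketch in plain Python, with no TensorFlow dependency. The predicted_rewards list holds made-up numbers standing in for one batch entry of action_step.info.predicted_rewards_sampled, and the helper names softmax and rank_actions are illustrative, not part of tf-agents.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_actions(scores):
    """Return action indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical per-action values (assumption: one value per arm, batch size 1).
predicted_rewards = [0.2, 1.5, -0.3, 0.9]

normalized = softmax(predicted_rewards)    # non-negative, sums to 1
ranking = rank_actions(predicted_rewards)  # best action first
```

Because softmax is monotonic, ranking the raw values and ranking the normalized scores give the same order, so the normalization only matters if you need scores that sum to 1.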

Answer 1
0 2022-06-02 19:53:37
