How to get probability vector for all actions in tf-agents?
I'm working on a Multi-Armed Bandit problem using LinearUCBAgent and LinearThompsonSamplingAgent, but they both return a single action for an observation. What I need is a probability for every action, which I can use for ranking.
You need to add the emit_policy_info argument when defining the agent. The specific values (encapsulated in a tuple) depend on the agent: predicted_rewards_sampled for LinearThompsonSamplingAgent and predicted_rewards_optimistic for LinearUCBAgent.
For example:
agent = LinearThompsonSamplingAgent(
time_step_spec=time_step_spec,
action_spec=action_spec,
    emit_policy_info=("predicted_rewards_sampled",)  # note the comma: a one-element tuple, not a string
)
Then, during inference, you'll need to access those fields and normalize them (via softmax):
action_step = agent.collect_policy.action(observation_step)
scores = tf.nn.softmax(action_step.info.predicted_rewards_sampled)
where tf comes from import tensorflow as tf, and observation_step is your observation array encapsulated in a TimeStep (from tf_agents.trajectories.time_step import TimeStep).
Note of caution: these are NOT probabilities; they are normalized scores, similar to the normalized outputs of a fully-connected layer.
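The normalize-and-rank step itself doesn't depend on tf-agents. A minimal NumPy sketch, using made-up predicted rewards in place of whatever action_step.info.predicted_rewards_sampled would actually contain:

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical per-arm predicted rewards for a 4-arm bandit
# (stands in for action_step.info.predicted_rewards_sampled).
predicted_rewards = np.array([0.2, 1.5, -0.3, 0.9])

scores = softmax(predicted_rewards)   # normalized scores, sum to 1
ranking = np.argsort(-scores)         # arm indices, best first
```

Here ranking orders the arms by score, which is all you need if the goal is ranking rather than true probabilities.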