
How to get probability vector for all actions in tf-agents?

I'm working on a multi-armed bandit problem using LinearUCBAgent and LinearThompsonSamplingAgent, but they both return a single action for an observation. What I need is the probability for all the actions, which I can use for ranking.

You need to add the emit_policy_info argument when defining the agent. The specific value (wrapped in a tuple) depends on the agent: predicted_rewards_sampled for LinearThompsonSamplingAgent and predicted_rewards_optimistic for LinearUCBAgent.

For example:

agent = LinearThompsonSamplingAgent(
        time_step_spec=time_step_spec,
        action_spec=action_spec,
        emit_policy_info=("predicted_rewards_sampled",)  # note the trailing comma: this must be a tuple, not a string
    )

Then, during inference, you'll need to access those fields and normalize them (via softmax):

action_step = agent.collect_policy.action(observation_step)
scores = tf.nn.softmax(action_step.info.predicted_rewards_sampled)

where tf comes from import tensorflow as tf, and observation_step is your observation array wrapped in a TimeStep (from tf_agents.trajectories.time_step import TimeStep).

A note of caution: these are NOT probabilities; they are normalized scores, similar to the normalized outputs of a fully-connected layer.
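Since softmax is strictly monotonic, ranking arms by these normalized scores produces exactly the same order as ranking by the raw predicted rewards, so for pure ranking the normalization step is optional. A minimal pure-Python sketch of that property (the reward values here are made up for illustration):

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-arm predicted rewards, as the policy might emit them.
predicted_rewards = [1.2, 0.3, 2.5, 0.9]

scores = softmax(predicted_rewards)

# Softmax preserves order, so ranking by score equals ranking by raw reward.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # → [2, 0, 3, 1]
```

The scores sum to 1, which makes them convenient to read, but as noted above they should not be interpreted as action probabilities.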

