Using torch log_prob to compute the probability of each of several values under its own distribution

Question

I am trying to use log_prob to get the probability of selecting a value from a normal distribution. I get dist from a neural network and action from dist.sample().

In the learning phase, I pass 5 tensors to the neural network, which gives me 5 distributions, and from those distributions I get 5 actions. The problem is that I want the probability of each action under its own distribution, but log_prob gives me the probability of every action under every distribution. The values on the diagonal of the output matrix are the ones I want. Is there an easy way to extract them?

I use this block of code:

states = T.tensor(state[b], dtype=T.float).to(agent.device)
old_probs = T.tensor(log_prob[b]).to(agent.device)
actions = T.tensor(action[b]).to(agent.device)
values = T.tensor(value[b]).to(agent.device)

dist = actor(states)
new_probs = dist.log_prob(actions)

and the output is:

tensor([[-1.1823, -0.9680, -3.6280, -1.1112, -1.9610],
        [-1.5279, -1.1463, -2.5806, -1.0561, -1.4768],
        [-1.6258, -1.1618, -2.5027, -1.0100, -1.3882],
        [-1.6125, -1.1576, -2.5169, -1.0133, -1.3989],
        [-1.3384, -1.0965, -2.9404, -1.1370, -1.7129]], device='cuda:0',
       dtype=torch.float64, grad_fn=<SubBackward0>)

but the output should look like:

tensor([-1.1823, -1.1463, -2.5027, -1.0133, -1.7129], device='cuda:0',
       grad_fn=<SqueezeBackward1>)

Answer 1

You can select the diagonal of your matrix with torch.diag:

>>> new_probs.diag()
tensor([-1.1823, -1.1463, -2.5027, -1.0133, -1.7129], 
  device='cuda:0', grad_fn=<DiagBackward>)
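
A 5x5 result like the one in the question is typically what broadcasting produces when the distribution parameters have shape (5, 1) and the actions have shape (5,). The sketch below reproduces that situation under this assumption; the question does not show the actor's output shape, so loc, scale and the random inputs are hypothetical stand-ins for actor(states) and the stored actions. It compares taking the diagonal with reshaping the actions so that no broadcasting happens.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)

# Hypothetical stand-in for actor(states): 5 Normal distributions whose
# parameters have shape (5, 1), as a network with one output unit might give.
loc = torch.randn(5, 1)
scale = torch.ones(5, 1)
dist = Normal(loc, scale)

# Hypothetical stored actions as a flat vector of shape (5,).
actions = torch.randn(5)

# (5, 1) parameters broadcast against (5,) values, giving a (5, 5) matrix.
# Entry [i, j] is the log-probability of action j under distribution i.
full = dist.log_prob(actions)

# Option 1: keep only the matching pairs, i.e. the diagonal.
diag = full.diag()

# Option 2: give the actions the same shape as the parameters so that
# each action is only evaluated under its own distribution.
direct = dist.log_prob(actions.unsqueeze(-1)).squeeze(-1)

print(full.shape, diag.shape, direct.shape)
```

Both options give the same 5 values. If the shapes in your code match this assumption, the second option avoids computing the 20 off-diagonal entries that are thrown away.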

Solution 1
0 Accepted 2021-08-15 14:06:21
