
tensorflow_probability: Gradients always zero when backpropagating the log_prob of a sample of a normal distribution

As part of a project I am having trouble with the gradients of a normal distribution in tensorflow_probability. For this I create a normal distribution from which a sample is drawn. The log_prob of this sample shall then be fed into an optimizer to update the weights of the network.

If I get the log_prob of some constant I always get non-zero gradients. Unfortunately I have not found any relevant help in tutorials or similar sources.

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def get_log_prob(mu, std):
    normal = tfd.Normal(loc=mu, scale=std)
    samples = normal.sample(sample_shape=(1))  # draw a reparameterized sample
    log_prob = normal.log_prob(samples)
    return log_prob

const = tf.constant([0.1], dtype=np.float32)

log_prob = get_log_prob(const, 0.01)
grads = tf.gradients(log_prob, const)

with tf.Session() as sess:
   gradients = sess.run([grads])


print('gradients', gradients)

Output: gradients [array([0.], dtype=float32)]

I expect to get non-zero gradients when computing the gradient of a sample. Instead the output is always "0."

This is a consequence of TensorFlow Probability implementing reparameterization gradients (aka the "reparameterization trick"), and is in fact the correct answer in certain situations. Let me show you how that 0. answer comes about.

One way to generate a sample from a normal distribution with some location and scale is to first generate a sample from a standard normal distribution (this is usually some library-provided function, e.g. tf.random.normal in TensorFlow) and then shift and scale it. E.g. let's say the output of tf.random.normal is z. To get a sample x from the normal distribution with location loc and scale scale, you'd do: x = z * scale + loc.
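To make that concrete, here is a minimal sketch of the shift-and-scale construction, assuming TF 1.x-style graph mode as in the question's code; the names z and x are just the ones used in the explanation above.

import tensorflow as tf

loc = tf.constant([0.1], dtype=tf.float32)
scale = tf.constant([0.01], dtype=tf.float32)

# Draw from a standard normal; this sample does not depend on loc or scale.
z = tf.random.normal(shape=[1])
# Shift and scale to obtain a sample from Normal(loc, scale).
x = z * scale + loc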

Now, how does one compute the value of the probability density of a number under the normal distribution? One way to do it is to reverse that transformation, so that you're now dealing with a standard normal distribution, and then compute the log-probability density there. I.e. log_prob(x) = log_prob_std_normal((x - loc) / scale) + f(scale) (the f(scale) term comes about from the change of variables involved in the transformation; its form doesn't matter for this explanation).
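As a hedged illustration of that change of variables, a hand-written version of the density could look like the sketch below. manual_log_prob is a hypothetical helper, not part of tensorflow_probability, and it assumes the f(scale) term for a scalar Normal works out to -log(scale).

import numpy as np
import tensorflow as tf

def manual_log_prob(x, loc, scale):
    # Invert the forward transform to get back to a standard normal variable.
    z = (x - loc) / scale
    # Log density of the standard normal at z.
    log_prob_std_normal = -0.5 * tf.square(z) - 0.5 * np.log(2.0 * np.pi)
    # Change-of-variables correction: f(scale) = -log(scale).
    return log_prob_std_normal - tf.math.log(scale)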

You can now plug the first expression into the second, and you'll get log_prob(x) = log_prob_std_normal(z) + f(scale), i.e. the loc cancels out entirely! As a result, the gradient of log_prob with respect to loc is 0. This also explains why you don't get a 0. if you evaluate the log probability at a constant: the constant is missing the forward transformation used to create the sample, and you'll get some (typically) non-zero gradient.
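The sketch below contrasts the two cases, assuming TF 1.x graph mode and tfd = tfp.distributions as in the question: the gradient with respect to loc is 0. for a reparameterized sample, but typically non-zero for a fixed constant.

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

loc = tf.constant([0.1], dtype=tf.float32)
dist = tfd.Normal(loc=loc, scale=0.01)

# Case 1: the sample is produced from loc via the reparameterization above,
# so loc cancels inside log_prob and its gradient is exactly zero.
sample = dist.sample()
grad_sample = tf.gradients(dist.log_prob(sample), loc)

# Case 2: a fixed constant carries no forward transformation, so the
# gradient of its log-probability with respect to loc is generally non-zero.
grad_const = tf.gradients(dist.log_prob(tf.constant([0.2])), loc)

with tf.Session() as sess:
    print(sess.run([grad_sample, grad_const]))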

So, when is this the correct behavior? The reparameterization gradients are correct when you're computing gradients of an expectation of a function under that distribution with respect to the distribution parameters. One way to compute such an expectation is to do a Monte Carlo approximation, like so: tf.reduce_mean(g(dist.sample(N)), axis=0). It sounds like that's what you're doing (where your g() is log_prob()), so it looks like the gradients are correct.
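For completeness, here is a small sketch of such a Monte Carlo estimate and its reparameterization gradients; the choice g(x) = x**2 and the sample count N are illustrative only. Since E[X^2] = loc^2 + scale^2, the gradients should come out near 2*loc and 2*scale.

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

loc = tf.constant(0.1)
scale = tf.constant(0.5)
dist = tfd.Normal(loc=loc, scale=scale)

N = 10000
samples = dist.sample(N)  # reparameterized samples
# Monte Carlo estimate of E[g(X)] with g(x) = x**2.
expectation = tf.reduce_mean(tf.square(samples), axis=0)
# Reparameterization gradients flow through the samples to loc and scale.
grads = tf.gradients(expectation, [loc, scale])

with tf.Session() as sess:
    print(sess.run(grads))  # roughly [2*loc, 2*scale]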
