Reparametrization in TensorFlow Probability: tf.GradientTape() does not calculate the gradient with respect to a distribution's mean

Question

In tensorflow version 2.0.0-beta1, I am trying to implement a keras layer whose weights are sampled from a normal distribution. I would like the mean of the distribution to be a trainable parameter.

Thanks to the "reparametrization trick" already implemented in tensorflow-probability, calculating the gradient with respect to the mean of the distribution should be possible in principle, if I am not mistaken.

However, when I try to calculate the gradient of the network output with respect to the mean variable using tf.GradientTape(), the returned gradient is None.

I created two minimal examples, one of a layer with deterministic weights and one of a layer with random weights. The gradients of the deterministic layer are calculated as expected, but the gradients are None for the random layer. There is no error message explaining why the gradient is None, and I am kind of stuck.

Minimal example code:

A: Here is the minimal example for the deterministic network:

import tensorflow as tf; print(tf.__version__)

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer,Input
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import RandomNormal
import tensorflow_probability as tfp

import numpy as np

# example data
x_data = np.random.rand(99,3).astype(np.float32)

# # A: DETERMINISTIC MODEL

# 1 Define Layer

class deterministic_test_layer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(deterministic_test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', 
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(deterministic_test_layer, self).build(input_shape)

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

# 2 Create model and calculate gradient

x = Input(shape=(3,))
fx = deterministic_test_layer(1)(x)
deterministic_test_model = Model(name='test_deterministic',inputs=[x], outputs=[fx])

print('\n\n\nCalculating gradients for deterministic model: ')

for x_now in np.split(x_data,3):
#     print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = deterministic_test_model(x_now)
        grads = tape.gradient(
            fx_now,
            deterministic_test_model.trainable_variables,
        )
        print('\n',grads,'\n')

print(deterministic_test_model.summary())

B: The following example is very similar, but instead of deterministic weights I tried to use randomly sampled weights (sampled at call() time!) for the test layer:

# # B: RANDOM MODEL

# 1 Define Layer

class random_test_layer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(random_test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean_W = self.add_weight('mean_W',
                                      initializer=RandomNormal(mean=0.5,stddev=0.1),
                                      trainable=True)

        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(loc=self.mean_W,scale_diag=(1.,))
        super(random_test_layer, self).build(input_shape)

    def call(self, x):
        sampled_kernel = self.kernel_dist.sample(sample_shape=x.shape[1])
        return K.dot(x, sampled_kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

# 2 Create model and calculate gradient

x = Input(shape=(3,))
fx = random_test_layer(1)(x)
random_test_model = Model(name='test_random',inputs=[x], outputs=[fx])

print('\n\n\nCalculating gradients for random model: ')

for x_now in np.split(x_data,3):
#     print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = random_test_model(x_now)
        grads = tape.gradient(
            fx_now,
            random_test_model.trainable_variables,
        )
        print('\n',grads,'\n')

print(random_test_model.summary())

Expected/Actual Output:

A: The deterministic network works as expected, and the gradients are calculated. The output is:

2.0.0-beta1



Calculating gradients for deterministic model: 

 [<tf.Tensor: id=26, shape=(3, 1), dtype=float32, numpy=
array([[17.79845  ],
       [15.764006 ],
       [14.4183035]], dtype=float32)>] 


 [<tf.Tensor: id=34, shape=(3, 1), dtype=float32, numpy=
array([[16.22232 ],
       [17.09122 ],
       [16.195663]], dtype=float32)>] 


 [<tf.Tensor: id=42, shape=(3, 1), dtype=float32, numpy=
array([[16.382954],
       [16.074356],
       [17.718027]], dtype=float32)>] 

Model: "test_deterministic"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 3)]               0         
_________________________________________________________________
deterministic_test_layer (de (None, 1)                 3         
=================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
_________________________________________________________________
None

B: However, for the analogous random network, the gradients are not calculated as expected (using the reparametrization trick). Instead, they are None. The full output is:

Calculating gradients for random model: 

 [None] 


 [None] 


 [None] 

Model: "test_random"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 3)]               0         
_________________________________________________________________
random_test_layer (random_te (None, 1)                 1         
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
None

Can anybody point me at the problem here?

Answer 1

It seems that tfp.distributions.MultivariateNormalDiag, as used here, is not differentiable with respect to its input parameters (e.g. loc). In this particular case, the following would be equivalent:

class random_test_layer(Layer):
    ...

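    def build(self, input_shape):
        ...
        # Zero-mean distribution; the trainable mean is added in call() instead.
        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(loc=0, scale_diag=(1.,))
        super(random_test_layer, self).build(input_shape)

    def call(self, x):
        # Shift the zero-mean sample by the trainable mean so the result depends on self.mean_W.
        sampled_kernel = self.kernel_dist.sample(sample_shape=x.shape[1]) + self.mean_W
        return K.dot(x, sampled_kernel)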

In this case, however, the loss is differentiable with respect to self.mean_W.
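As a rough check of this idea outside Keras, the sketch below draws zero-mean noise and adds a trainable mean afterwards, then asks tf.GradientTape for the gradient. It is only an illustration under some assumptions: it uses plain tf.random.normal instead of tfp, and the names x_batch, mean_w and eps are made up for the example rather than taken from the question.

```python
import numpy as np
import tensorflow as tf

# Illustrative stand-ins (assumptions, not the question's exact objects).
x_batch = tf.constant(np.random.rand(33, 3).astype(np.float32))
mean_w = tf.Variable(0.5, dtype=tf.float32, name="mean_W")

with tf.GradientTape() as tape:
    # Zero-mean, unit-variance noise; it does not depend on mean_w.
    eps = tf.random.normal(shape=(3, 1))
    # Shifted sample: the mean is added while the tape is recording.
    sampled_kernel = eps + mean_w
    fx = tf.matmul(x_batch, sampled_kernel)
    # Reduce to a scalar so the gradient has an unambiguous meaning.
    total = tf.reduce_sum(fx)

# The gradient is a tensor rather than None, because mean_w was used
# inside the tape context.
grad = tape.gradient(total, mean_w)
print(grad)
```

Since every entry of the sampled kernel is shifted by the same scalar, the gradient of the summed output with respect to mean_w should equal the sum of all entries of x_batch, which is an easy sanity check.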

Be careful: although this approach might work for your purposes, note that calling the density function self.kernel_dist.prob would yield different results, since we took loc outside the distribution.
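To make the caveat concrete, here is a small univariate sketch in plain Python. The helper normal_pdf and the numbers mean_w and eps are hypothetical and only stand in for the distribution and variable above. It suggests that the zero-mean density evaluated at the shifted sample differs from the intended density, and that subtracting the mean first recovers it.

```python
import math

def normal_pdf(value, loc, scale=1.0):
    """Density of a univariate normal N(loc, scale^2) at value."""
    z = (value - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))

mean_w = 0.5            # hypothetical value of the trainable mean
eps = 0.3               # hypothetical draw from the zero-mean distribution
sampled = eps + mean_w  # the shifted sample used as the kernel

# Density under the zero-mean distribution, evaluated at the shifted sample.
p_zero_mean = normal_pdf(sampled, loc=0.0)
# Density under the intended distribution centred on mean_w.
p_intended = normal_pdf(sampled, loc=mean_w)
# Subtracting the mean before evaluating the zero-mean density recovers it.
p_recovered = normal_pdf(sampled - mean_w, loc=0.0)

print(p_zero_mean, p_intended, p_recovered)
```

By analogy, if the density is needed with the layer above, one option would be to evaluate self.kernel_dist.prob on the sample minus self.mean_W; this is a suggestion, not something the original answer spells out.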

Answer 1 (accepted 2019-07-08 22:34:32)
