
Reparametrization in tensorflow-probability: tf.GradientTape() doesn't calculate the gradient with respect to a distribution's mean

In tensorflow version 2.0.0-beta1, I am trying to implement a Keras layer whose weights are sampled from a normal random distribution. I would like the mean of the distribution to be a trainable parameter.

Thanks to the "reparametrization trick" already implemented in tensorflow-probability, calculating the gradient with respect to the mean of the distribution should be possible in principle, if I am not mistaken.
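
For context, here is a minimal sketch of the trick itself (not part of the original question; mu and sigma are illustrative names): instead of sampling w ~ N(mu, sigma^2) directly, one samples eps ~ N(0, 1) and computes w = mu + sigma * eps, so the sample is a differentiable function of mu:

import tensorflow as tf

mu = tf.Variable(0.5)                   # trainable mean
sigma = 1.0                             # fixed standard deviation

with tf.GradientTape() as tape:
    eps = tf.random.normal(shape=(3,))  # noise, independent of mu
    w = mu + sigma * eps                # reparametrized sample
    loss = tf.reduce_sum(w ** 2)

print(tape.gradient(loss, mu))          # a concrete tensor, not None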

However, when I try to calculate the gradient of the network output with respect to the mean-value variable using tf.GradientTape(), the returned gradient is None.
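
For reference, tape.gradient returns None whenever the target tensor is not connected to the watched variable; a minimal sketch (not from the original question) of that behaviour:

import tensorflow as tf

v = tf.Variable(1.0)
with tf.GradientTape() as tape:
    y = tf.constant(2.0)    # y does not depend on v
print(tape.gradient(y, v))  # prints None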

I created two minimal examples: one of a layer with deterministic weights and one of a layer with random weights. The gradients for the deterministic layer are calculated as expected, but they are None for the random layer. There is no error message giving details on why the gradient is None, and I am somewhat stuck.

Minimal example code:

A: Here is the minimal example for the deterministic network:

import tensorflow as tf; print(tf.__version__)

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer,Input
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import RandomNormal
import tensorflow_probability as tfp

import numpy as np

# example data
x_data = np.random.rand(99,3).astype(np.float32)

# # A: DETERMINISTIC MODEL

# 1 Define Layer

class deterministic_test_layer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(deterministic_test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', 
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(deterministic_test_layer, self).build(input_shape)

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

# 2 Create model and calculate gradient

x = Input(shape=(3,))
fx = deterministic_test_layer(1)(x)
deterministic_test_model = Model(name='test_deterministic',inputs=[x], outputs=[fx])

print('\n\n\nCalculating gradients for deterministic model: ')

for x_now in np.split(x_data,3):
#     print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = deterministic_test_model(x_now)
        grads = tape.gradient(
            fx_now,
            deterministic_test_model.trainable_variables,
        )
        print('\n',grads,'\n')

print(deterministic_test_model.summary())

B: The following example is very similar, but instead of deterministic weights I tried to use randomly sampled weights (sampled at call() time!) for the test layer:

# # B: RANDOM MODEL

# 1 Define Layer

class random_test_layer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(random_test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean_W = self.add_weight('mean_W',
                                      initializer=RandomNormal(mean=0.5,stddev=0.1),
                                      trainable=True)

        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(loc=self.mean_W,scale_diag=(1.,))
        super(random_test_layer, self).build(input_shape)

    def call(self, x):
        sampled_kernel = self.kernel_dist.sample(sample_shape=x.shape[1])
        return K.dot(x, sampled_kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

# 2 Create model and calculate gradient

x = Input(shape=(3,))
fx = random_test_layer(1)(x)
random_test_model = Model(name='test_random',inputs=[x], outputs=[fx])

print('\n\n\nCalculating gradients for random model: ')

for x_now in np.split(x_data,3):
#     print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = random_test_model(x_now)
        grads = tape.gradient(
            fx_now,
            random_test_model.trainable_variables,
        )
        print('\n',grads,'\n')

print(random_test_model.summary())

Expected/Actual Output:

A: The deterministic network works as expected, and the gradients are calculated. The output is:

2.0.0-beta1



Calculating gradients for deterministic model: 

 [<tf.Tensor: id=26, shape=(3, 1), dtype=float32, numpy=
array([[17.79845  ],
       [15.764006 ],
       [14.4183035]], dtype=float32)>] 


 [<tf.Tensor: id=34, shape=(3, 1), dtype=float32, numpy=
array([[16.22232 ],
       [17.09122 ],
       [16.195663]], dtype=float32)>] 


 [<tf.Tensor: id=42, shape=(3, 1), dtype=float32, numpy=
array([[16.382954],
       [16.074356],
       [17.718027]], dtype=float32)>] 

Model: "test_deterministic"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 3)]               0         
_________________________________________________________________
deterministic_test_layer (de (None, 1)                 3         
=================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
_________________________________________________________________
None

B: However, for the similar random network, the gradients are not calculated as expected (via the reparametrization trick). Instead, they are None. The full output is:

Calculating gradients for random model: 

 [None] 


 [None] 


 [None] 

Model: "test_random"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 3)]               0         
_________________________________________________________________
random_test_layer (random_te (None, 1)                 1         
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
None

Can anybody point me to the problem here?

It seems that tfp.distributions.MultivariateNormalDiag is not differentiable with respect to its input parameters (e.g. loc). In this particular case, the following would be equivalent:

class random_test_layer(Layer):
    ...

    def build(self, input_shape):
        ...
        # fixed, zero-mean base distribution with no trainable parameters inside
        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(loc=0., scale_diag=(1.,))
        super(random_test_layer, self).build(input_shape)

    def call(self, x):
        # shift the zero-mean sample by the trainable mean, so the output
        # depends differentiably on self.mean_W
        sampled_kernel = self.kernel_dist.sample(sample_shape=x.shape[1]) + self.mean_W
        return K.dot(x, sampled_kernel)

In this case, however, the loss is differentiable with respect to self.mean_W.
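
To see this without the surrounding Keras machinery, here is a minimal standalone sketch of the same shifted-sample construction (an assumed illustration, not from the original answer; base, w and mean_W are illustrative names):

import tensorflow as tf
import tensorflow_probability as tfp

mean_W = tf.Variable(0.5)
base = tfp.distributions.MultivariateNormalDiag(loc=[0.], scale_diag=[1.])

with tf.GradientTape() as tape:
    w = base.sample(3) + mean_W      # shift the zero-mean sample
    loss = tf.reduce_sum(w ** 2)

print(tape.gradient(loss, mean_W))   # a non-None gradient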

Be careful: although this approach might work for your purposes, note that calling the density function self.kernel_dist.prob would yield different results, since we moved loc outside the distribution.
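
If the density is needed, one way to compensate (a sketch reusing the illustrative base, w and mean_W from the snippet above) is to shift the sample back before evaluating, since a pure translation has unit Jacobian:

# log-density of w = z + mean_W, where z ~ base: evaluate base at w - mean_W
log_p = base.log_prob(w - mean_W)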
