
Problem computing partial derivatives with GradientTape() in TensorFlow2

I have problems computing gradients with automatic differentiation in TensorFlow. Basically, I want to create a neural network that has just one output value f and takes an input of two values (x, t). The network should act like a mathematical function, so in this case f(x, t), where x and t are the input variables, and I want to compute partial derivatives, for example df_dx, d2f_dx2 or df_dt. I need those partial derivatives later for a specific loss function. Here is my simplified code:

import numpy as np
import tensorflow as tf 
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model


class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.flatten = Flatten(input_shape=(2, 1))
        self.d1 = Dense(28)
        self.f = Dense(1)

    def call(self, y):
        y = self.flatten(y)
        y = self.d1(y)
        y = self.f(y)
        return y

if __name__ == "__main__":

    #inp contains the input-variables (x,t)
    inp = np.random.rand(1,2,1)
    inp_tf = tf.convert_to_tensor(inp, np.float32)   

    #Create a Model
    model = MyModel()

    #Here comes the important part:
    x = inp_tf[0][0]
    t = inp_tf[0][1]

    with tf.GradientTape(persistent=True) as tape:
        tape.watch(inp_tf[0][0])
        tape.watch(inp_tf)
        f = model(inp_tf)

    df_dx = tape.gradient(f, inp_tf[0][0])  #Derivative df_dx
    grad_f = tape.gradient(f, inp_tf)

    tf.print(f)         #--> [[-0.0968768075]]
    tf.print(df_dx)     #--> None
    tf.print(grad_f)    #--> [[[0.284864038]
                        #      [-0.243642956]]]

What I expected was to get df_dx = [0.284864038] (the first component of grad_f), but it results in None. My questions are:

  1. Is it possible to get the partial derivative of f with respect to only one input variable?
  2. If yes: what do I have to change in my code so that the computation of df_dx does not result in None?

What I think could work is to modify the architecture of the class MyModel to use two different input layers (one for x and one for t), so that I can call the model like f = model(x, t), but that seems unnatural to me and I think there should be an easier way.


Another point is that I don't get an error when I change the input_shape of the Flatten layer, for example to self.flatten = Flatten(input_shape=(5, 1)), even though my input vector has shape (1, 2, 1). I would expect to get an error, but that's not the case. Why? I'm grateful for your help :)


I use the following configuration:

  • Visual Studio Code with the Python extension as IDE
  • Python version: 3.7.6
  • TensorFlow version: 2.1.0
  • Keras version: 2.2.4-tf

Each time you do inp_tf[0][0] or inp_tf[0][1] you create a new tensor, but that new tensor is not used as input to your model; inp_tf is. Even though inp_tf[0][0] is part of inp_tf, from the point of view of TensorFlow there is no computation graph between your newly created inp_tf[0][0] and f, hence there is no gradient. You have to compute the gradient with respect to inp_tf and then take the parts of the gradient that you want from there.
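For example, the first-order derivatives alone can be obtained with a single tape (a minimal sketch, reusing the model and inp_tf defined in the question):

# Watch the tensor that is actually fed to the model, then slice the gradient.
with tf.GradientTape() as tape:
    tape.watch(inp_tf)
    f = model(inp_tf)

grad_f = tape.gradient(f, inp_tf)  # same shape as inp_tf: (1, 2, 1)
df_dx = grad_f[0, 0]  # partial derivative of f with respect to x
df_dt = grad_f[0, 1]  # partial derivative of f with respect to t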

In addition to that, as shown in the documentation of tf.GradientTape, you can use nested tapes to compute second-order derivatives. And if you use the jacobian, you can avoid using persistent=True, which is better for performance. Here is how it could work in your example (I changed the layer activation functions to sigmoid, since with the default linear activation all second-order derivatives would be zero).

import numpy as np
import tensorflow as tf 
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.flatten = Flatten(input_shape=(2, 1))
        self.d1 = Dense(28, activation='sigmoid')
        self.f = Dense(1, activation='sigmoid')

    def call(self, y):
        y = self.flatten(y)
        y = self.d1(y)
        y = self.f(y)
        return y

np.random.seed(0)
inp = np.random.rand(1, 2, 1)
inp_tf = tf.convert_to_tensor(inp, np.float32)
model = MyModel()
with tf.GradientTape() as tape:
    tape.watch(inp_tf)
    with tf.GradientTape() as tape2:
        tape2.watch(inp_tf)
        f = model(inp_tf)
    grad_f = tape2.gradient(f, inp_tf)
    df_dx = grad_f[0, 0]
    df_dt = grad_f[0, 1]
j = tape.jacobian(grad_f, inp_tf)  # shape (1, 2, 1, 1, 2, 1)
d2f_dx2 = j[0, 0, :, 0, 0]  # second derivative of f with respect to x
d2f_dtx = j[0, 0, :, 0, 1]  # mixed derivative d2f/dtdx
d2f_dt2 = j[0, 1, :, 0, 1]  # second derivative of f with respect to t
d2f_dxt = j[0, 1, :, 0, 0]  # mixed derivative d2f/dxdt

tf.print(df_dx)
# [0.0104712956]
tf.print(df_dt)
# [-0.00301733566]
tf.print(d2f_dx2)
# [[-0.000243180315]]
tf.print(d2f_dtx)
# [[-0.000740956515]]
tf.print(d2f_dt2)
# [[1.49392872e-05]]
tf.print(d2f_dxt)
# [[-0.000740956573]]
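Note that tape.jacobian(grad_f, inp_tf) stacks the derivative of every element of grad_f with respect to every element of inp_tf, so j has shape (1, 2, 1, 1, 2, 1): the shape of grad_f followed by the shape of inp_tf. That is why several indices are needed above to pick out each individual second derivative, and why each printed result still has shape (1, 1).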
