How to get the gradient of the loss w.r.t. an inner layer's output in TensorFlow 2?

Question

I would like to get the gradient of the model's loss function with respect to a specific layer's output during training. I then want to use the value of that gradient to modify something in the layer in the next training epoch. How can I obtain that gradient?

Here's a minimal example. The MinimalRNNCell code is copied from TensorFlow's website, and the toy data is provided only to reproduce the behavior.

import tensorflow as tf 
from tensorflow.keras.layers import RNN, SimpleRNNCell, SimpleRNN, Layer, Dense, AbstractRNNCell
from tensorflow.keras import Model
import numpy as np
import tensorflow.keras.backend as K


class MinimalRNNCell(AbstractRNNCell):

    def __init__(self, units, **kwargs):
      self.units = units
      super(MinimalRNNCell, self).__init__(**kwargs)

    @property
    def state_size(self):
      return self.units

    def build(self, input_shape):
      self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                    initializer='uniform',
                                    name='kernel')
      self.recurrent_kernel = self.add_weight(
          shape=(self.units, self.units),
          initializer='uniform',
          name='recurrent_kernel')
      self.built = True

    def call(self, inputs, states):
      prev_output = states[0]
      h = K.dot(inputs, self.kernel)
      output = h + K.dot(prev_output, self.recurrent_kernel)
      return output, output


class MyModel(Model):
    def __init__(self, size):
        super(MyModel, self).__init__()
        self.minimalrnn=RNN(MinimalRNNCell(size), name='minimalrnn')
        self.out=Dense(4)

    def call(self, inputs):
        out=self.minimalrnn(inputs)
        out=self.out(out)
        return out


x=np.array([[[3.],[0.],[1.],[2.],[3.]],[[3.],[0.],[1.],[2.],[3.]]])
y=np.array([[[0.],[1.],[2.],[3.]],[[0.],[1.],[2.],[3.]]])

model=MyModel(2)
model.compile(optimizer='sgd', loss='mse')
model.fit(x,y,epochs=10, batch_size=1, validation_split=0.2)

Now I want to get the gradient of the loss with respect to the output of MyModel's minimalrnn layer (after every batch of data).

How can I do this? I suppose I can try GradientTape watching model.get_layer('minimalrnn').output, but I need more learning resources or examples.

EDIT

I used GradientTape as in the code provided by Tiago Martins Peres, but I specifically want the gradient with respect to a layer's output, and I still can't achieve that.

Now, after the class definitions, my code looks like this:


x=np.array([[[3.],[0.],[1.],[2.],[3.]],[[3.],[0.],[1.],[2.],[3.]]])
y=np.array([[0., 1., 2., 3.],[0., 1., 2., 3.]])

model=MyModel(2)

#inputs = tf.keras.Input(shape=(2,5,1))
#model.call(x)

def gradients(model, inputs, targets):
    with tf.GradientTape() as tape:
        tape.watch(model.get_layer('minimalrnn').output)
        loss_value = loss_fn(model, inputs, targets)
    return tape.gradient(loss_value, model.trainable_variables)

def loss_fn(model, inputs, targets):
    error = model(inputs) - targets
    return tf.reduce_mean(tf.square(error))

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
print("Initial loss: {:.3f}".format(loss_fn(model, x, y)))
for i in range(10):
    grads = gradients(model, x, y)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print("Loss at step {:03d}: {:.3f}".format(i, loss_fn(model, x, y)))
print("Final loss: {:.3f}".format(loss_fn(model, x, y)))

As you can see, I added tape.watch to the gradients function, because I want to watch the layer output. However, I'm getting an error:

Traceback (most recent call last):
  File "/home/.../test2.py", line 73, in <module>
    grads = gradients(model, x, y)
  File "/home/.../test2.py", line 58, in gradients
    print(model.get_layer('minimalrnn').output)
  File "/home/.../.venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1553, in output
    raise AttributeError('Layer ' + self.name + ' has no inbound nodes.')
AttributeError: Layer minimalrnn has no inbound nodes.

Following the answer to "Accessing layer's input/output using Tensorflow 2.0 Model Sub-classing", I also tried calling the model on an Input of the specified size (the commented-out lines). It didn't help. Specifying the input shape in the model's __init__ function, as below, doesn't help either; I still get the same error.

self.minimalrnn=RNN(MinimalRNNCell(size), name='minimalrnn', input_shape=(2,5,1))

Answer 1

Yes, you can use GradientTape. The purpose of tf.GradientTape is to record operations for automatic differentiation, that is, for computing the gradient of a computation with respect to its input variables.
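As a tiny illustration of what "recording operations" means, here is a minimal sketch (the variable names are made up for this example) that differentiates y = x * x at x = 3:

```python
import tensorflow as tf

# Trainable variables are watched by the tape automatically.
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # Every operation executed here is recorded on the tape.
    y = x * x

# dy/dx = 2 * x, evaluated at x = 3.
dy_dx = tape.gradient(y, x)
print(float(dy_dx))
```

The same mechanism applies to a full model: anything computed inside the `with` block can later be differentiated with respect to the variables or tensors that took part in the computation.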

According to What's New in TensorFlow 2.0, to implement simple training of a model with tf.GradientTape, first call the forward pass on the input tensor inside the tf.GradientTape context manager and then compute the loss function. This ensures that all of the computations are recorded on the gradient tape.

Then compute the gradients with respect to all of the trainable variables in the model. Once the gradients are computed, any desired gradient clipping, normalization, or transformation can be performed before passing them to the optimizer, which applies them to the model variables. Take a look at the following example:

NUM_EXAMPLES = 2000

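# Shape [NUM_EXAMPLES, 1]: Dense expects at least 2-D input, and this keeps
# the targets the same shape as the model output (no accidental broadcasting).
input_x = tf.random.normal([NUM_EXAMPLES, 1])
noise = tf.random.normal([NUM_EXAMPLES, 1])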
input_y = input_x * 5 + 2 + noise

def loss_fn(model, inputs, targets):
  error = model(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def gradients(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss_fn(model, inputs, targets)
  return tape.gradient(loss_value, model.trainable_variables)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
print("Initial loss: {:.3f}".format(loss_fn(model, input_x, input_y)))
for i in range(500):
  grads = gradients(model, input_x, input_y)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss_fn(model, input_x, input_y)))
print("Final loss: {:.3f}".format(loss_fn(model, input_x, input_y)))
print("W = {}, B = {}".format(*model.trainable_variables))
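The example above applies the gradients unchanged. The clipping step mentioned earlier could be sketched as follows. This is only an illustration under assumed values: the threshold of 1.0, the SGD optimizer, and the toy data are arbitrary choices, not part of the original answer.

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Toy regression data (assumed for this sketch).
x = tf.random.normal([8, 1])
y = 5.0 * x + 2.0

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))

grads = tape.gradient(loss, model.trainable_variables)

# Rescale all gradients together so that their global norm is at most 1.0.
clipped, norm_before = tf.clip_by_global_norm(grads, clip_norm=1.0)
norm_after = tf.linalg.global_norm(clipped)

# Apply the transformed gradients instead of the raw ones.
optimizer.apply_gradients(zip(clipped, model.trainable_variables))
```

Any other transformation (scaling, adding noise, masking) fits in the same place, between `tape.gradient` and `apply_gradients`.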

Answer 2

OK, so one answer that I finally found is hidden here: https://stackoverflow.com/a/56567364/4750170. I can even use a subclassed model with this.

Additionally, the AttributeError is strange: when I used Sequential instead of subclassing Model, the AttributeError magically disappeared. Maybe it's connected with this issue? https://github.com/tensorflow/tensorflow/issues/34834

Still, I'd like to know why I can't just pass the layer's output as the second argument to tape.gradient.
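For comparison, here is a rough sketch of the non-subclassed route, written with the functional API rather than Sequential. In a functional model the layer is connected to a symbolic input, so `.output` is defined, and a second "probe" model can return the inner activation together with the prediction. The layer name `rnn`, the use of SimpleRNN in place of MinimalRNNCell, and the `probe` model are assumptions made for this illustration.

```python
import tensorflow as tf

# Functional model: each layer is called on a symbolic tensor,
# so it has an inbound node and a defined `.output`.
inputs = tf.keras.Input(shape=(5, 1))
hidden_sym = tf.keras.layers.SimpleRNN(2, name="rnn")(inputs)
outputs_sym = tf.keras.layers.Dense(4)(hidden_sym)
model = tf.keras.Model(inputs, outputs_sym)

# Probe model that shares the same layers and returns both tensors.
probe = tf.keras.Model(inputs, [model.get_layer("rnn").output, model.output])

x = tf.constant([[[3.], [0.], [1.], [2.], [3.]]] * 2)
y = tf.constant([[0., 1., 2., 3.]] * 2)

with tf.GradientTape() as tape:
    hidden, preds = probe(x)  # concrete tensors produced inside the tape
    loss = tf.reduce_mean(tf.square(preds - y))

# Gradient of the loss with respect to the RNN layer's output.
grad_hidden = tape.gradient(loss, hidden)
print(grad_hidden.shape)
```

Note that the gradient is taken with respect to `hidden`, the tensor returned during this particular forward pass, not with respect to the symbolic `.output` attribute itself.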
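A likely explanation, offered as an assumption rather than a confirmed statement about TensorFlow internals: `layer.output` refers to a symbolic tensor of a static graph (and does not exist at all for a subclassed model that was never connected to one), while the tape only knows about tensors actually produced inside its context. With a subclassed model, one workaround is to call the sub-layers explicitly inside the tape and keep a reference to the intermediate tensor. The sketch below uses a hypothetical `SketchModel` with SimpleRNN standing in for the question's MinimalRNNCell.

```python
import tensorflow as tf


class SketchModel(tf.keras.Model):
    """Hypothetical stand-in for MyModel from the question."""

    def __init__(self, size):
        super().__init__()
        self.minimalrnn = tf.keras.layers.SimpleRNN(size, name="minimalrnn")
        self.out = tf.keras.layers.Dense(4)

    def call(self, inputs):
        return self.out(self.minimalrnn(inputs))


x = tf.constant([[[3.], [0.], [1.], [2.], [3.]]] * 2)
y = tf.constant([[0., 1., 2., 3.]] * 2)

model = SketchModel(2)

with tf.GradientTape() as tape:
    # Run the layers one by one so the inner activation is a tensor
    # created inside the tape; no tape.watch is needed because it is
    # computed from trainable variables, which are watched automatically.
    hidden = model.minimalrnn(x)
    preds = model.out(hidden)
    loss = tf.reduce_mean(tf.square(preds - y))

# Gradient of the loss with respect to the inner layer's output.
grad_hidden = tape.gradient(loss, hidden)
print(grad_hidden.shape)
```

Running this once per batch inside a custom training loop would give the per-batch gradient asked about in the question; the value could then be stored and used to adjust the layer before the next epoch.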

Answer 1
2 2020-03-05 11:31:02

Answer 2
2 Accepted 2020-03-06 16:06:50