How to get the loss gradient w.r.t. an internal layer's output in TensorFlow 2?

I would like to get the gradient of the model's loss function with respect to a specific layer's output during training. I then want to use the value of that gradient to modify something in the layer in the next training epoch. How can I obtain that gradient?

Here's a minimal example. The MinimalRNNCell code is copied from TensorFlow's website, and the toy data is provided only to reproduce the behavior.

import tensorflow as tf 
from tensorflow.keras.layers import RNN, SimpleRNNCell, SimpleRNN, Layer, Dense, AbstractRNNCell
from tensorflow.keras import Model
import numpy as np
import tensorflow.keras.backend as K


class MinimalRNNCell(AbstractRNNCell):

    def __init__(self, units, **kwargs):
      self.units = units
      super(MinimalRNNCell, self).__init__(**kwargs)

    @property
    def state_size(self):
      return self.units

    def build(self, input_shape):
      self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                    initializer='uniform',
                                    name='kernel')
      self.recurrent_kernel = self.add_weight(
          shape=(self.units, self.units),
          initializer='uniform',
          name='recurrent_kernel')
      self.built = True

    def call(self, inputs, states):
      prev_output = states[0]
      h = K.dot(inputs, self.kernel)
      output = h + K.dot(prev_output, self.recurrent_kernel)
      return output, output


class MyModel(Model):
    def __init__(self, size):
        super(MyModel, self).__init__()
        self.minimalrnn=RNN(MinimalRNNCell(size), name='minimalrnn')
        self.out=Dense(4)

    def call(self, inputs):
        out=self.minimalrnn(inputs)
        out=self.out(out)
        return out


x=np.array([[[3.],[0.],[1.],[2.],[3.]],[[3.],[0.],[1.],[2.],[3.]]])
y=np.array([[0., 1., 2., 3.],[0., 1., 2., 3.]])

model=MyModel(2)
model.compile(optimizer='sgd', loss='mse')
model.fit(x,y,epochs=10, batch_size=1, validation_split=0.2)



Now I want to get the gradient of the loss with respect to the output of MyModel's minimalrnn layer (after every batch of data).

How can I do this? I suppose I could try a GradientTape watching model.get_layer('minimalrnn').output, but I need more learning resources or examples.

EDIT

I used GradientTape as in the code provided by Tiago Martins Peres, but I specifically want the gradient with respect to a layer's output, and I'm still not able to get that.

Now, after the class definitions, my code looks like this:


x=np.array([[[3.],[0.],[1.],[2.],[3.]],[[3.],[0.],[1.],[2.],[3.]]])
y=np.array([[0., 1., 2., 3.],[0., 1., 2., 3.]])

model=MyModel(2)

#inputs = tf.keras.Input(shape=(2,5,1))
#model.call(x)

def gradients(model, inputs, targets):
    with tf.GradientTape() as tape:
        tape.watch(model.get_layer('minimalrnn').output)
        loss_value = loss_fn(model, inputs, targets)
    return tape.gradient(loss_value, model.trainable_variables)

def loss_fn(model, inputs, targets):
    error = model(inputs) - targets
    return tf.reduce_mean(tf.square(error))

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
print("Initial loss: {:.3f}".format(loss_fn(model, x, y)))
for i in range(10):
    grads = gradients(model, x, y)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print("Loss at step {:03d}: {:.3f}".format(i, loss_fn(model, x, y)))
print("Final loss: {:.3f}".format(loss_fn(model, x, y)))

As you can see, I added tape.watch to the gradients function because I want to watch the layer's output. However, I'm getting this error:

Traceback (most recent call last):
  File "/home/.../test2.py", line 73, in <module>
    grads = gradients(model, x, y)
  File "/home/.../test2.py", line 58, in gradients
    print(model.get_layer('minimalrnn').output)
  File "/home/.../.venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1553, in output
    raise AttributeError('Layer ' + self.name + ' has no inbound nodes.')
AttributeError: Layer minimalrnn has no inbound nodes.

I also tried calling the model on an Input with a specified shape (the commented-out lines), following the answer to this question: Accessing layer's input/output using Tensorflow 2.0 Model Sub-classing. It didn't help. Specifying the input shape in the model's __init__ function, as below, doesn't help either; the error is still the same.

self.minimalrnn=RNN(MinimalRNNCell(size), name='minimalrnn', input_shape=(2,5,1))

Yes, you can use GradientTape. The purpose of tf.GradientTape is to record operations for automatic differentiation, that is, for computing the gradient of a computation with respect to its input variables.
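To make the recording behavior concrete, here is a minimal sketch with arbitrary values (it is not taken from the question or from any cited source). It differentiates y = x * x at x = 3. Because x is a constant tensor rather than a trainable variable, the tape has to be told to watch it explicitly.

```python
import tensorflow as tf

x = tf.constant(3.0)

with tf.GradientTape() as tape:
    # Trainable variables are watched automatically; plain tensors are not.
    tape.watch(x)
    y = x * x

# dy/dx = 2 * x, evaluated at x = 3
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # 6.0
```

Operations executed outside the `with` block are not recorded, so anything that should be differentiated has to be computed inside it.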

According to What's New in TensorFlow 2.0, to implement simple training of a model with tf.GradientTape, first call the forward pass on the input tensor inside the tf.GradientTape context manager and then compute the loss function. This ensures that all of the computations are recorded on the gradient tape.

Then compute the gradients with respect to all of the trainable variables in the model. Once the gradients are computed, any desired gradient clipping, normalization, or transformation can be performed before passing them to the optimizer, which applies them to the model variables. Take a look at the following example:

import tensorflow as tf

NUM_EXAMPLES = 2000

# Shape (NUM_EXAMPLES, 1): a Dense layer expects at least 2-D input
input_x = tf.random.normal([NUM_EXAMPLES, 1])
noise = tf.random.normal([NUM_EXAMPLES, 1])
input_y = input_x * 5 + 2 + noise

def loss_fn(model, inputs, targets):
  error = model(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def gradients(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss_fn(model, inputs, targets)
  return tape.gradient(loss_value, model.trainable_variables)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
print("Initial loss: {:.3f}".format(loss_fn(model, input_x, input_y)))
for i in range(500):
  grads = gradients(model, input_x, input_y)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss_fn(model, input_x, input_y)))
print("Final loss: {:.3f}".format(loss_fn(model, input_x, input_y)))
print("W = {}, B = {}".format(*model.trainable_variables))
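The answer mentions that gradients can be clipped or otherwise transformed before they reach the optimizer. A minimal sketch of that step, assuming global-norm clipping with an arbitrary threshold (MAX_NORM is a hypothetical name, and the data here is made up for illustration):

```python
import tensorflow as tf

MAX_NORM = 1.0  # hypothetical clipping threshold, chosen only for illustration

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

x = tf.random.normal([32, 1])
y = 5.0 * x + 2.0

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))

grads = tape.gradient(loss, model.trainable_variables)

# Rescale all gradients together so that their joint norm is at most MAX_NORM.
clipped, norm_before = tf.clip_by_global_norm(grads, MAX_NORM)

optimizer.apply_gradients(zip(clipped, model.trainable_variables))
```

tf.clip_by_global_norm also returns the norm measured before clipping, which can be logged to see how often the threshold is actually hit.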

OK, one answer that I finally found is hidden here: https://stackoverflow.com/a/56567364/4750170. It even works with a subclassed model.

Additionally, the AttributeError is strange: when I used Sequential instead of subclassing Model, the AttributeError disappeared. Maybe it's connected with this issue: https://github.com/tensorflow/tensorflow/issues/34834?

Still, I'd like to know why I can't just pass the layer's output as the second argument to tape.gradient.
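One way to get the gradient with respect to a layer's output in a subclassed model is to call the layers one by one inside the tape, keep a reference to the intermediate tensor, and pass that tensor to tape.gradient. The sketch below is an assumption-laden illustration, not necessarily what the linked answer does: SketchModel is a hypothetical stand-in for MyModel, and a built-in SimpleRNN replaces the custom MinimalRNNCell to keep the example short. As far as I can tell, layer.output is an attribute of the symbolic graph built by the functional and Sequential APIs, so in a subclassed model there is no such tensor; the tape can only differentiate with respect to tensors that were actually produced while it was recording.

```python
import numpy as np
import tensorflow as tf


class SketchModel(tf.keras.Model):
    """Hypothetical stand-in for MyModel; SimpleRNN replaces the custom cell."""

    def __init__(self, size):
        super().__init__()
        self.minimalrnn = tf.keras.layers.SimpleRNN(size, name="minimalrnn")
        self.out = tf.keras.layers.Dense(4)

    def call(self, inputs):
        return self.out(self.minimalrnn(inputs))


# Toy data with the same shapes as in the question: x is (2, 5, 1), y is (2, 4).
x = tf.constant(np.array([[[3.], [0.], [1.], [2.], [3.]]] * 2, dtype="float32"))
y = tf.constant(np.array([[0., 1., 2., 3.]] * 2, dtype="float32"))

model = SketchModel(2)
model(x)  # one forward pass so that all weights are created
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

with tf.GradientTape() as tape:
    # Run the layers manually to keep a handle on the intermediate tensor.
    rnn_out = model.minimalrnn(x)
    preds = model.out(rnn_out)
    loss = tf.reduce_mean(tf.square(preds - y))

# One call returns both d(loss)/d(rnn_out) and the usual weight gradients.
grad_rnn_out, grad_vars = tape.gradient(
    loss, [rnn_out, model.trainable_variables])

optimizer.apply_gradients(zip(grad_vars, model.trainable_variables))
print(grad_rnn_out.shape)  # (2, 2): one gradient per batch item and RNN unit
```

grad_rnn_out can then be stored after each batch and used to adjust the layer before the next epoch. No tape.watch call is needed here, because rnn_out is computed from trainable variables inside the tape and is therefore already recorded.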
