
Extract the value of a Tensorflow/Keras GradientTape gradient of a variable

In short: I have a custom loss layer in Tensorflow/Keras 2+ which implements a loss function involving two variables that also go through minimization. And it works, as can be seen below. I wish to track the loss gradients with respect to these two variables. Judging from tf.print() output, using GradientTape.gradient() seems to work, but I have no idea how to keep the actual values.

In detail:

Suppose this is my custom loss layer (yes, the loss function is silly, everything is over-simplified for reproducibility):

import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Layer
from tensorflow.keras.callbacks import EarlyStopping, Callback
import tensorflow.keras.backend as K
from tensorflow.keras import Model

class MyLoss(Layer):
    def __init__(self, var1, var2):
        super(MyLoss, self).__init__()
        self.var1 = K.variable(var1) # or tf.Variable(var1) etc.
        self.var2 = K.variable(var2)
    
    def get_vars(self):
        return self.var1, self.var2
    
    def get_gradients(self):
        return self.grads

    def custom_loss(self, y_true, y_pred):
        loss = self.var1 * K.mean(K.square(y_true-y_pred)) + self.var2 ** 2
        return loss

    def compute_gradients(self, y_true, y_pred):
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
            return loss, g.gradient(loss, [self.var1, self.var2])
    
    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grads = grads
        # tf.print(grads)
        self.add_loss(loss)
        return y_pred
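
As a quick sanity check of this layer outside of any model (my own sketch, assuming standalone eager execution, which is the default in TF 2.x): the gradient with respect to var1 should equal the MSE term, and the gradient with respect to var2 should equal 2 * var2.

# Hypothetical standalone check, not part of the original setup
layer = MyLoss(0.5, 0.5)
y_true = tf.constant([[1.0], [2.0], [3.0]])
y_pred = tf.constant([[1.5], [2.5], [2.0]])
loss, grads = layer.compute_gradients(y_true, y_pred)
mse = tf.reduce_mean(tf.square(y_true - y_pred))
print(loss.numpy())                   # var1 * MSE + var2 ** 2 = 0.5
print(grads[0].numpy(), mse.numpy())  # d(loss)/d(var1) equals the MSE term
print(grads[1].numpy())               # d(loss)/d(var2) = 2 * var2 = 1.0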

Suppose these are my data and Model (yes, y enters the model as an additional input; this works and isn't related to the issue):

n_col = 10
n_row = 1000
X = np.random.normal(size=(n_row, n_col))
beta = np.arange(10)
y = X @ beta

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

inputs = Input(shape=(X_train.shape[1],))
y_input = Input(shape=(1,))
hidden1 = Dense(10)(inputs)
output = Dense(1)(hidden1)
my_loss = MyLoss(0.5, 0.5)(y_input, output) # here can also initialize those var1, var2
model = Model(inputs=[inputs, y_input], outputs=my_loss)

model.compile(optimizer='adam')

Now the model and the loss work, as is evident from the variables' profile, e.g. by keeping the variables after each epoch (their values also make sense if you check the silly loss):

var1_list = []
var2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    var1, var2 = model.layers[-1].get_vars()
    var1_list.append(var1.numpy())
    var2_list.append(var2.numpy())

plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()

[figure: var1 and var2 values over training epochs]

But when I wish to observe/keep the gradients, I get a list of (empty?) Tensors:

grads = model.layers[-1].get_gradients()
grads

ListWrapper([<tf.Tensor 'gradient_tape/model/my_loss/mul/Mul:0' shape=() dtype=float32>, <tf.Tensor 'gradient_tape/model/my_loss/pow/mul_1:0' shape=() dtype=float32>])

No point in calling numpy() on these, of course:

grads[0].numpy()

AttributeError: 'Tensor' object has no attribute 'numpy'

However, something is obviously working here, since when I use tf.print(grads) to print the gradients while training (uncomment the tf.print(grads) inside the call() method above), the gradient values are printed and they also make sense:

 [226.651245, 1] [293.38916, 0.998] [263.979889, 0.996000171] [240.448029, 0.994000435] [337.309021, 0.992001] [286.644775, 0.990001857] [194.823975, 0.988003075] [173.756546, 0.98600477] [267.330505, 0.984007] [139.302826, 0.982009768] [310.315216, 0.980013192] [263.746216, 0.97801733] [267.713, 0.976022303] [291.754578, 0.974028111] [376.523895, 0.972034812] [474.974884, 0.970042467] [375.520294, 0.968051136] etc. etc.

Note there is no need to add g.watch([self.var1, self.var2]), though adding it doesn't change the issue.
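
(To illustrate that point with a minimal sketch of my own: GradientTape watches trainable tf.Variables automatically, so g.watch() is only needed for plain tensors.)

# Standalone illustration, not part of the model above
v = tf.Variable(3.0)   # trainable variable: watched automatically
t = tf.constant(3.0)   # plain tensor: only watched on request
with tf.GradientTape() as g:
    g.watch(t)         # needed for the constant, not for the variable
    out = v * v + t * t
dv, dt = g.gradient(out, [v, t])
print(dv.numpy(), dt.numpy())  # both 6.0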

How do I keep track of those gradients (like I keep track of var1 and var2)? What does tf.print() "see" that I can't see?
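
My best guess at the difference (an aside of my own, consistent with the answer below): tf.print executes as an op inside the compiled training graph, while .numpy() only exists on eager tensors, and the tensors stored in call() here are symbolic graph tensors. A minimal sketch of that distinction, with hypothetical names:

@tf.function
def graph_fn(x):
    t = x * 2.0
    tf.print(t)      # fine: runs as an op inside the graph
    # t.numpy()      # would raise AttributeError here: t is a symbolic Tensor
    return t

eager_t = tf.constant(1.0) * 2.0
print(type(eager_t).__name__)              # EagerTensor, .numpy() works
print(graph_fn(tf.constant(1.0)).numpy())  # the returned value is eager again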

Following this answer, it seems that once you go manual like I did, TF might turn off eager execution. The solution is to add run_eagerly=True to the model.compile() line above:

model.compile(optimizer='adam', run_eagerly=True)

Then I'm able to call .numpy() on my grads tensors with no problem, e.g.:

grad1_list = []
grad2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    grad1, grad2 = model.layers[-1].get_gradients()
    grad1_list.append(grad1.numpy())
    grad2_list.append(grad2.numpy())

plt.plot(grad1_list, label='grad1')
plt.plot(grad2_list, 'r', label='grad2')
plt.legend()
plt.show()

[figure: gradient values of var1 and var2 over training epochs]
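
A possible alternative I considered (a sketch only, with a hypothetical MyLossTracked subclass): if you'd rather not pay the performance cost of run_eagerly=True, you could copy the gradient values into non-trainable tf.Variables inside call(). Those can be read with .numpy() after fit() even in graph mode, and hold the values from the last processed batch.

class MyLossTracked(MyLoss):
    # hypothetical variant of the MyLoss layer defined above
    def __init__(self, var1, var2):
        super().__init__(var1, var2)
        self.grad1_val = tf.Variable(0.0, trainable=False)
        self.grad2_val = tf.Variable(0.0, trainable=False)

    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grad1_val.assign(grads[0])  # assign runs as a stateful op in the graph
        self.grad2_val.assign(grads[1])
        self.add_loss(loss)
        return y_pred

# after model.fit(...):
# model.layers[-1].grad1_val.numpy(), model.layers[-1].grad2_val.numpy()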
