Extract the value of a Tensorflow/Keras GradientTape gradient of a variable
In short: I have a custom loss layer in Tensorflow/Keras 2+, which implements a loss function involving two variables that also go through minimization. And it works, as can be seen below. I wish to track the loss gradients with respect to these two variables. Using GradientTape.gradient() seems to work judging from the tf.print() output. But I have no idea how to keep the actual values.
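For reference, this is the kind of behaviour I'm after, sketched here outside of any model (a minimal snippet of my own, not part of the setup below): in plain eager TF, GradientTape.gradient() returns eager tensors whose values can be read directly with .numpy().

import tensorflow as tf

v = tf.Variable(3.0)
with tf.GradientTape() as g:
    loss = v ** 2
grad = g.gradient(loss, v)  # an eager tensor here
print(grad.numpy())         # 6.0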
In detail:
Suppose this is my custom loss layer (yes, the loss function is silly, everything is over-simplified for reproducibility):
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Layer
from tensorflow.keras.callbacks import EarlyStopping, Callback
import tensorflow.keras.backend as K
from tensorflow.keras import Model

class MyLoss(Layer):
    def __init__(self, var1, var2):
        super(MyLoss, self).__init__()
        self.var1 = K.variable(var1)  # or tf.Variable(var1) etc.
        self.var2 = K.variable(var2)

    def get_vars(self):
        return self.var1, self.var2

    def get_gradients(self):
        return self.grads

    def custom_loss(self, y_true, y_pred):
        loss = self.var1 * K.mean(K.square(y_true - y_pred)) + self.var2 ** 2
        return loss

    def compute_gradients(self, y_true, y_pred):
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
        return loss, g.gradient(loss, [self.var1, self.var2])

    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grads = grads
        # tf.print(grads)
        self.add_loss(loss)
        return y_pred
Suppose these are my data and Model (yes, y enters the model as an additional input; this works and isn't related):
n_col = 10
n_row = 1000
X = np.random.normal(size=(n_row, n_col))
beta = np.arange(10)
y = X @ beta
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
inputs = Input(shape=(X_train.shape[1],))
y_input = Input(shape=(1,))
hidden1 = Dense(10)(inputs)
output = Dense(1)(hidden1)
my_loss = MyLoss(0.5, 0.5)(y_input, output) # here can also initialize those var1, var2
model = Model(inputs=[inputs, y_input], outputs=my_loss)
model.compile(optimizer='adam')
Now the model and the loss work, as evident from the variables' profiles, e.g. by keeping the variable values after each epoch (their values also make sense if you check the silly loss):
var1_list = []
var2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    var1, var2 = model.layers[-1].get_vars()
    var1_list.append(var1.numpy())
    var2_list.append(var2.numpy())

plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()
But when I wish to observe/keep the gradients, I get a list of (empty?) Tensors:
grads = model.layers[-1].get_gradients()
grads
ListWrapper([<tf.Tensor 'gradient_tape/model/my_loss/mul/Mul:0' shape=() dtype=float32>, <tf.Tensor 'gradient_tape/model/my_loss/pow/mul_1:0' shape=() dtype=float32>])
No point in calling numpy() on these, of course:
grads[0].numpy()
AttributeError: 'Tensor' object has no attribute 'numpy'
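A quick check (my own sketch, assuming the grads list obtained above) of why .numpy() fails here: the script itself runs eagerly, but these tensors were created inside the compiled training step, i.e. in graph mode, so they are symbolic and carry no values.

print(tf.executing_eagerly())       # True at the top level of the script
print(hasattr(grads[0], 'numpy'))   # False: symbolic (graph-mode) tensors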
However, something is obviously right here, since when I use tf.print(grads) to print the gradients while training (uncomment the tf.print(grads) inside the call() function above), the gradient values are printed and they also make sense:
[226.651245, 1] [293.38916, 0.998] [263.979889, 0.996000171] [240.448029, 0.994000435] [337.309021, 0.992001] [286.644775, 0.990001857] [194.823975, 0.988003075] [173.756546, 0.98600477] [267.330505, 0.984007] [139.302826, 0.982009768] [310.315216, 0.980013192] [263.746216, 0.97801733] [267.713, 0.976022303] [291.754578, 0.974028111] [376.523895, 0.972034812] [474.974884, 0.970042467] [375.520294, 0.968051136] etc. etc.
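(As an aside, one possible way to persist those values without leaving graph mode, a sketch on my part rather than part of the original setup, with made-up names grad1_var/grad2_var: copy the gradients into non-trainable tf.Variables inside call(). Variable assignment is a stateful op, so it runs as part of the compiled step, and afterwards the variables hold concrete values readable with .numpy():)

class MyLossTracked(MyLoss):
    def __init__(self, var1, var2):
        super().__init__(var1, var2)
        # non-trainable holders for the most recently computed gradients
        self.grad1_var = tf.Variable(0.0, trainable=False)
        self.grad2_var = tf.Variable(0.0, trainable=False)

    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grad1_var.assign(grads[0])
        self.grad2_var.assign(grads[1])
        self.add_loss(loss)
        return y_pred

# after model.fit(...):
# model.layers[-1].grad1_var.numpy(), model.layers[-1].grad2_var.numpy()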
Note there is no need to add g.watch([self.var1, self.var2]), though adding it doesn't change the issue (the tape watches trainable variables automatically, so watch() only matters for plain tensors).
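For completeness, a tiny sketch (mine) of the only situation where watch() would actually matter, namely differentiating with respect to a plain tensor rather than a tf.Variable:

x = tf.constant(2.0)
with tf.GradientTape() as g:
    g.watch(x)               # without this line the gradient below would be None
    y = x ** 3
print(g.gradient(y, x))      # 12.0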
How do I keep track of those gradients (like I keep track of var1 and var2)? What does tf.print() "see" that I can't see?
Following this answer, it seems that once you go manual like I did, TF might turn off eager execution. The solution is to add run_eagerly=True to the model.compile() line above:
model.compile(optimizer= 'adam', run_eagerly=True)
Then I'm able to call .numpy() on my grads tensors with no problem, e.g.:
grad1_list = []
grad2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    grad1, grad2 = model.layers[-1].get_gradients()
    grad1_list.append(grad1.numpy())
    grad2_list.append(grad2.numpy())

plt.plot(grad1_list, label='grad1')
plt.plot(grad2_list, 'r', label='grad2')
plt.legend()
plt.show()
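As a side note (an assumption on my part, not something tested in this post): in recent TF 2.x versions eager execution can also be switched on globally instead of per model, at the cost of slower training since the steps are no longer compiled into a graph:

tf.config.run_functions_eagerly(True)  # global alternative to run_eagerly=True in compile()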