In TensorFlow 2.0 with eager-execution, how to compute the gradients of a network output wrt a specific layer?
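I have a network built with InceptionNet, and for an input sample bx I want to compute the gradients of the model output with respect to a hidden layer. I have the following code: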
bx = tf.reshape(x_batch[0, :, :, :], (1, 299, 299, 3))
with tf.GradientTape() as gtape:
    #gtape.watch(x)
    preds = model(bx)
    print(preds.shape, end=' ')
    class_idx = np.argmax(preds[0])
    print(class_idx, end=' ')
    class_output = model.output[:, class_idx]
    print(class_output, end=' ')
    last_conv_layer = model.get_layer('inception_v3').get_layer('mixed10')
    #gtape.watch(last_conv_layer)
    print(last_conv_layer)
grads = gtape.gradient(class_output, last_conv_layer.output)#[0]
print(grads)
However, this gives None. I also tried gtape.watch(bx), but it still gives None.
Before trying GradientTape, I tried using tf.keras.backend.gradients, but that gave the following error:
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
My model is as follows:
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
inception_v3 (Model)         (None, 1000)              23851784
_________________________________________________________________
dense_5 (Dense)              (None, 2)                 2002
=================================================================
Total params: 23,853,786
Trainable params: 23,819,354
Non-trainable params: 34,432
_________________________________________________________________
Any solution is appreciated. It doesn't have to be GradientTape if there is another way to compute these gradients.
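I ran into the same problem as you. I'm not sure this is the cleanest way to solve it, but here is my solution.
I think the problem is that you need to pass the actual return value of last_conv_layer.call(...) as the argument to tape.watch(). Since all layers are called sequentially within the scope of the model(bx) call, you have to somehow inject some code into this inner scope. I did this using the following decorator: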
def watch_layer(layer, tape):
    """
    Make an intermediate hidden `layer` watchable by the `tape`.
    After calling this function, you can obtain the gradient with
    respect to the output of the `layer` by calling:
        grads = tape.gradient(..., layer.result)
    """
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Store the result of `layer.call` internally.
            layer.result = func(*args, **kwargs)
            # From this point onwards, watch this tensor.
            tape.watch(layer.result)
            # Return the result to continue with the forward pass.
            return layer.result
        return wrapper
    layer.call = decorator(layer.call)
    return layer
In your example, I believe the following should work for you:
bx = tf.reshape(x_batch[0, :, :, :], (1, 299, 299, 3))
last_conv_layer = model.get_layer('inception_v3').get_layer('mixed10')
with tf.GradientTape() as gtape:
    # Make the `last_conv_layer` watchable
    watch_layer(last_conv_layer, gtape)
    preds = model(bx)
    class_idx = np.argmax(preds[0])
    # Index the eager prediction `preds`, not the symbolic `model.output`,
    # so that the class score is recorded on the tape.
    class_output = preds[:, class_idx]
# Get the gradient w.r.t. the output of `last_conv_layer`
grads = gtape.gradient(class_output, last_conv_layer.result)
print(grads)
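To try the decorator in isolation, here is a minimal sketch on a toy two-layer model. The toy model, the layer names "hidden" and "out", and the input are made up for illustration, and watch_layer is repeated only so the block runs on its own. Note that the trick relies on Keras looking up layer.call at call time, which is an implementation detail and may not hold in every version.

```python
import numpy as np
import tensorflow as tf


def watch_layer(layer, tape):
    """Wrap `layer.call` so its output is stored and watched by `tape`."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            layer.result = func(*args, **kwargs)
            tape.watch(layer.result)
            return layer.result
        return wrapper
    layer.call = decorator(layer.call)
    return layer


# Hypothetical toy model standing in for the real network.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4, activation="relu", name="hidden"),
    tf.keras.layers.Dense(2, name="out"),
])
x = tf.ones((1, 3))
hidden = model.get_layer("hidden")

with tf.GradientTape() as tape:
    watch_layer(hidden, tape)
    preds = model(x)
    score = preds[:, 0]  # index the eager output, not model.output

# Gradient of the class-0 score w.r.t. the output of the hidden layer.
grads = tape.gradient(score, hidden.result)

# The score is linear in the hidden output, so the gradient should equal
# the first column of the kernel of the "out" layer.
expected = model.get_layer("out").get_weights()[0][:, 0]
print(grads.numpy()[0])
print(expected)
```

If both printed vectors match, the tape is tracking the intermediate tensor as intended.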
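You can use the tape to compute the gradient of an output node with respect to a set of watchable objects. By default, trainable variables are watched by the tape, and you can access the trainable variables of a specific layer by getting the layer by name and reading its trainable_variables property.
For example, in the code below, I compute the gradients of the prediction only with respect to the variables of the first FC layer (named "fc1"), treating every other variable as a constant.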
import tensorflow as tf
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(10, input_shape=(3,), name="fc1", activation="relu"),
        tf.keras.layers.Dense(3, name="fc2"),
    ]
)
# One sample with 3 features, matching input_shape=(3,)
inputs = tf.ones((1, 3))
with tf.GradientTape() as tape:
    preds = model(inputs)
# Gradients of the predictions w.r.t. the kernel and bias of "fc1" only
grads = tape.gradient(preds, model.get_layer("fc1").trainable_variables)
print(grads)
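The difference between watched and unwatched objects can be seen in a small sketch with plain tensors. The values of x and w below are arbitrary and chosen only for illustration: a tf.Variable is watched automatically, while a constant tensor yields None unless tape.watch is called on it first.

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0]])         # plain tensor: not watched by default
w = tf.Variable([[0.5], [-1.0], [2.0]])    # trainable variable: watched by default

# persistent=True only so that gradient() can be called twice on this tape.
with tf.GradientTape(persistent=True) as tape:
    y = tf.matmul(x, w)
dy_dw = tape.gradient(y, w)                # works, the variable is watched
dy_dx_unwatched = tape.gradient(y, x)      # None, x was never watched
del tape

with tf.GradientTape() as tape:
    tape.watch(x)                          # explicitly watch the input tensor
    y = tf.matmul(x, w)
dy_dx = tape.gradient(y, x)                # now defined

print(dy_dw.numpy())
print(dy_dx_unwatched)
print(dy_dx.numpy())
```

This is the same reason the gradient with respect to an input sample or an intermediate tensor comes back as None when that tensor was not part of what the tape recorded.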
If you need the gradients of the prediction with respect to the outputs of all layers, you can do the following:
(building on @nessuno's answer)
import tensorflow as tf
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(10, input_shape=(3,), name="fc1", activation="relu"),
        tf.keras.layers.Dense(3, name="fc2"),
    ]
)
# build a new model that returns the output of every layer
all_layers = [layer.output for layer in model.layers]
grad_model = tf.keras.Model(inputs=model.inputs, outputs=all_layers)
# One sample with 3 features, matching input_shape=(3,)
inputs = tf.ones((1, 3))
with tf.GradientTape() as tape:
    output_of_all_layers = grad_model(inputs)
    preds = output_of_all_layers[-1]  # last layer is output layer
# take gradients of last layer with respect to all layers in the model
grads = tape.gradient(preds, output_of_all_layers)
# note: grads[-1] should be all 1, since it is d(output)/d(output)
print(grads)
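One thing to keep in mind when the target is not a scalar: tape.gradient returns the gradient of the sum of the target's elements. If a separate gradient per output unit is needed, tape.jacobian is one option. The sketch below uses two hand-written weight matrices (w1 and w2 are made-up values, not taken from any model above) so the numbers are easy to check by hand.

```python
import tensorflow as tf

x = tf.ones((1, 3))
w1 = tf.Variable(tf.ones((3, 4)))
w2 = tf.Variable([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])

# persistent=True because both gradient() and jacobian() are called below.
with tf.GradientTape(persistent=True) as tape:
    hidden = tf.nn.relu(tf.matmul(x, w1))   # shape (1, 4)
    preds = tf.matmul(hidden, w2)           # shape (1, 2)

# Gradient of sum(preds) w.r.t. hidden: one value per hidden unit.
summed = tape.gradient(preds, hidden)       # shape (1, 4)
# One gradient per output unit: shape is preds.shape + hidden.shape.
per_output = tape.jacobian(preds, hidden)   # shape (1, 2, 1, 4)
del tape

print(summed.numpy())
print(per_output.numpy()[0, :, 0, :])
```

Summing the Jacobian over the output axis gives back the result of tape.gradient, which is why selecting a single class score before calling gradient is the usual approach.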
An example of computing the gradient of the network output with respect to a specific layer.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model

def grad_cam(input_model, image, category_index, layer_name):
    # Model that returns both the chosen layer's activations and the predictions
    gradModel = Model(
        inputs=input_model.inputs,
        outputs=[input_model.get_layer(layer_name).output,
                 input_model.output])
    with tf.GradientTape() as tape:
        inputs = tf.cast(image, tf.float32)
        (convOutputs, predictions) = gradModel(inputs)
        loss = predictions[:, category_index]
    # Gradient of the class score w.r.t. the layer's activations
    grads = tape.gradient(loss, convOutputs)
    # Guided gradients: keep only positive gradients at positive activations
    castConvOutputs = tf.cast(convOutputs > 0, "float32")
    castGrads = tf.cast(grads > 0, "float32")
    guidedGrads = castConvOutputs * castGrads * grads
    convOutputs = convOutputs[0]
    guidedGrads = guidedGrads[0]
    # Average over the spatial axes to get one weight per channel
    weights = tf.reduce_mean(guidedGrads, axis=(0, 1))
    cam = tf.reduce_sum(tf.multiply(weights, convOutputs), axis=-1)
    H, W = image.shape[1], image.shape[2]
    cam = np.maximum(cam, 0) # ReLU so we only get positive importance
    # `interpolation` must be passed by keyword; the third positional argument is `dst`
    cam = cv2.resize(cam, (W, H), interpolation=cv2.INTER_NEAREST)
    cam = cam / cam.max()
    return cam

# `model`, `load_image_normalize`, `load_image`, `im_path`, `mean`, `std` and `df`
# are assumed to be defined elsewhere.
im = load_image_normalize(im_path, mean, std)
print(im.shape)
cam = grad_cam(model, im, 5, 'conv5_block16_concat') # Mass is class 5
# Loads reference CAM to compare our implementation with.
reference = np.load("reference_cam.npy")
error = np.mean((cam-reference)**2)
print(f"Error from reference: {error:.4f}, should be less than 0.05")
plt.imshow(load_image(im_path, df, preprocess=False), cmap='gray')
plt.title("Original")
plt.axis('off')
plt.show()
plt.imshow(load_image(im_path, df, preprocess=False), cmap='gray')
plt.imshow(cam, cmap='magma', alpha=0.5)
plt.title("GradCAM")
plt.axis('off')
plt.show()
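Since the snippet above depends on helpers and data that are not shown, here is a smaller self-contained sketch of the same idea. Everything in it is an assumption for illustration: a tiny untrained model, the layer name "last_conv", a random 8x8 input, and class index 1. It uses plain (not guided) gradients and skips the resize step, so it shows only the core pattern: build a model with two outputs, take the gradient of one class score with respect to the conv activations, and weight the channels by the spatially averaged gradients.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy network standing in for the real classifier.
inputs = tf.keras.Input(shape=(8, 8, 1))
x = tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu",
                           name="last_conv")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(3, name="logits")(x)
toy_model = tf.keras.Model(inputs, outputs)

# Second model exposing the conv activations next to the predictions.
grad_model = tf.keras.Model(
    inputs,
    [toy_model.get_layer("last_conv").output, toy_model.output],
)

image = tf.random.uniform((1, 8, 8, 1))
with tf.GradientTape() as tape:
    conv_out, preds = grad_model(image)
    score = preds[:, 1]                      # score of the (arbitrary) class 1

grads = tape.gradient(score, conv_out)       # shape (1, 8, 8, 4)
weights = tf.reduce_mean(grads, axis=(1, 2)) # one weight per channel, (1, 4)
cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
cam = tf.nn.relu(cam)[0].numpy()             # keep positive evidence, (8, 8)

print(grads.shape, cam.shape)
```

With a real network, cam would then be resized to the input resolution and normalized before being overlaid on the image, as in the code above.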