
Modify the incoming gradient in TensorFlow

Consider a neural network with 2 fully connected layers "l1_dense" and "l2_dense" in this order, and some loss function. During backpropagation I want to compute the gradient of the loss with respect to "l2_dense", do some manipulation of that gradient, and then use the manipulated gradient as the incoming gradient (in the chain rule) for the "l1_dense" layer. I know that I can use tf.train.Optimizer.compute_gradients() to compute the gradient w.r.t. "l2_dense" and manipulate it. What I do not know how to do is feed the modified gradient into the computation of the "l1_dense" gradient.

As a very simplistic example, let's say that the way I want to manipulate the "l2_dense" gradient is to divide it by some number k. I know that this particular manipulation is equivalent to just dividing the loss by k; I am only using this simple example for the purposes of the question. The code would be something like:

import tensorflow as tf

i = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=[None, 1])

x = tf.layers.dense(i, 4, tf.nn.relu, name="l1_dense")
x = tf.layers.dense(x, 1, tf.nn.relu, name="l2_dense")

loss = tf.losses.mean_squared_error(y, x)

opt = tf.train.AdamOptimizer()

gvars = tf.get_default_graph().get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
l1_dense_k = [v for v in gvars if v.name == "l1_dense/kernel:0"][0]
l1_dense_b = [v for v in gvars if v.name == "l1_dense/bias:0"][0]
l2_dense_k = [v for v in gvars if v.name == "l2_dense/kernel:0"][0]
l2_dense_b = [v for v in gvars if v.name == "l2_dense/bias:0"][0]

gvs = opt.compute_gradients(loss, var_list=[l2_dense_k, l2_dense_b])
# Manipulate gradients
gvs = [(g/10, v) for g,v in gvs]

# Compute gradients w.r.t. l1_dense_k and l1_dense_b using gvs ???

To be completely clear, my setting is far more complicated than this and I cannot achieve the manipulation I need by changing the loss function. Furthermore, I need a solution in which the gradient w.r.t. each variable is computed only once.

The answer is actually very simple: you need to use tf.gradients(). In case anyone else gets stuck on this, here is the solution:

import tensorflow as tf

i = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=[None, 1])

x1 = tf.layers.dense(i, 4, tf.nn.relu, name="l1_dense")
x2 = tf.layers.dense(x1, 1, tf.nn.relu, name="l2_dense")

loss = tf.losses.mean_squared_error(y, x2)

gvars = tf.get_default_graph().get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
l1_k = [v for v in gvars if v.name == "l1_dense/kernel:0"][0]
l1_b = [v for v in gvars if v.name == "l1_dense/bias:0"][0]
l2_k = [v for v in gvars if v.name == "l2_dense/kernel:0"][0]
l2_b = [v for v in gvars if v.name == "l2_dense/bias:0"][0]

grads = tf.gradients(loss, [x1, l2_k, l2_b])
x1_grad, l2_k_grad, l2_b_grad = grads

# Manipulate the gradient
x1_grad = x1_grad / 10.0

# Backpropagate the manipulated gradient through "l1_dense" by passing it as grad_ys
grads = tf.gradients(x1, [l1_k, l1_b], x1_grad)
l1_k_grad, l1_b_grad = grads
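
This computes the gradient w.r.t. each variable only once, since the "l1_dense" gradients are derived from the already-computed x1_grad. For completeness, here is a minimal sketch (my addition, not part of the original answer) of how the resulting gradient/variable pairs could be handed to the Adam optimizer from the question via apply_gradients(); the feed values are made up purely for illustration:

# Pair each manipulated gradient with its variable and build a training op
opt = tf.train.AdamOptimizer()
train_op = opt.apply_gradients([
    (l1_k_grad, l1_k),
    (l1_b_grad, l1_b),
    (l2_k_grad, l2_k),
    (l2_b_grad, l2_b),
])

with tf.Session() as sess:
    # Initialize the layer variables and the Adam slot variables
    sess.run(tf.global_variables_initializer())
    # Run one training step with illustrative dummy data
    sess.run(train_op, feed_dict={i: [[1.0, 2.0, 3.0]], y: [[1.0]]})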
