Modify the incoming gradient in TensorFlow
Consider a neural network with two fully connected layers, "l1_dense" followed by "l2_dense", and some loss function. During backpropagation I want to compute the gradient of the loss with respect to "l2_dense", manipulate that gradient, and then use the manipulated gradient as the incoming gradient (in the chain rule) for the "l1_dense" layer. I know that I can use tf.train.Optimizer.compute_gradients() to compute the gradient with respect to "l2_dense" and manipulate it. What I do not know is how to feed the modified gradient into the computation of the "l1_dense" gradient.
As a very simplistic example, suppose the manipulation I want to apply to the "l2_dense" gradient is to divide it by some number k. I know this is equivalent to simply dividing the loss by k; I am only giving this simple example for the purposes of the question. The code would be something like:
import tensorflow as tf
i = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=[None, 1])
x = tf.layers.dense(i, 4, tf.nn.relu, name="l1_dense")
x = tf.layers.dense(x, 1, tf.nn.relu, name="l2_dense")
loss = tf.losses.mean_squared_error(y, x)
opt = tf.train.AdamOptimizer()
gvars = tf.get_default_graph().get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
l1_dense_k = [v for v in gvars if v.name == "l1_dense/kernel:0"][0]
l1_dense_b = [v for v in gvars if v.name == "l1_dense/bias:0"][0]
l2_dense_k = [v for v in gvars if v.name == "l2_dense/kernel:0"][0]
l2_dense_b = [v for v in gvars if v.name == "l2_dense/bias:0"][0]
gvs = opt.compute_gradients(loss, var_list=[l2_dense_k, l2_dense_b])
# Manipulate gradients
gvs = [(g/10, v) for g,v in gvs]
# Compute gradients w.r.t. l1_dense_k and l1_dense_b using gvs ???
To be completely clear, my actual setting is far more complicated than this, and I cannot achieve the manipulation I need simply by changing the loss function. Furthermore, I need a solution in which the gradient with respect to each variable is computed only once.
The answer is actually very simple: you need to use tf.gradients(). In case anyone else gets stuck on this, here is the solution:
import tensorflow as tf
i = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=[None, 1])
x1 = tf.layers.dense(i, 4, tf.nn.relu, name="l1_dense")
x2 = tf.layers.dense(x1, 1, tf.nn.relu, name="l2_dense")
loss = tf.losses.mean_squared_error(y, x2)
gvars = tf.get_default_graph().get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
l1_k = [v for v in gvars if v.name == "l1_dense/kernel:0"][0]
l1_b = [v for v in gvars if v.name == "l1_dense/bias:0"][0]
l2_k = [v for v in gvars if v.name == "l2_dense/kernel:0"][0]
l2_b = [v for v in gvars if v.name == "l2_dense/bias:0"][0]
grads = tf.gradients(loss, [x1, l2_k, l2_b])
x1_grad, l2_k_grad, l2_b_grad = grads
# Manipulate the gradient
x1_grad = x1_grad / 10.0
# Backpropagate the gradient
grads = tf.gradients(x1, [l1_k, l1_b], grad_ys=x1_grad)
l1_k_grad, l1_b_grad = grads
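As a quick sanity check of this two-stage chain-rule computation, the NumPy sketch below (plain NumPy, not TensorFlow, with linear layers and no activations for simplicity) shows that backpropagating a scaled incoming gradient through the first layer gives the same result as computing the full gradient and scaling it afterwards:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny linear stand-in for the two dense layers (no biases or activations).
i = rng.normal(size=(5, 3))    # batch of inputs
y = rng.normal(size=(5, 1))    # targets
W1 = rng.normal(size=(3, 4))   # "l1_dense" kernel
W2 = rng.normal(size=(4, 1))   # "l2_dense" kernel

x1 = i @ W1                    # first layer output
x2 = x1 @ W2                   # second layer output

# d(MSE)/d(x2), with the mean taken over all elements.
loss_grad = 2.0 * (x2 - y) / y.size

# Incoming gradient at x1 (chain rule through the second layer).
x1_grad = loss_grad @ W2.T

# Manipulate the incoming gradient, then backpropagate through layer 1.
k = 10.0
W1_grad_scaled = i.T @ (x1_grad / k)

# Same as computing the full gradient and dividing by k afterwards.
W1_grad_full = i.T @ x1_grad
assert np.allclose(W1_grad_scaled, W1_grad_full / k)
```

From there, the manipulated gradients produced by the TensorFlow snippet above can be paired with their variables and applied as usual, e.g. `opt.apply_gradients([(l1_k_grad, l1_k), (l1_b_grad, l1_b)])`.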