Use tf.gradients or tf.hessians on flattened parameter tensor
Let's say I want to compute the Hessian of a scalar-valued function with respect to some parameters W (e.g. the weights and biases of a feed-forward neural network). If you consider the following code, which implements a two-dimensional linear model trained to minimize an MSE loss:
import numpy as np
import tensorflow as tf
x = tf.placeholder(dtype=tf.float32, shape=[None, 2]) #inputs
t = tf.placeholder(dtype=tf.float32, shape=[None,]) #labels
W = tf.Variable(np.eye(2), dtype=tf.float32) #weights (a trainable variable, so it appears in tf.trainable_variables())
preds = tf.matmul(x, W) #linear model
loss = tf.reduce_mean(tf.square(preds-t), axis=0) #mse loss
params = tf.trainable_variables()
hessian = tf.hessians(loss, params)
you'd expect session.run(hessian, feed_dict={}) to return a 2x2 matrix (equal to W). It turns out that because params is a 2x2 tensor, the output is instead a tensor with shape [2, 2, 2, 2]. While I can easily reshape the tensor to obtain the matrix I want, this operation could become extremely cumbersome when params becomes a list of tensors of varying sizes (i.e. when the model is a deep neural network, for instance).
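For a single parameter tensor, the reshape mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the TensorFlow graph from the question: it assumes targets of shape (N, 2) and a loss averaged over all entries, builds the 4-D Hessian analytically, and uses a hypothetical helper `hessian_as_matrix` to collapse it. Note that a parameter with n entries gives an n x n matrix, so the 2x2 W here yields a 4x4 matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))  # 5 samples, 2 features (made-up data)

# Analytic Hessian of mean((x @ W - t)**2) with respect to a 2x2 W.
# hess_4d[i, j, k, l] = d^2 loss / (dW[i, j] dW[k, l]); it does not depend on t.
n_samples, n_out = x.shape[0], 2
hess_4d = (2.0 / (n_samples * n_out)) * np.einsum(
    "ni,nk,jl->ijkl", x, x, np.eye(n_out))

def hessian_as_matrix(hess, param_shape):
    """Collapse a Hessian of shape param_shape + param_shape into a square matrix."""
    size = int(np.prod(param_shape))
    return hess.reshape(size, size)

hess_2d = hessian_as_matrix(hess_4d, (2, 2))
print(hess_4d.shape, hess_2d.shape)
```

Row i * 2 + j of `hess_2d` corresponds to entry W[i, j], which matches the order produced by flattening W with a row-major reshape.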
It seems that there are two ways around this:
Flatten params to be a 1D tensor called flat_params:
flat_params = tf.concat([tf.reshape(p, [-1]) for p in params], axis=0)
so that tf.hessians(loss, flat_params) naturally returns a 2x2 matrix. However, as noted in "Why does Tensorflow Reshape tf.reshape() break the flow of gradients?" for tf.gradients (and the same holds for tf.hessians), TensorFlow is not able to see the symbolic link in the graph between params and flat_params, and tf.hessians(loss, flat_params) will raise an error because the gradients will be seen as None.
In https://afqueiruga.github.io/tensorflow/2017/12/28/hessian-mnist.html , the author of the code goes the other way: they first create the flat parameter and then reshape its parts into self.params. This trick does work and gets you the Hessian with its expected shape (2x2 matrix). However, it seems to me that this will be cumbersome to use when you have a complex model, and impossible to apply if you create your model via built-in functions (like tf.layers.dense, ...).
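The "flat parameter first" idea can be sketched outside TensorFlow as follows. This is only an illustration under stated assumptions: the helper names `unflatten`, `loss_fn` and `numerical_hessian` are hypothetical, the weight and bias shapes are made up, and a central finite-difference Hessian stands in for tf.hessians. The point is that once the model reads its parameters from slices of one flat vector, the Hessian with respect to that vector is already a square matrix.

```python
import numpy as np

def unflatten(flat, shapes):
    """Split a 1-D parameter vector into arrays with the given shapes."""
    params, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        params.append(flat[offset:offset + size].reshape(shape))
        offset += size
    return params

def loss_fn(flat, x, t, shapes):
    """MSE of a linear model whose weights and bias are read from the flat vector."""
    W, b = unflatten(flat, shapes)
    return np.mean((x @ W + b - t) ** 2)

def numerical_hessian(f, flat, eps=1e-3):
    """Central finite-difference Hessian of f at flat; returns an (n, n) matrix."""
    n = flat.size
    steps = np.eye(n) * eps
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = steps[i], steps[j]
            hess[i, j] = (f(flat + ei + ej) - f(flat + ei - ej)
                          - f(flat - ei + ej) + f(flat - ei - ej)) / (4 * eps ** 2)
    return hess

shapes = [(2, 1), (1,)]        # hypothetical weight and bias shapes
rng = np.random.default_rng(0)
x, t = rng.normal(size=(8, 2)), rng.normal(size=(8, 1))
flat_params = np.zeros(3)      # the flat vector is created first
hess = numerical_hessian(lambda p: loss_fn(p, x, t, shapes), flat_params)
print(hess.shape)
```

The `shapes` list is the only bookkeeping needed here, but it has to be written by hand, which is the inconvenience described above for models built from ready-made layers.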
Is there no straightforward way to get the Hessian matrix (as in the 2x2 matrix in this example) from tf.hessians when self.params is a list of tensors of arbitrary shapes? If not, how can you automate the reshaping of the output tensor of tf.hessians?
It turns out (as of TensorFlow r1.13) that if len(xs) > 1, then tf.hessians(ys, xs) returns tensors corresponding to only the block-diagonal submatrices of the full Hessian matrix. The full story and solutions are in this paper: https://arxiv.org/pdf/1905.05559 , with code at https://github.com/gknilsen/pyhessian
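To make the block-diagonal point concrete, here is a small NumPy sketch. It does not call tf.hessians; it assumes a hypothetical loss over two parameter groups a and b whose full Hessian is known in closed form, and shows which entries are lost when only the per-parameter blocks are kept.

```python
import numpy as np

# Hypothetical loss over two parameter groups a (2 entries) and b (2 entries):
#   f(a, b) = a.a + 3 * a.b + 2 * b.b
# Its full Hessian over the concatenated vector [a, b] is constant.
I2 = np.eye(2)
full_hessian = np.block([[2 * I2, 3 * I2],
                         [3 * I2, 4 * I2]])

# Per-parameter Hessians, i.e. the diagonal blocks d2f/da2 and d2f/db2.
sizes = [2, 2]
offsets = np.cumsum([0] + sizes)
diag_blocks = [full_hessian[s:e, s:e]
               for s, e in zip(offsets[:-1], offsets[1:])]

# Reassembling only those blocks drops the cross terms d2f/(da db).
block_diag_only = np.zeros_like(full_hessian)
for blk, s, e in zip(diag_blocks, offsets[:-1], offsets[1:]):
    block_diag_only[s:e, s:e] = blk

missing = full_hessian - block_diag_only
print(np.abs(missing).max())
```

Whenever the parameter groups interact in the loss, as the weights of successive layers do in a neural network, the off-diagonal blocks in `missing` are nonzero, so the per-parameter outputs alone do not determine the full matrix.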