Use tf.gradients or tf.hessians on flattened parameter tensor

Let's say I want to compute the Hessian of a scalar-valued function with respect to some parameters W (e.g. the weights and biases of a feed-forward neural network). Consider the following code, which implements a two-dimensional linear model trained to minimize an MSE loss:

import numpy as np
import tensorflow as tf

x = tf.placeholder(dtype=tf.float32, shape=[None, 2])  # inputs
t = tf.placeholder(dtype=tf.float32, shape=[None, 2])  # labels
W = tf.Variable(np.eye(2), dtype=tf.float32)           # weights (a trainable variable, not a placeholder)

preds = tf.matmul(x, W)                     # linear model
loss = tf.reduce_mean(tf.square(preds - t)) # scalar MSE loss

params = tf.trainable_variables()
hessian = tf.hessians(loss, params)

You'd expect session.run(hessian, feed_dict={...}) to return a 2x2 matrix (equal to W). It turns out that because params is a 2x2 tensor, the output is instead a tensor of shape [2, 2, 2, 2]. While I can easily reshape this tensor to obtain the matrix I want, that operation seems extremely cumbersome when params becomes a list of tensors of varying sizes (i.e. when the model is, for instance, a deep neural network).
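For a single parameter tensor the manual reshape is simple. A minimal sketch, reusing W and hessian from the snippet above: collapse the two copies of the parameter's shape into one axis each, which turns the [2, 2, 2, 2] output into a square matrix over the flattened parameters:

# Collapse both halves of the [2, 2, 2, 2] Hessian into one square matrix
# over the flattened entries of W (n = 4 for the 2x2 weight matrix).
n = int(np.prod(W.shape.as_list()))
hessian_matrix = tf.reshape(hessian[0], [n, n])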

There seem to be two ways around this:

  • Flatten params into a 1D tensor called flat_params:

     flat_params = tf.concat([tf.reshape(p, [-1]) for p in params], axis=0) 

    so that tf.hessians(loss, flat_params) naturally returns a 2x2 matrix. However, as noted in Why does Tensorflow Reshape tf.reshape() break the flow of gradients? for tf.gradients (which also holds for tf.hessians), TensorFlow is not able to see the symbolic link in the graph between params and flat_params, so tf.hessians(loss, flat_params) will raise an error because the gradients are seen as None.

  • In https://afqueiruga.github.io/tensorflow/2017/12/28/hessian-mnist.html, the author of the code goes the other way: he first creates the flat parameter tensor and then reshapes slices of it into self.params. This trick does work and gets you the Hessian with its expected shape (a 2x2 matrix); a sketch of the pattern follows this list. However, it seems to me that this will be cumbersome to use when you have a complex model, and impossible to apply if you create your model via built-in functions (like tf.layers.dense, ...).
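A minimal sketch of that "flat variable first" pattern, with hypothetical names (shapes, flat_params) and a single 2x2 weight for illustration; the linked post wires this into a class, but the core idea is just slicing and reshaping one flat tf.Variable:

import numpy as np
import tensorflow as tf

# Hypothetical list of parameter shapes; a real network would have one
# entry per weight/bias tensor.
shapes = [(2, 2)]
sizes = [int(np.prod(s)) for s in shapes]

# A single flat 1-D variable holds every parameter of the model.
flat_params = tf.Variable(np.eye(2).flatten(), dtype=tf.float32)

# Structured parameters are carved out of the flat tensor by slicing and
# reshaping, so the graph keeps a differentiable path back to flat_params.
params, offset = [], 0
for shape, size in zip(shapes, sizes):
    params.append(tf.reshape(flat_params[offset:offset + size], shape))
    offset += size
W = params[0]

x = tf.placeholder(tf.float32, shape=[None, 2])
t = tf.placeholder(tf.float32, shape=[None, 2])
preds = tf.matmul(x, W)
loss = tf.reduce_mean(tf.square(preds - t))

# Differentiating w.r.t. flat_params now works and yields a square matrix.
hessian_matrix = tf.hessians(loss, flat_params)[0]  # shape [4, 4]

The key point is that every structured parameter is a slice-and-reshape of flat_params, so tf.hessians sees the symbolic link that is missing in the first approach.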

Is there no straightforward way to get the Hessian matrix (as in the 2x2 matrix in this example) from tf.hessians when self.params is a list of tensors of arbitrary shapes? If not, how can you automate the reshaping of the output tensors of tf.hessians?
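For the automation part, one hedged sketch: since tf.hessians returns one block per entry of params, each block can be flattened mechanically into a square matrix (blocks_to_matrices is a hypothetical helper name, not a TensorFlow API):

def blocks_to_matrices(hessian_blocks, params):
    """Reshape each tf.hessians output block into a square 2-D matrix."""
    matrices = []
    for block, p in zip(hessian_blocks, params):
        n = int(np.prod(p.shape.as_list()))  # scalars in this parameter tensor
        matrices.append(tf.reshape(block, [n, n]))
    return matrices

Note that these are only the per-parameter pieces of the Hessian, not the full matrix, as the answer below points out.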

It turns out (as of TensorFlow r1.13) that if len(xs) > 1, then tf.hessians(ys, xs) returns tensors corresponding only to the block-diagonal submatrices of the full Hessian matrix. The full story and solutions are in this paper, https://arxiv.org/pdf/1905.05559, with code at https://github.com/gknilsen/pyhessian.
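For completeness, here is a rough sketch (not taken from the paper or pyhessian) of one way to materialize the full Hessian, off-diagonal blocks included, by differentiating each entry of the concatenated flat gradient a second time. It assumes statically known parameter shapes and is only feasible for small models, since it adds one gradient op per scalar parameter:

grads = tf.gradients(loss, params)
flat_grad = tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)

n = flat_grad.shape[0].value  # total number of scalar parameters
rows = []
for i in range(n):
    # Gradient of the i-th first-order derivative w.r.t. every parameter.
    second = tf.gradients(flat_grad[i], params)
    rows.append(tf.concat([tf.reshape(g, [-1]) for g in second], axis=0))
full_hessian = tf.stack(rows)  # shape [n, n], includes cross-parameter blocks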
