TensorFlow: Compute Hessian matrix (and higher order derivatives)

I would like to be able to compute higher order derivatives of my loss function. At the very least I would like to be able to compute the Hessian matrix. At the moment I am computing a numerical approximation to the Hessian, but this is expensive and, more importantly, as far as I understand, inaccurate if the matrix is ill-conditioned (i.e., has a very large condition number).
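For reference, a minimal sketch of the kind of central-difference approximation I mean (the function name and signature are my own, not from any library); it needs O(n^2) loss evaluations for n parameters, and the division by eps^2 is what amplifies round-off error on ill-conditioned problems:

import numpy as np

def numerical_hessian(fn, x0, eps=1e-5):
    # Central differences:
    # d2f/dxi dxj ~ [f(+i+j) - f(+i-j) - f(-i+j) + f(-i-j)] / (4 eps^2)
    n = x0.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            hess[i, j] = (fn(x0 + ei + ej) - fn(x0 + ei - ej)
                          - fn(x0 - ei + ej) + fn(x0 - ei - ej)) / (4 * eps**2)
    return hess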

Theano implements this through symbolic looping (see here), but TensorFlow does not seem to support symbolic control flow yet (see here). A similar issue has been raised on the TF GitHub page (see here), but it looks like nobody has followed up on it for a while.

Is anyone aware of more recent developments or of ways to compute higher order derivatives (symbolically) in TensorFlow?

Well, you can, with little effort, compute the Hessian matrix!

Suppose you have two variables:

import numpy as np
import tensorflow as tf

x = tf.Variable(np.random.random_sample(), dtype=tf.float32)
y = tf.Variable(np.random.random_sample(), dtype=tf.float32)

and a function defined using these 2 variables:

f = tf.pow(x, cons(2)) + cons(2) * x * y + cons(3) * tf.pow(y, cons(2)) + cons(4) * x + cons(5) * y + cons(6)

where:

# helper: wrap a Python number as a float32 TensorFlow constant
def cons(x):
    return tf.constant(x, dtype=tf.float32)

So in algebraic terms, this function is

f(x, y) = x^2 + 2xy + 3y^2 + 4x + 5y + 6
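Differentiating by hand as a check: ∂f/∂x = 2x + 2y + 4 and ∂f/∂y = 2x + 6y + 5, so ∂²f/∂x² = 2, ∂²f/∂x∂y = ∂²f/∂y∂x = 2, and ∂²f/∂y² = 6. The Hessian is therefore the constant matrix [[2, 2], [2, 6]], which is what the code below should reproduce.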

Now we define a method that computes the Hessian:

def compute_hessian(fn, vars):
    mat = []
    for v1 in vars:
        temp = []
        for v2 in vars:
            # compute the derivative twice, first w.r.t. v2 and then w.r.t. v1
            temp.append(tf.gradients(tf.gradients(fn, v2)[0], v1)[0])
        # tf.gradients returns None when there is no dependency, so replace None with 0
        temp = [cons(0) if t is None else t for t in temp]
        temp = tf.pack(temp)  # tf.pack was renamed tf.stack in TensorFlow 1.0
        mat.append(temp)
    mat = tf.pack(mat)  # tf.pack was renamed tf.stack in TensorFlow 1.0
    return mat

and call it with:

# arg1: our defined function, arg2: list of tf variables associated with the function
hessian = compute_hessian(f, [x, y])

Now we grab a TensorFlow session, initialize the variables, and run hessian:

sess = tf.Session()
sess.run(tf.initialize_all_variables())  # tf.global_variables_initializer() in TensorFlow 1.0+
print(sess.run(hessian))

Note: Since the function we used is quadratic in nature (and we are differentiating twice), the returned Hessian will have constant values irrespective of the variable values.

The output is:

[[ 2.  2.]
 [ 2.  6.]]
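Since the question asks about more recent developments: later TensorFlow releases make this considerably easier. TensorFlow 1.x added a tf.hessians helper (which computes the Hessian of a scalar with respect to each variable separately), and in TensorFlow 2.x nested tf.GradientTape contexts support higher-order differentiation. A minimal sketch of the same computation in the 2.x API, assuming the two scalars are packed into a single vector variable v:

import tensorflow as tf  # TensorFlow 2.x

v = tf.Variable([1.0, 1.0])  # v[0] plays the role of x, v[1] of y

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        f = v[0]**2 + 2*v[0]*v[1] + 3*v[1]**2 + 4*v[0] + 5*v[1] + 6
    grad = inner.gradient(f, v)    # first derivatives, shape (2,)
hessian = outer.jacobian(grad, v)  # second derivatives, shape (2, 2)

print(hessian.numpy())  # [[2. 2.] [2. 6.]]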

A word of caution: Hessian matrices (or, more generally, higher-order derivative tensors) are expensive to compute and store. You may want to reconsider whether you really need the full Hessian, or just some of its properties. A number of them, including the trace, norms, and top eigenvalues, can be obtained without the explicit Hessian matrix, using only a Hessian-vector product oracle. In turn, Hessian-vector products can be implemented efficiently in leading autodiff frameworks such as TensorFlow and PyTorch; a sketch follows below.
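A minimal sketch of such a Hessian-vector product via double backpropagation, again in the TensorFlow 2.x API (the variable names and the quadratic test function are my own, for illustration): differentiating the scalar <grad f, v> yields H @ v without ever materializing H.

import tensorflow as tf

params = tf.Variable([1.0, 1.0])
probe = tf.constant([1.0, 0.0])  # arbitrary vector v to multiply the Hessian by

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        f = params[0]**2 + 2*params[0]*params[1] + 3*params[1]**2
    g = inner.gradient(f, params)  # grad f
    gv = tf.reduce_sum(g * probe)  # scalar <grad f, v>
hvp = outer.gradient(gv, params)   # H @ v, no explicit Hessian

print(hvp.numpy())  # [2. 2.], the first column of [[2, 2], [2, 6]]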
