How to compute all second derivatives (only the diagonal of the Hessian matrix) in TensorFlow?
I have a loss value/function and I would like to compute all the second derivatives with respect to a tensor f (of size n). I managed to use tf.gradients twice, but when applying it for the second time, it sums the derivatives across the first input (see second_derivatives in my code).
I also managed to retrieve the Hessian matrix, but I would like to compute only its diagonal to avoid the extra computation.
import tensorflow as tf
import numpy as np

# TensorFlow 1.x graph-mode API
f = tf.Variable(np.array([[1., 2., 0]]).T)
loss = tf.reduce_prod(f ** 2 - 3 * f + 1)

first_derivatives = tf.gradients(loss, f)[0]
# Second call: the result is summed over the components of first_derivatives
second_derivatives = tf.gradients(first_derivatives, f)[0]
# Full Hessian, built one row per component of the gradient
hessian = [tf.gradients(first_derivatives[i,0], f)[0][:,0] for i in range(3)]

# Formerly tf.initialize_all_variables()
model = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(model)
    print("\nloss\n", sess.run(loss))
    print("\nloss'\n", sess.run(first_derivatives))
    print("\nloss''\n", sess.run(second_derivatives))
    hessian_value = np.array(sess.run(hessian))
    print("\nHessian\n", hessian_value)
My thinking was that tf.gradients(first_derivatives, f[0, 0])[0] would retrieve, for instance, the second derivative with respect to f_0, but it seems that TensorFlow does not allow differentiating with respect to a slice of a tensor.
tf.gradients([f1,f2,f3],...) computes the gradient of f=f1+f2+f3, which is why your second call returns sums rather than individual second derivatives. Also, differentiating with respect to x[0] is problematic because x[0] refers to a new Slice node which is not an ancestor of your loss, so the derivative with respect to it will be None. You could get around it by using pack (renamed tf.stack in TensorFlow 1.0) to glue x[0], x[1], ... together into xx and have your loss depend on xx instead of x. An alternative is to use separate variables for individual components, in which case computing the Hessian would look something like this:
import tensorflow as tf

def replace_none_with_zero(l):
    # tf.gradients returns None for unconnected inputs; treat those as zero
    return [0 if i is None else i for i in l]

tf.reset_default_graph()
x = tf.Variable(1.)
y = tf.Variable(1.)
loss = tf.square(x) + tf.square(y)
grads = tf.gradients([loss], [x, y])
# One Hessian row per first derivative
hess0 = replace_none_with_zero(tf.gradients([grads[0]], [x, y]))
hess1 = replace_none_with_zero(tf.gradients([grads[1]], [x, y]))
# tf.stack was called tf.pack before TensorFlow 1.0
hessian = tf.stack([tf.stack(hess0), tf.stack(hess1)])
sess = tf.InteractiveSession()
# Formerly tf.initialize_all_variables()
sess.run(tf.global_variables_initializer())
print(hessian.eval())
You'll see:
[[ 2. 0.]
[ 0. 2.]]
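To make the summing behaviour concrete, here is a minimal pure-Python sketch (no TensorFlow) for the loss from the question, the product over i of (f_i^2 - 3 f_i + 1), at f = (1, 2, 0). The helper names g, dg and hessian are illustrative and do not come from the original posts, and the Hessian is written out by hand in closed form. Assuming tf.gradients sums over the components of its first argument as described above, the row sums below are what the second tf.gradients call in the question should produce, while the diagonal is what was actually wanted.

```python
from math import prod  # requires Python 3.8+


def g(t):
    """Per-component factor of the loss: t^2 - 3t + 1."""
    return t * t - 3 * t + 1


def dg(t):
    """First derivative of g."""
    return 2 * t - 3


D2G = 2  # the second derivative of g is constant


def hessian(f):
    """Closed-form Hessian of loss(f) = prod_i g(f_i)."""
    n = len(f)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # d^2 loss / df_i^2 = g''(f_i) * product of the other factors
                others = prod(g(f[k]) for k in range(n) if k != i)
                H[i][j] = D2G * others
            else:
                # d^2 loss / df_i df_j = g'(f_i) * g'(f_j) * remaining factors
                others = prod(g(f[k]) for k in range(n) if k not in (i, j))
                H[i][j] = dg(f[i]) * dg(f[j]) * others
    return H


f = [1.0, 2.0, 0.0]
H = hessian(f)
row_sums = [sum(row) for row in H]           # sum of each Hessian row
diagonal = [H[i][i] for i in range(len(f))]  # the true second derivatives

print("Hessian:", H)
print("row sums:", row_sums)
print("diagonal:", diagonal)
```

The two lists differ whenever the off-diagonal entries of a row do not cancel, so the summed result cannot be used as a stand-in for the diagonal.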
The following function calculates 2nd derivatives (the diagonal of the Hessian matrix) in TensorFlow 2.0:
%tensorflow_version 2.x  # Tells Colab to load TF 2.x
import tensorflow as tf

def calc_hessian_diag(f, x):
    """
    Calculates the diagonal entries of the Hessian of the function f
    (which maps rank-1 tensors to scalars) at coordinates x (rank-1
    tensors).

    Let k be the number of points in x, and n be the dimensionality of
    each point. For each point, the function returns
      (d^2f/dx_1^2, d^2f/dx_2^2, ..., d^2f/dx_n^2) .

    Inputs:
      f (function): Takes a shape-(k,n) tensor and outputs a
          shape-(k,) tensor.
      x (tf.Tensor): The points at which to evaluate the Hessian
          diagonal of f. Shape = (k,n).

    Outputs:
      A tensor containing the diagonal entries of the Hessian of f at
      points x. Shape = (k,n).
    """
    # Use the unstacking and re-stacking trick, which comes
    # from https://github.com/xuzhiqin1990/laplacian/
    with tf.GradientTape(persistent=True) as g1:
        # Turn x into a list of n tensors of shape (k,)
        x_unstacked = tf.unstack(x, axis=1)
        g1.watch(x_unstacked)

        with tf.GradientTape() as g2:
            # Re-stack x before passing it into f
            x_stacked = tf.stack(x_unstacked, axis=1)  # shape = (k,n)
            g2.watch(x_stacked)
            f_x = f(x_stacked)  # shape = (k,)

        # Calculate gradient of f with respect to x
        df_dx = g2.gradient(f_x, x_stacked)  # shape = (k,n)
        # Turn df/dx into a list of n tensors of shape (k,)
        df_dx_unstacked = tf.unstack(df_dx, axis=1)

    # Calculate 2nd derivatives
    d2f_dx2 = []
    for df_dxi, xi in zip(df_dx_unstacked, x_unstacked):
        # Take 2nd derivative of each dimension separately:
        #   d/dx_i (df/dx_i)
        d2f_dx2.append(g1.gradient(df_dxi, xi))

    # Stack 2nd derivatives
    d2f_dx2_stacked = tf.stack(d2f_dx2, axis=1)  # shape = (k,n)

    return d2f_dx2_stacked
Here's an example usage, with the function f(x) = ln(r^2) = 2 ln(r), where x are 3D coordinates and r is the radius in spherical coordinates:
f = lambda q : tf.math.log(tf.math.reduce_sum(q**2, axis=1))
x = tf.random.uniform((5,3))
d2f_dx2 = calc_hessian_diag(f, x)
print(d2f_dx2)
The output will look something like this:
tf.Tensor(
[[ 1.415968 1.0215727 -0.25363517]
[-0.67299247 2.4847088 0.70901346]
[ 1.9416015 -1.1799507 1.3937857 ]
[ 1.4748447 0.59702784 -0.52290654]
[ 1.1786096 0.07442689 0.2396735 ]], shape=(5, 3), dtype=float32)
We can check the correctness of the implementation by calculating the Laplacian (i.e., by summing the diagonal of the Hessian matrix) and comparing it to the theoretical answer for our chosen function, 2 / r^2:
print(tf.reduce_sum(d2f_dx2, axis=1))  # Laplacian from summing the results above
print(2./tf.math.reduce_sum(x**2, axis=1))  # Analytic expression for the Laplacian
I get the following:
tf.Tensor([2.1839054 2.5207298 2.1554365 1.5489659 1.49271 ], shape=(5,), dtype=float32)
tf.Tensor([2.1839058 2.5207298 2.1554365 1.5489662 1.4927098], shape=(5,), dtype=float32)
They agree to within rounding error.
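The same check can be repeated without TensorFlow as an independent cross-check. The sketch below is not part of the original answer: it approximates each diagonal Hessian entry with a central finite difference in plain Python, and the helper name hessian_diag_fd, the step size h and the sample point are all assumptions chosen for illustration. It uses the same function as the lambda above, the logarithm of the sum of squared coordinates.

```python
import math


def f(x):
    """Same function as the TensorFlow example: log of the squared radius."""
    return math.log(sum(c * c for c in x))


def hessian_diag_fd(func, x, h=1e-4):
    """Approximate d^2 func / dx_i^2 for each i with central differences."""
    fx = func(x)
    diag = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # (f(x + h e_i) - 2 f(x) + f(x - h e_i)) / h^2
        diag.append((func(x_plus) - 2.0 * fx + func(x_minus)) / (h * h))
    return diag


x = [0.3, 0.7, 0.5]                     # arbitrary sample point
diag = hessian_diag_fd(f, x)
laplacian = sum(diag)                   # sum of the Hessian diagonal
analytic = 2.0 / sum(c * c for c in x)  # 2 / r^2

print(diag)
print(laplacian, analytic)
```

Finite differences only give an approximation, so the two printed values should match to several digits rather than exactly. The approach costs two extra function evaluations per dimension, which makes it suitable as a test for an automatic-differentiation implementation, not as a replacement for it.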
Now consider tf.hessians, which computes the full Hessian (not just its diagonal):
tf.hessians(loss, f)
https://www.tensorflow.org/api_docs/python/tf/hessians
If it is not necessary to use TensorFlow to calculate the second derivative, you can do:
import sympy as sp

x = sp.symbols('x', real=True)
loss = x**2 - 3*x + 1
f = loss
a = sp.diff(f, x)  # first derivative
b = sp.diff(a)     # second derivative
print("First derivative of: loss = f ** 2 - 3 * f + 1 respect to f is: ", a,
      "and the second derivative is: ", b)
which results in:
First derivative of: loss = f ** 2 - 3 * f + 1 respect to f is: 2*x - 3 and the second derivative is: 2
If you want to calculate the Hessian, you can follow this example:
import numpy as np
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
f = (x**2)*y*z
Jacobian = np.array([sp.diff(f,x), sp.diff(f,y), sp.diff(f,z)])
# Row i, column j: differentiate f by variable j first, then by variable i
Hessian = np.matrix([[sp.diff(sp.diff(f,x),x), sp.diff(sp.diff(f,y),x), sp.diff(sp.diff(f,z),x)],
                     [sp.diff(sp.diff(f,x),y), sp.diff(sp.diff(f,y),y), sp.diff(sp.diff(f,z),y)],
                     [sp.diff(sp.diff(f,x),z), sp.diff(sp.diff(f,y),z), sp.diff(sp.diff(f,z),z)]])
print("The Jacobian is:", Jacobian)
print("The Hessian is:\n", Hessian)
which results in:
The Jacobian is: [[2*x*y*z x**2*z x**2*y]]
The Hessian is:
[[2*y*z 2*x*z 2*x*y]
[2*x*z 0 x**2]
[2*x*y x**2 0]]
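Since the question asks only for the diagonal, a shorter symbolic variant may be enough. The sketch below is an addition, not part of the original answer: it takes one second derivative per variable with sp.diff(f, v, 2), and it uses SymPy's built-in sp.hessian helper only to compare against the full matrix. The names variables, hessian_diag and full_hessian are illustrative.

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
f = (x**2) * y * z
variables = (x, y, z)

# Diagonal only: differentiate twice with respect to each variable in turn
hessian_diag = [sp.diff(f, v, 2) for v in variables]

# Full Hessian from SymPy's helper, used here only as a cross-check
full_hessian = sp.hessian(f, variables)

print(hessian_diag)
print(full_hessian)
```

This computes n second derivatives instead of n^2 entries, at the cost of working symbolically instead of on TensorFlow tensors.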