How to compute Hessian of the loss w.r.t. the parameters in PyTorch using autograd.grad

I know there is quite a bit of content out there about "computing the Hessian" in PyTorch, but as far as I can tell, nothing I have found works for me. So, to be as precise as possible: the Hessian that I want is the Jacobian of the gradient of the loss with respect to the network parameters, also called the matrix of second-order derivatives with respect to the parameters.
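
In symbols (just restating the above), for a loss L(θ) with parameter vector θ, the matrix I am after is

H_{ij} = \frac{\partial^2 L(\theta)}{\partial \theta_i \, \partial \theta_j}

i.e. the Jacobian of the gradient \nabla_\theta L(\theta).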

I found some code that works in an intuitive way, although it shouldn't be fast. It simply computes the gradient (w.r.t. the params) of the gradient of the loss w.r.t. the params, one element of the gradient at a time. I think the logic is definitely right, but I am getting an error having to do with requires_grad. I'm a PyTorch beginner, so maybe it's a simple thing, but the error seems to say that it can't take the gradient of the env_grads variable, which is the output of the previous grad call.

Any help with this would be greatly appreciated. Here is the code, followed by the error message. I also printed out the env_grads[0] variable so we can see that it is in fact a tensor, which is the correct output of the previous grad call.

env_loss = loss_fn(env_outputs, env_targets)
total_loss += env_loss
env_grads = torch.autograd.grad(env_loss, params,retain_graph=True)

print( env_grads[0] )
hess_params = torch.zeros_like(env_grads[0])
for i in range(env_grads[0].size(0)):
    for j in range(env_grads[0].size(1)):
        hess_params[i, j] = torch.autograd.grad(env_grads[0][i][j], params, retain_graph=True)[0][i, j] #  <--- error here
print( hess_params )
exit()

Output:

tensor([[-6.4064e-03, -3.1738e-03,  1.7128e-02,  8.0391e-03],
        [ 7.1698e-03, -2.4640e-03, -2.2769e-03, -1.0687e-03],
        [-3.0390e-04, -2.4273e-03, -4.0799e-02, -1.9149e-02],
        ...,
        [ 1.1258e-02, -2.5911e-05, -9.8133e-02, -4.6059e-02],
        [ 8.1502e-04, -2.5814e-03,  4.1772e-02,  1.9606e-02],
        [-1.0075e-02,  6.6072e-03,  8.3118e-04,  3.9011e-04]], device='cuda:0')

Error:

Traceback (most recent call last):
  File "/home/jefferythewind/anaconda3/envs/rapids3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jefferythewind/anaconda3/envs/rapids3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jefferythewind/Projects/Irina/learning-explanations-hard-to-vary/and_mask/run_synthetic.py", line 258, in <module>
    main(args)
  File "/home/jefferythewind/Projects/Irina/learning-explanations-hard-to-vary/and_mask/run_synthetic.py", line 245, in main
    deep_mask=args.deep_mask
  File "/home/jefferythewind/Projects/Irina/learning-explanations-hard-to-vary/and_mask/run_synthetic.py", line 103, in train
    scale_grad_inverse_sparsity=scale_grad_inverse_sparsity
  File "/home/jefferythewind/Projects/Irina/learning-explanations-hard-to-vary/and_mask/and_mask_utils.py", line 154, in get_grads_deep
    hess_params[i, j] = torch.autograd.grad(env_grads[0][i][j], params, retain_graph=True)[0][i, j]
  File "/home/jefferythewind/anaconda3/envs/rapids3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

PyTorch recently-ish added a functional, higher-level API to torch.autograd which provides torch.autograd.functional.hessian(func, inputs, ...) to directly evaluate the Hessian of the scalar function func with respect to its arguments, at a location specified by inputs, a tuple of tensors corresponding to the arguments of func. hessian itself does not support automatic differentiation, I believe.

Note, however, that as of March 2021 it is still in beta.
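
To connect that API to the question's setting, here is a minimal sketch (the tensor and function names are made up for illustration, not taken from the question's code): write the loss as a plain function of the parameter tensor and hand both to hessian.

import torch

# Hypothetical toy problem: one weight matrix, squared-error loss
W = torch.randn(3, 4)   # parameter we want the Hessian with respect to
x = torch.randn(5, 4)   # inputs
y = torch.randn(5, 3)   # targets

def loss_of_W(W_):
    # the loss expressed as a function of the parameter tensor only
    return ((x @ W_.t() - y) ** 2).mean()

H = torch.autograd.functional.hessian(loss_of_W, W)
print(H.shape)  # torch.Size([3, 4, 3, 4]) -- one second derivative per pair of entries of W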


Full example using torch.autograd.functional.hessian to create a score test for a non-zero mean (as a (bad) alternative to the one-sample t-test):

import numpy as np
import torch, torchvision
from torch.autograd import Variable, grad
import torch.distributions as td
import math
from torch.optim import Adam
import scipy.stats


x_data = torch.randn(100)+0.0 # observed data (here sampled under H0)

N = x_data.shape[0] # number of observations

mu_null = torch.zeros(1)
sigma_null_hat = Variable(torch.ones(1), requires_grad=True)

def log_lik(mu, sigma):
  return td.Normal(loc=mu, scale=sigma).log_prob(x_data).sum()

# Find theta_null_hat by some gradient descent algorithm (in this case a closed-form expression would be trivial to obtain (see below)):
opt = Adam([sigma_null_hat], lr=0.01)
for epoch in range(2000):
    opt.zero_grad() # reset the gradient accumulators via the optimizer
    loss = - log_lik(mu_null, sigma_null_hat) # compute log likelihood with current value of sigma_null_hat  (= Forward pass)
    loss.backward() # compute gradients (= Backward pass)
    opt.step()      # update sigma_null_hat
    
print(f'parameter fitted under null: sigma: {sigma_null_hat}, expected: {torch.sqrt((x_data**2).mean())}')
#> parameter fitted under null: sigma: tensor([0.9260], requires_grad=True), expected: 0.9259940385818481

theta_null_hat = (mu_null, sigma_null_hat)

U = torch.tensor(torch.autograd.functional.jacobian(log_lik, theta_null_hat)) # Jacobian (= vector of partial derivatives of log likelihood w.r.t. the parameters (of the full/alternative model)) = score
I = -torch.tensor(torch.autograd.functional.hessian(log_lik, theta_null_hat)) / N # estimate of the Fisher information matrix
S = torch.t(U) @ torch.inverse(I) @ U / N # test statistic, often named "LM" (as in Lagrange multiplier), would be zero at the maximum likelihood estimate

pval_score_test = 1 - scipy.stats.chi2(df = 1).cdf(S) # S asymptotically follows a chi^2 distribution with degrees of freedom equal to the number of parameters fixed under H0
print(f'p-value Chi^2-based score test: {pval_score_test}')
#> p-value Chi^2-based score test: 0.9203232752568568

# comparison with Student's t-test:
pval_t_test = scipy.stats.ttest_1samp(x_data, popmean = 0).pvalue
print(f'p-value Student\'s t-test: {pval_t_test}')
#> p-value Student's t-test: 0.9209265268946605

Don't mind if I answer my own question here. I found the hint that I needed in this thread from the PyTorch site, about half way down:

"That won't work and would fit the 3rd point I mentioned. You would need to create the computation graph with differentiable operations, which will create a result tensor with a valid grad_fn."

I noticed there is an argument to autograd.grad called create_graph, so I set this to True in the first call to grad, and that ended up solving the error.
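
To see what create_graph=True actually changes, here is a tiny toy sketch (a single scalar, not the training code from the question): without it, the returned gradient has no grad_fn and the second grad call fails with exactly the error above.

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3

# create_graph=True records a graph for the gradient itself, so it can be differentiated again
(g,) = torch.autograd.grad(y, x, create_graph=True)  # g = 3*x**2 = 27, has a grad_fn
(h,) = torch.autograd.grad(g, x)                      # h = 6*x = 18, the second derivative
print(g.item(), h.item())  # 27.0 18.0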

Modified working code:

env_loss = loss_fn(env_outputs, env_targets)
total_loss += env_loss
env_grads = torch.autograd.grad(env_loss, params, retain_graph=True, create_graph=True)

print( env_grads[0] )
hess_params = torch.zeros_like(env_grads[0])
for i in range(env_grads[0].size(0)):
    for j in range(env_grads[0].size(1)):
        hess_params[i, j] = torch.autograd.grad(env_grads[0][i][j], params, retain_graph=True)[0][i, j] #  <--- no longer errors now that env_grads carries a graph
print( hess_params )
exit()
