How to calculate the gradient of the loss with respect to the input?
I have a pre-trained PyTorch model. I need to calculate the gradient of the loss with respect to the network's inputs using this model (without training again, only using the pre-trained model).

I wrote the following code, but I am not sure whether it is correct:
import torch
from torch.utils.data import DataLoader

test_X, test_y = load_data(mode='test')
testset_original = MyDataset(test_X, test_y, transform=default_transform)
testloader = DataLoader(testset_original, batch_size=32, shuffle=True)

model = MyModel(device=device).to(device)
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])

gradient_losses = []
for i, data in enumerate(testloader):
    inputs, labels = data
    inputs = inputs.to(device)
    labels = labels.to(device)
    inputs.requires_grad = True          # track gradients on the input batch
    output = model(inputs)
    loss = loss_function(output, labels)
    loss.backward()                      # populates inputs.grad
    gradient_losses.append(inputs.grad)
My question is, does this list gradient_losses actually store what I wish to store? If not, what is the correct way to do that?
does this list gradient_losses actually store what I wish to store?
Yes, if you are looking to get the derivative of the loss with respect to the input, then that is the correct way to do it. Here is a minimal example: take f(x) = a*x. Then df/dx = a.
>>> x = torch.rand(10, requires_grad=True)
>>> y = torch.rand(10)
>>> a = torch.tensor([3.], requires_grad=True)
>>> loss = a*x - y
>>> loss.mean().backward()
>>> x.grad
tensor([0.3000, 0.3000, ..., 0.3000, 0.3000])
Which, in this case, is equal to a / len(x): the .mean() divides the summed loss by len(x), so each element's gradient is 3/10 = 0.3.
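You can check that 1/len(x) factor directly by undoing it (continuing the same session):

>>> x.grad * len(x)
tensor([3., 3., ..., 3., 3.])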
Do note, each gradient you extract with inputs.grad is a gradient of the loss averaged over the whole batch, not a gradient of each sample's own loss.
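If you do want per-sample gradients, one workaround is a sum-reduced loss: with reduction='sum', the per-sample loss terms don't share the 1/N factor, so row i of inputs.grad is the gradient of sample i's own loss (provided the model processes samples independently, e.g. no batch norm in training mode). Here is a minimal sketch, assuming a cross-entropy criterion since the question doesn't say which loss was used:

import torch.nn as nn

criterion = nn.CrossEntropyLoss(reduction='sum')  # assumed loss; swap in your own

gradient_losses = []
for inputs, labels in testloader:
    inputs = inputs.to(device)
    labels = labels.to(device)
    inputs.requires_grad = True
    loss = criterion(model(inputs), labels)
    loss.backward()
    # Row i of inputs.grad is d(loss_i)/d(inputs[i]); .detach().cpu() keeps
    # the collected gradients from piling up in GPU memory.
    gradient_losses.append(inputs.grad.detach().cpu())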
Also, you don't need to .clone() your input gradients: they are not part of the model, so they won't get zeroed by model.zero_grad().
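You can see this with a throwaway nn.Linear standing in for the model (note that with set_to_none=True, the default in recent PyTorch, zero_grad clears parameter gradients to None rather than zeroing them):

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Linear(10, 1)
>>> x = torch.rand(2, 10, requires_grad=True)
>>> model(x).mean().backward()
>>> model.zero_grad()       # clears the parameters' gradients...
>>> print(model.weight.grad)
None
>>> x.grad is not None      # ...but the input's gradient survives
True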