Pytorch loss.backward() gives None grad for parameters of Rx, Ry Gate
I am trying to train the parameters params by applying a linear transformation to an input tensor x: the Rx matrix is multiplied with the input, and the Ry matrix is then multiplied with that result (each of the matrices Rx and Ry is defined by a single parameter params[i]).
I then compute the loss as the MSE between y and the predicted output. When I call loss.backward(), params.grad is None.
import torch

def get_device(gpu_no):
    if torch.cuda.is_available():
        return torch.device('cuda', gpu_no)
    else:
        return torch.device('cpu')

device = get_device(0)

params = torch.tensor(([[0.011], [0.012]]), requires_grad=True).to(device).to(torch.cfloat)

x_gate = torch.tensor([[1., 0.], [0., 1.]]).to(device)
y_gate = torch.tensor(([[0, -1j], [1j, 0]])).to(device)

def rx(theta):
    # co = torch.cos(theta / 2)
    # si = torch.sin(theta / 2)
    # Rx_gate = torch.stack([torch.cat([co, -si], dim=-1),
    #                        torch.cat([-si, co], dim=-1)], dim=-2).squeeze(0).to(device).to(torch.cfloat).requires_grad_()
    # Rx_gate = torch.exp(-1j * (theta / 2) * x_gate).to(device).to(torch.cfloat).requires_grad_()
    Rx_gate = torch.tensor(([[torch.cos(theta / 2), -torch.sin(theta / 2)],
                             [-torch.sin(theta / 2), torch.cos(theta / 2)]]), requires_grad=True).to(device).to(torch.cfloat)
    return Rx_gate

def ry(theta):
    # co = torch.cos(theta / 2)
    # si = torch.sin(theta / 2)
    # Ry_gate = torch.stack([torch.cat([co, -si]),
    #                        torch.cat([si, co])], dim=-2).squeeze(0).to(device).to(torch.cfloat).requires_grad_()
    # Ry_gate = torch.exp(-1j * (theta / 2) * y_gate).to(device).to(torch.cfloat).requires_grad_()
    Ry_gate = torch.tensor(([[torch.cos(theta / 2), -torch.sin(theta / 2)],
                             [torch.sin(theta / 2), torch.cos(theta / 2)]]), requires_grad=True).to(device).to(torch.cfloat)
    return Ry_gate

x = torch.tensor([1., 0.]).to(device).to(torch.cfloat)
y = torch.tensor([0., 1.]).to(device).to(torch.cfloat)

def pred(params):
    out = rx(params[0]) @ x
    out = ry(params[1]) @ out
    return out

print("params :", params)
print("prediction :", pred(params))

loss = torch.pow((y - pred(params)), 2).sum()
print("loss :", loss)

loss.backward()

print("loss grad :", loss.grad)
print("params grad :", params.grad)
My output is:
params : tensor([[0.0110+0.j],
[0.0120+0.j]], device='cuda:0', grad_fn=<ToCopyBackward0>)
prediction : tensor([1.0000e+00+0.j, 5.0000e-04+0.j], device='cuda:0',
grad_fn=<MvBackward0>)
loss : tensor(1.9990+1.7485e-07j, device='cuda:0', grad_fn=<SumBackward0>)
loss grad : None
params grad : None
Why is the grad None, even though params has grad_fn=<ToCopyBackward0>? I also get this warning:
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten\src\ATen/core/TensorBody.h:417.)
return self._grad
Good observation, you are indeed backpropagating correctly through the graph. So why do you get nothing when accessing params?

The reason you cannot access the gradient of this parameter is that only leaf tensors have their gradient cached in memory. Here, since params is a copy of a leaf tensor (you call .to() twice to obtain it), it is not considered a leaf of the computation graph.

To access the gradient of that parameter at runtime, you can force the engine to cache it and make it accessible outside the backward pass with a simple call to retain_grad, as the warning message suggests:
params.retain_grad()
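For illustration, here is a minimal sketch (not from the original post; the values are placeholders) contrasting a true leaf created directly with device= and dtype= against the non-leaf copy produced by chaining .to(), and showing the effect of retain_grad():

import torch

device = torch.device('cuda', 0) if torch.cuda.is_available() else torch.device('cpu')

# Leaf tensor: device and dtype are set in the constructor, no copy afterwards.
leaf = torch.tensor([[0.011], [0.012]], requires_grad=True,
                    device=device, dtype=torch.cfloat)
print(leaf.is_leaf)        # True  -> .grad is populated by backward()

# Non-leaf copy: every .to() call returns a new tensor derived from the leaf.
copy = torch.tensor([[0.011], [0.012]], requires_grad=True).to(device).to(torch.cfloat)
print(copy.is_leaf)        # False -> .grad stays None after backward()

copy.retain_grad()         # ask autograd to keep the gradient on this non-leaf
copy.abs().sum().backward()
print(copy.grad)           # now populated

leaf.abs().sum().backward()
print(leaf.grad)           # populated without any extra call, since it is a leaf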
I replaced my code so that I don't use .to() at all. For the Rx and Ry matrices I no longer use torch.tensor(), because I read somewhere that it detaches the result from the graph. My new code now runs with grad, and I can train the linear transformation matrices Rx and Ry. Thanks for the clarification in the answer above.
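As a quick check of that detaching behaviour (a hypothetical snippet, not part of the original post): a gate built with torch.tensor(...) copies the entry values and therefore has no grad_fn, while torch.stack/torch.cat keep the graph back to theta:

import torch

theta = torch.tensor([0.3], requires_grad=True)   # placeholder angle, same shape as params[0]
co = torch.cos(theta / 2)
si = torch.sin(theta / 2)

# torch.tensor copies the values of the entries -> no grad_fn, detached from theta.
detached = torch.tensor([[torch.cos(theta / 2), -torch.sin(theta / 2)],
                         [-torch.sin(theta / 2), torch.cos(theta / 2)]])
print(detached.grad_fn)    # None

# stack/cat assemble the matrix out of autograd ops -> the graph to theta is kept.
connected = torch.stack([torch.cat([co, -si], dim=-1),
                         torch.cat([-si, co], dim=-1)], dim=-2)
print(connected.grad_fn)   # e.g. <StackBackward0 object at 0x...>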
Here is my new code. Smooth as butter:
import torch

def get_device(gpu_no):
    if torch.cuda.is_available():
        return torch.device('cuda', gpu_no)
    else:
        return torch.device('cpu')

device = get_device(0)

params = torch.tensor(([[0.011], [0.012]]), requires_grad=True, device=device, dtype=torch.cfloat)

x_gate = torch.tensor([[1., 0.], [0., 1.]], device=device, dtype=torch.cfloat)
y_gate = torch.tensor([[0, -1j], [1j, 0]], device=device, dtype=torch.cfloat)

def rx(theta):
    co = torch.cos(theta / 2)
    si = torch.sin(theta / 2)
    Rx_gate = torch.stack([torch.cat([co, -si], dim=-1),
                           torch.cat([-si, co], dim=-1)], dim=-2).squeeze(0)
    # Rx_gate = torch.exp(1j * x_gate * (theta / 2))
    # print(" Rx_gate e", Rx_gate)
    # Rx_gate = torch.tensor(([[torch.cos(theta/2), -torch.sin(theta/2)],
    #                          [-torch.sin(theta/2), torch.cos(theta/2)]]), requires_grad=True, device=device, dtype=torch.cfloat)
    return Rx_gate

def ry(theta):
    co = torch.cos(theta / 2)
    si = torch.sin(theta / 2)
    Ry_gate = torch.stack([torch.cat([co, -si]),
                           torch.cat([si, co])], dim=-2).squeeze(0)
    # Ry_gate = torch.exp(1j * y_gate * (theta / 2))
    # Ry_gate = torch.tensor(([[torch.cos(theta / 2), -torch.sin(theta / 2)],
    #                          [torch.sin(theta / 2), torch.cos(theta / 2)]]), requires_grad=True, device=device, dtype=torch.cfloat)
    return Ry_gate

x = torch.tensor([1., 0.], device=device, dtype=torch.cfloat)
y = torch.tensor([0., 1.], device=device, dtype=torch.cfloat)

def pred(params):
    out = rx(params[0]) @ x
    out = ry(params[1]) @ out
    return out

print("params :", params)
print("prediction :", pred(params))

loss = torch.pow((y - pred(params)), 2).sum()
print("loss :", loss)

loss.backward()

print("params grad :", params.grad)
Output:
params : tensor([[0.0110+0.j],
[0.0120+0.j]], device='cuda:0', requires_grad=True)
prediction : tensor([1.0000e+00+0.j, 5.0000e-04+0.j], device='cuda:0',
grad_fn=<MvBackward0>)
loss : tensor(1.9990+1.7485e-07j, device='cuda:0', grad_fn=<SumBackward0>)
params grad : tensor([[ 1.0000+0.j],
[-1.0000+0.j]], device='cuda:0')
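Since the follow-up mentions training Rx and Ry, here is one possible training loop built on the params, x, y and pred() defined in the new code above (the learning rate, step count, manual gradient-descent update, and the use of the squared magnitude of the error as a real-valued loss are my own choices, not from the original post):

lr = 0.1                                   # arbitrary learning rate for this sketch

for step in range(200):
    out = pred(params)
    # Squared magnitude of the complex error, so the loss is a real scalar.
    loss = (y - out).abs().pow(2).sum()
    loss.backward()
    with torch.no_grad():
        params -= lr * params.grad         # params stays a leaf, so .grad is populated
        params.grad = None                 # clear before the next backward()
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss.item():.6f}")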