Pytorch loss.backward() gives None grad for parameters of Rx, Ry Gate
I am trying to train the parameters params by applying a linear transformation to an input tensor x: the Rx matrix is multiplied with the input, and the Ry matrix is then multiplied with that result (each of the matrices Rx and Ry is defined by a single parameter params[i]).
I then compute the loss as the MSE between y and the predicted output. When I call loss.backward(), params.grad is None.
import torch

def get_device(gpu_no):
    if torch.cuda.is_available():
        return torch.device('cuda', gpu_no)
    else:
        return torch.device('cpu')

device = get_device(0)

params = torch.tensor(([[0.011], [0.012]]), requires_grad=True).to(device).to(torch.cfloat)

x_gate = torch.tensor([[1., 0.], [0., 1.]]).to(device)
y_gate = torch.tensor(([[0, -1j], [1j, 0]])).to(device)

def rx(theta):
    # co = torch.cos(theta / 2)
    # si = torch.sin(theta / 2)
    # Rx_gate = torch.stack([torch.cat([co, -si], dim=-1),
    #                        torch.cat([-si, co], dim=-1)], dim=-2).squeeze(0).to(device).to(torch.cfloat).requires_grad_()
    # Rx_gate = torch.exp(-1j * (theta / 2) * x_gate).to(device).to(torch.cfloat).requires_grad_()
    Rx_gate = torch.tensor(([[torch.cos(theta / 2), -torch.sin(theta / 2)],
                             [-torch.sin(theta / 2), torch.cos(theta / 2)]]), requires_grad=True).to(device).to(torch.cfloat)
    return Rx_gate

def ry(theta):
    # co = torch.cos(theta / 2)
    # si = torch.sin(theta / 2)
    # Ry_gate = torch.stack([torch.cat([co, -si]),
    #                        torch.cat([si, co])], dim=-2).squeeze(0).to(device).to(torch.cfloat).requires_grad_()
    # Ry_gate = torch.exp(-1j * (theta / 2) * y_gate).to(device).to(torch.cfloat).requires_grad_()
    Ry_gate = torch.tensor(([[torch.cos(theta / 2), -torch.sin(theta / 2)],
                             [torch.sin(theta / 2), torch.cos(theta / 2)]]), requires_grad=True).to(device).to(torch.cfloat)
    return Ry_gate

x = torch.tensor([1., 0.]).to(device).to(torch.cfloat)
y = torch.tensor([0., 1.]).to(device).to(torch.cfloat)

def pred(params):
    out = rx(params[0]) @ x
    out = ry(params[1]) @ out
    return out

print("params :", params)
print("prediction :", pred(params))

loss = torch.pow((y - pred(params)), 2).sum()
print("loss :", loss)

loss.backward()

print("loss grad :", loss.grad)
print("params grad :", params.grad)
My output is:
params : tensor([[0.0110+0.j],
[0.0120+0.j]], device='cuda:0', grad_fn=<ToCopyBackward0>)
prediction : tensor([1.0000e+00+0.j, 5.0000e-04+0.j], device='cuda:0',
grad_fn=<MvBackward0>)
loss : tensor(1.9990+1.7485e-07j, device='cuda:0', grad_fn=<SumBackward0>)
loss grad : None
params grad : None
Why is the grad None, even though params has grad_fn=<ToCopyBackward0>? I also get this warning:
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten\src\ATen/core/TensorBody.h:417.)
return self._grad
Good observation, you are indeed backpropagating correctly through the graph. So why do you get nothing when accessing params?

The reason you cannot access the gradient of this parameter is that only leaf tensors have their gradient cached in memory. Here, since params is a copy of a leaf tensor (you call .to() twice to obtain it), it is not considered a leaf of the computation graph.

To access the gradient of that parameter at runtime, you can force the engine to cache it and make it accessible outside the backward pass with a simple call to retain_grad, as the warning message suggests:
params.retain_grad()
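For illustration, here is a minimal sketch (not from the original post; the values are placeholders) contrasting a true leaf created directly with device= and dtype= against the non-leaf copy produced by chaining .to(), and showing the effect of retain_grad():

import torch

device = torch.device('cuda', 0) if torch.cuda.is_available() else torch.device('cpu')

# Leaf tensor: device and dtype are set in the constructor, no copy afterwards.
leaf = torch.tensor([[0.011], [0.012]], requires_grad=True,
                    device=device, dtype=torch.cfloat)
print(leaf.is_leaf)        # True  -> .grad is populated by backward()

# Non-leaf copy: every .to() call returns a new tensor derived from the leaf.
copy = torch.tensor([[0.011], [0.012]], requires_grad=True).to(device).to(torch.cfloat)
print(copy.is_leaf)        # False -> .grad stays None after backward()

copy.retain_grad()         # ask autograd to keep the gradient on this non-leaf
copy.abs().sum().backward()
print(copy.grad)           # now populated

leaf.abs().sum().backward()
print(leaf.grad)           # populated without any extra call, since it is a leaf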
I replaced my code so that I don't use .to() at all. For the Rx and Ry matrices I no longer use torch.tensor(), because I read somewhere that it detaches the result from the graph. My new code now runs with grad, and I can train the linear transformation matrices Rx and Ry. Thanks for the clarification in the answer above.
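As a quick check of that detaching behaviour (a hypothetical snippet, not part of the original post): a gate built with torch.tensor(...) copies the entry values and therefore has no grad_fn, while torch.stack/torch.cat keep the graph back to theta:

import torch

theta = torch.tensor([0.3], requires_grad=True)   # placeholder angle, same shape as params[0]
co = torch.cos(theta / 2)
si = torch.sin(theta / 2)

# torch.tensor copies the values of the entries -> no grad_fn, detached from theta.
detached = torch.tensor([[torch.cos(theta / 2), -torch.sin(theta / 2)],
                         [-torch.sin(theta / 2), torch.cos(theta / 2)]])
print(detached.grad_fn)    # None

# stack/cat assemble the matrix out of autograd ops -> the graph to theta is kept.
connected = torch.stack([torch.cat([co, -si], dim=-1),
                         torch.cat([-si, co], dim=-1)], dim=-2)
print(connected.grad_fn)   # e.g. <StackBackward0 object at 0x...>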
Here is my new code. Smooth as butter:
import torch

def get_device(gpu_no):
    if torch.cuda.is_available():
        return torch.device('cuda', gpu_no)
    else:
        return torch.device('cpu')

device = get_device(0)

params = torch.tensor(([[0.011], [0.012]]), requires_grad=True, device=device, dtype=torch.cfloat)

x_gate = torch.tensor([[1., 0.], [0., 1.]], device=device, dtype=torch.cfloat)
y_gate = torch.tensor([[0, -1j], [1j, 0]], device=device, dtype=torch.cfloat)

def rx(theta):
    co = torch.cos(theta / 2)
    si = torch.sin(theta / 2)
    Rx_gate = torch.stack([torch.cat([co, -si], dim=-1),
                           torch.cat([-si, co], dim=-1)], dim=-2).squeeze(0)
    # Rx_gate = torch.exp(1j * x_gate * (theta / 2))
    # print(" Rx_gate e", Rx_gate)
    # Rx_gate = torch.tensor(([[torch.cos(theta/2), -torch.sin(theta/2)],
    #                          [-torch.sin(theta/2), torch.cos(theta/2)]]), requires_grad=True, device=device, dtype=torch.cfloat)
    return Rx_gate

def ry(theta):
    co = torch.cos(theta / 2)
    si = torch.sin(theta / 2)
    Ry_gate = torch.stack([torch.cat([co, -si]),
                           torch.cat([si, co])], dim=-2).squeeze(0)
    # Ry_gate = torch.exp(1j * y_gate * (theta / 2))
    # Ry_gate = torch.tensor(([[torch.cos(theta / 2), -torch.sin(theta / 2)],
    #                          [torch.sin(theta / 2), torch.cos(theta / 2)]]), requires_grad=True, device=device, dtype=torch.cfloat)
    return Ry_gate

x = torch.tensor([1., 0.], device=device, dtype=torch.cfloat)
y = torch.tensor([0., 1.], device=device, dtype=torch.cfloat)

def pred(params):
    out = rx(params[0]) @ x
    out = ry(params[1]) @ out
    return out

print("params :", params)
print("prediction :", pred(params))

loss = torch.pow((y - pred(params)), 2).sum()
print("loss :", loss)

loss.backward()

print("params grad :", params.grad)
Output:
params : tensor([[0.0110+0.j],
[0.0120+0.j]], device='cuda:0', requires_grad=True)
prediction : tensor([1.0000e+00+0.j, 5.0000e-04+0.j], device='cuda:0',
grad_fn=<MvBackward0>)
loss : tensor(1.9990+1.7485e-07j, device='cuda:0', grad_fn=<SumBackward0>)
params grad : tensor([[ 1.0000+0.j],
[-1.0000+0.j]], device='cuda:0')
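Since the follow-up mentions training Rx and Ry, here is one possible training loop built on the params, x, y and pred() defined in the new code above (the learning rate, step count, manual gradient-descent update, and the use of the squared magnitude of the error as a real-valued loss are my own choices, not from the original post):

lr = 0.1                                   # arbitrary learning rate for this sketch

for step in range(200):
    out = pred(params)
    # Squared magnitude of the complex error, so the loss is a real scalar.
    loss = (y - out).abs().pow(2).sum()
    loss.backward()
    with torch.no_grad():
        params -= lr * params.grad         # params stays a leaf, so .grad is populated
        params.grad = None                 # clear before the next backward()
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss.item():.6f}")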