
How to access the gradients of intermediate outputs during the training loop?
Suppose I have the following (relatively) small LSTM model.
First, let's create some pseudo input/target data:
import torch
# create pseudo input data (features)
features = torch.rand(size = (64, 24, 3)) # of shape (batch_size, num_time_steps, num_features)
# create pseudo target data
targets = torch.ones(size = (64, 24, 1)) # of shape (batch_size, num_time_steps, num_targets)
# store num. of time steps
num_time_steps = features.shape[1]
Now, let's define a simple LSTM model:
# create a simple lstm model with lstm_cell
class SmallModel(torch.nn.Module):
    def __init__(self):
        super().__init__() # initialize the parent class
        # define the layers
        self.lstm_cell = torch.nn.LSTMCell(input_size = features.shape[2], hidden_size = 16)
        self.fc = torch.nn.Linear(in_features = 16, out_features = targets.shape[2])

    def forward(self, features):
        # initialise states
        hx = torch.randn(64, 16)
        cx = torch.randn(64, 16)
        # empty lists to collect final preds
        a_s = []
        b_s = []
        c_s = []
        for t in range(num_time_steps): # loop through each time step
            # select features at the current time step t
            features_t = features[:, t, :]
            # forward computation at the current time step t
            hx, cx = self.lstm_cell(features_t, (hx, cx))
            out_t = torch.relu(self.fc(hx))
            # do some computation with the output
            a = out_t * 0.8 + 20
            b = a * 2
            c = b * 0.9
            a_s.append(a)
            b_s.append(b)
            c_s.append(c)
        a_s = torch.stack(a_s, dim = 1) # of shape (batch_size, num_time_steps, num_targets)
        b_s = torch.stack(b_s, dim = 1)
        c_s = torch.stack(c_s, dim = 1)
        return a_s, b_s, c_s
Instantiate the model, the loss function, and the optimizer:
# instantiate the model
model = SmallModel()
# loss function
loss_fn = torch.nn.MSELoss()
# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
Now, during the training loop, I would like to print the gradients of the intermediate outputs (a_s.grad, b_s.grad) at each epoch:
# number of epochs
n_epoch = 10

# training loop
for epoch in range(n_epoch): # loop through each epoch
    # zero out the grads because pytorch accumulates them
    optimizer.zero_grad()
    # make predictions
    a_s, b_s, c_s = model(features)
    # retain the gradients of the intermediate outputs
    a_s.retain_grad()
    b_s.retain_grad()
    c_s.retain_grad()
    # compute the loss
    loss = loss_fn(c_s, targets)
    # backward computation
    loss.backward()
    # print the gradients of the outputs at each epoch
    print(a_s.grad)
    print(b_s.grad)
    # update the weights
    optimizer.step()
But I get the following output:
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
How can I get the actual gradients of the intermediate outputs?
The problem is that c_s is not a function of a_s and b_s: the stacked c_s that the loss uses is built from the per-time-step tensors, and that path never goes through the stacked a_s and b_s you call retain_grad() on.
In your code:
loss = func(c_s, *)
c_s = func(a, b)
# c_s = func(a_s, b_s) is not true
Therefore, during the backward pass, no gradients are computed for the variables a_s and b_s (see the minimal sketch below).
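To see why, here is a minimal, self-contained sketch (not from the original post; the names x, a, a_stacked, and b are purely illustrative) that mirrors the structure of the forward pass: the loss is built from the per-step tensor a, while the stacked copy a_stacked is a dead-end branch of the graph, so its .grad stays None after backward():
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

a = x * 2                              # per-step value; the loss path continues from here
a_stacked = torch.stack([a[0], a[1]])  # "collected" copy, analogous to the stacked a_s
b = a * 0.9                            # computed from a, NOT from a_stacked

a_stacked.retain_grad()
b.retain_grad()

loss = b.sum()
loss.backward()

print(b.grad)          # tensor([1., 1.]) -- b lies on the path from the loss to x
print(a_stacked.grad)  # None -- the loss never uses a_stacked, only a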
Try this modified forward function to get the gradients of a_s and b_s, where c_s = func(a_s, b_s):
def forward(self, features):
    # initialise states
    hx = torch.randn(64, 16)
    cx = torch.randn(64, 16)
    # empty lists to collect final preds
    a_s = []
    b_s = []
    c_s = []
    for t in range(num_time_steps): # loop through each time step
        # select features at the current time step t
        features_t = features[:, t, :]
        # forward computation at the current time step t
        hx, cx = self.lstm_cell(features_t, (hx, cx))
        out_t = torch.relu(self.fc(hx))
        # do some computation with the output
        a = out_t * 0.8 + 20
        # b = a * 2
        # c = b * 0.9
        a_s.append(a)
        # b_s.append(b)
        # c_s.append(c)
    a_s = torch.stack(a_s, dim = 1) # of shape (batch_size, num_time_steps, num_targets)
    ##########################################
    ## c_s = func(a_s, b_s)
    ##########################################
    b_s = a_s * 2
    c_s = b_s * 0.9
    ##########################################
    ##########################################
    return a_s, b_s, c_s
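Assuming the rest of the training loop from the question stays the same, the stacked a_s and b_s now lie on the path from the loss back to the parameters, so retain_grad() yields real gradients. A quick illustrative check with the modified model (the shapes shown are for the pseudo data above):
# sanity check (assumes SmallModel now uses the modified forward above)
model = SmallModel()
loss_fn = torch.nn.MSELoss()

a_s, b_s, c_s = model(features)
a_s.retain_grad()
b_s.retain_grad()

loss = loss_fn(c_s, targets)
loss.backward()

print(a_s.grad.shape)  # torch.Size([64, 24, 1]) -- no longer None
print(b_s.grad.shape)  # torch.Size([64, 24, 1])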