Constant loss during LSTM training - PyTorch
I'm trying to implement an LSTM network for predicting the next word in a sentence. This is my first time building a neural network, and I'm confused by all the information I've found on the Internet.
I'm trying to use the following architecture:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class WordLSTM(nn.Module):
    def __init__(self, vocabulary_size, embedding_dim, hidden_dim):
        super().__init__()
        # Word embeddings
        self.encoder = nn.Embedding(vocabulary_size, embedding_dim)
        # LSTM input dim is embedding_dim, output dim is hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        # Linear layer to map hidden states to vocabulary space
        self.decoder = nn.Linear(hidden_dim, vocabulary_size)

    def forward(self, sentence):
        encoded = self.encoder(sentence)
        output, _ = self.lstm(encoded.view(len(sentence), 1, -1))
        decoded = self.decoder(output)
        word_scores = F.softmax(decoded, dim=1)
        return word_scores[-1].view(1, -1)
I've created a dictionary from all the sentences in my dataset, and each word is encoded with its respective index from that dictionary. Each encoded prefix is followed by the encoded next word (the target). Here are a few of the training examples I'm trying to use:
[tensor([39]), tensor([13698])],
[tensor([ 39, 13698]), tensor([11907])],
[tensor([ 39, 13698, 11907]), tensor([70])]
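For reference, pairs in this format can be produced from a random encoded sentence roughly like this (a simplified sketch, not the exact generate_random_training_example I use):

import random
import torch

def generate_random_training_example(training_ds):
    # Pick a random encoded sentence, e.g. [39, 13698, 11907, 70],
    # and turn it into (prefix, next_word) pairs like the ones shown above.
    sentence = random.choice(training_ds)
    return [
        (torch.tensor(sentence[:i]), torch.tensor([sentence[i]]))
        for i in range(1, len(sentence))
    ]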
I'm passing one sentence at a time during training, so my batch size is always 1.
NUM_EPOCHS = 100
LEARNING_RATE = 0.0005

rnn = WordLSTM(vocab_size, 64, 32)
optimizer = optim.SGD(rnn.parameters(), lr=LEARNING_RATE)

for epoch in range(NUM_EPOCHS):
    training_example = generate_random_training_example(training_ds)
    optimizer.zero_grad()

    for sentence, next_word in training_example:
        output = rnn(sentence)
        loss = F.cross_entropy(output, next_word)
        loss.backward()
        optimizer.step()

    print(f"Epoch: {epoch}/{NUM_EPOCHS} Loss: {loss:.4f}")
However, when I start the training, the loss does not change over time:
Epoch: 0/100 Loss: 10.3929
Epoch: 1/100 Loss: 10.3929
Epoch: 2/100 Loss: 10.3929
Epoch: 3/100 Loss: 10.3929
Epoch: 4/100 Loss: 10.3929
Epoch: 5/100 Loss: 10.3929
Epoch: 6/100 Loss: 10.3929
I've already tried placing optimizer.zero_grad() and optimizer.step() in different places, but that didn't help either.
What could be the problem in this case? Am I calculating the loss in the wrong way, or am I passing the tensors in the wrong format?
Delete the F.softmax call. You are doing log_softmax(softmax(x)), because F.cross_entropy already applies log_softmax internally: "This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class." The model should return raw logits.
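Applied to the model in the question, that means WordLSTM.forward should return the raw decoder output and skip the softmax; a minimal sketch of that fix:

    def forward(self, sentence):
        encoded = self.encoder(sentence)
        output, _ = self.lstm(encoded.view(len(sentence), 1, -1))
        decoded = self.decoder(output)
        # Return raw logits; F.cross_entropy applies log_softmax + NLLLoss itself.
        return decoded[-1].view(1, -1)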
import torch as t


class Net(t.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.emb = t.nn.Embedding(100, 8)
        self.lstm = t.nn.LSTM(8, 16, batch_first=True)
        self.linear = t.nn.Linear(16, 100)

    def forward(self, x):
        x = self.emb(x)
        x, _ = self.lstm(x)
        x = self.linear(x[:, -1])
        # x = t.nn.Softmax(dim=1)(x)
        return x


t.manual_seed(0)
net = Net()

batch_size = 1
X = t.LongTensor(batch_size, 5).random_(0, 100)
Y = t.LongTensor(batch_size).random_(0, 100)

optimizer = t.optim.Adam(net.parameters())
criterion = t.nn.CrossEntropyLoss()

for epoch in range(10):
    optimizer.zero_grad()
    output = net(X)
    loss = criterion(output, Y)
    loss.backward()
    optimizer.step()
    print(loss.item())
4.401515960693359
4.389760494232178
4.377873420715332
4.365848541259766
4.353675365447998
4.341339588165283
4.328824520111084
4.316114902496338
4.303196430206299
4.2900567054748535
With the t.nn.Softmax line uncommented:
4.602912902832031
4.6027679443359375
4.602619171142578
4.6024675369262695
4.602311611175537
4.602152347564697
4.601987361907959
4.601818084716797
4.6016435623168945
4.601463794708252
Use softmax during evaluation:
net.eval()
t.nn.Softmax(dim=1)(net(X[0].view(1,-1)))
tensor([[0.0088, 0.0121, 0.0098, 0.0072, 0.0085, 0.0083, 0.0083, 0.0108, 0.0127, 0.0090, 0.0094, 0.0082, 0.0099, 0.0115, 0.0094, 0.0107, 0.0081, 0.0096, 0.0087, 0.0131, 0.0129, 0.0127, 0.0118, 0.0107, 0.0087, 0.0073, 0.0114, 0.0076, 0.0103, 0.0112, 0.0104, 0.0077, 0.0116, 0.0091, 0.0091, 0.0104, 0.0106, 0.0094, 0.0116, 0.0091, 0.0117, 0.0118, 0.0106, 0.0113, 0.0083, 0.0091, 0.0076, 0.0089, 0.0076, 0.0120, 0.0107, 0.0139, 0.0097, 0.0124, 0.0096, 0.0097, 0.0104, 0.0128, 0.0084, 0.0119, 0.0096, 0.0100, 0.0073, 0.0099, 0.0086, 0.0090, 0.0089, 0.0098, 0.0102, 0.0086, 0.0115, 0.0110, 0.0078, 0.0097, 0.0115, 0.0102, 0.0103, 0.0107, 0.0095, 0.0083, 0.0090, 0.0120, 0.0085, 0.0113, 0.0128, 0.0074, 0.0096, 0.0123, 0.0106, 0.0105, 0.0101, 0.0112, 0.0086, 0.0105, 0.0121, 0.0103, 0.0075, 0.0098, 0.0082, 0.0093]], grad_fn=<SoftmaxBackward>)
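To turn those probabilities into an actual prediction, one option (not part of the original code, just a common follow-up) is to take the argmax over the vocabulary dimension and map the index back to a word with whatever index-to-word dictionary was used for encoding:

# Index of the most likely next word, given the probabilities above
probs = t.nn.Softmax(dim=1)(net(X[0].view(1, -1)))
predicted_index = probs.argmax(dim=1).item()
print(predicted_index)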