PyTorch n-to-1 LSTM does not learn anything

I am new to PyTorch and LSTMs and I am trying to train a classification model that takes a sentence where each word is encoded via word2vec (pre-trained vectors) and outputs one class after it has seen the full sentence. I have four different classes. The sentences have variable length.
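For concreteness, training_data is a list of (sentence, tag) pairs where each sentence is a list of word2vec vectors. With made-up 300-dimensional vectors and made-up labels, a minimal example of the layout would be:

import random

# Hypothetical layout of training_data (dimensions and labels are made up):
# each sentence is a list of word vectors, each tag is one of four class labels.
training_data = [
    ([[random.random() for _ in range(300)] for _ in range(5)], "class_a"),   # 5-word sentence
    ([[random.random() for _ in range(300)] for _ in range(9)], "class_b"),   # 9-word sentence
]
print(len(training_data[0][0][0]))   # 300 -- this is what EMBEDDING_DIM is read from below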

My code is running without errors, but it always predicts the same class, no matter how many epochs I train my model. So I think the gradients are not properly backpropagated. Here is my code:

class LSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, tagset_size):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim).to(device),
                torch.zeros(1, 1, self.hidden_dim).to(device))

    def forward(self, sentence):
        lstm_out, self.hidden = self.lstm(sentence.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

EMBEDDING_DIM = len(training_data[0][0][0])
HIDDEN_DIM = 256

model = LSTM(EMBEDDING_DIM, HIDDEN_DIM, 4)
model.to(device)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in tqdm(range(n_epochs)):
    for sentence, tag in tqdm(training_data):
        model.zero_grad()

        model.hidden = model.init_hidden()

        sentence_in = torch.tensor(sentence, dtype=torch.float).to(device)
        targets = torch.tensor([label_to_idx[tag]], dtype=torch.long).to(device)

        tag_scores = model(sentence_in)

        res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1,-1).to(device)
        # I THINK THIS IS WRONG???
        print(res)     # tensor([[-10.6328, -10.6783, -10.6667,  -0.0001]], device='cuda:0', grad_fn=<CopyBackwards>)
        print(targets) # tensor([3], device='cuda:0')

        loss = loss_function(res, targets)

        loss.backward()
        optimizer.step()

The code is largely inspired by https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html. The difference is that they have a sequence-to-sequence model and I have a sequence-to-ONE model.

I am not sure what the problem is, but I guess that the scores returned by the model contain a score for each tag, while my ground truth only contains the index of the correct class? How would this be handled correctly?
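From the docs, nn.NLLLoss expects log-probabilities of shape (N, C) and a target tensor of shape (N,) holding class indices; a minimal standalone example of that pairing, just to illustrate the shapes:

import torch
import torch.nn as nn

loss_function = nn.NLLLoss()
log_probs = torch.log_softmax(torch.randn(1, 4), dim=1)  # (N, C) = (1, 4) log-probabilities
target = torch.tensor([3])                               # (N,) class indices
print(loss_function(log_probs, target))                  # scalar loss tensor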

Or is the loss function maybe not the correct one for my use case? Also I am not sure if this is done correctly:

res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1,-1).to(device)

By taking tag_scores[-1] I want to get the scores after the last word has been given to the network, because tag_scores contains the scores after each step, if I understand correctly.
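To check that assumption with assumed sizes (e.g. 300-dimensional embeddings): the output of nn.LSTM has one row per time step, and for a single-layer, unidirectional LSTM the last row equals the final hidden state:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=300, hidden_size=256)
seq = torch.randn(7, 1, 300)             # (seq_len, batch, embedding_dim)
out, (h_n, c_n) = lstm(seq)
print(out.shape)                          # torch.Size([7, 1, 256]) -- one output per word
print(torch.allclose(out[-1], h_n[0]))    # True -- last time step equals the final hidden state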

And this is how I evaluate:

with torch.no_grad():
    preds = []
    gts = []

    for sentence, tag in tqdm(test_data):
        inputs = torch.tensor(sentence, dtype=torch.float).to(device)

        tag_scores = model(inputs)

        # find index with max value (this is the class to be predicted)
        pred = [j for j,v in enumerate(tag_scores[-1]) if v == max(tag_scores[-1])][0]

        print(pred, idx_to_label[pred], tag)
        preds.append(pred)
        gts.append(label_to_idx[tag])

print(f1_score(gts, preds, average='micro'))
print(classification_report(gts, preds))
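As a side note, the max-index search in the evaluation loop could also be written with torch.argmax; a small standalone equivalent of that line:

import torch

scores = torch.tensor([-10.63, -10.68, -10.67, -0.0001])  # e.g. tag_scores[-1]
pred = torch.argmax(scores).item()                         # index of the highest score
print(pred)                                                # 3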

EDIT:

When I shuffle the data before training, it seems to work. But why?

EDIT 2:

I think the reason why shuffling is needed is that my training data contains the samples for each class in groups. So when training on them one after the other, the model only sees the same class during the last N iterations and therefore it will only predict this class. Another reason might be that I am currently using mini-batches of only one sample, because I haven't figured out yet how to use other batch sizes.
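For completeness, the shuffling mentioned above is just an in-place random.shuffle at the start of each epoch; a sketch, reusing the variables from the training loop above:

import random

for epoch in tqdm(range(n_epochs)):
    random.shuffle(training_data)              # interleave the classes before each epoch
    for sentence, tag in tqdm(training_data):
        ...                                    # training step unchanged from above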

Because you are trying to classify using the whole sentence, the following line:

self.hidden2tag(lstm_out.view(len(sentence), -1))

should be changed so that only the final time step's features are passed to the classifier:

self.hidden2tag(lstm_out[-1])

But I am also not so sure, since I am not familiar with LSTMs.
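In case it helps, what I mean is roughly the following: feed only the output of the last time step to the linear layer. A sketch of how forward could look, keeping the original names (just a sketch, not tested):

def forward(self, sentence):
    lstm_out, self.hidden = self.lstm(sentence.view(len(sentence), 1, -1), self.hidden)
    last_out = lstm_out[-1]                      # (1, hidden_dim): features of the final time step
    tag_space = self.hidden2tag(last_out)        # (1, tagset_size)
    tag_scores = F.log_softmax(tag_space, dim=1)
    return tag_scores                            # one log-probability vector for the whole sentence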
