
Pytorch: LSTM does not learn

I'm having a hard time training my LSTM model; it does not seem to learn at all. The training loss is hardly decreasing, and accuracy changes for very simple models (1 layer, a few LSTM units) but eventually gets stuck at 45%, just like the more complex models do right from the start. Most of the time, it only predicts one class as output. I've tried varying all the hyperparameters, but it doesn't really seem to change anything, so I'm afraid I'm missing something obvious.

Here is my model:

import torch.nn as nn

class Embedding_Model(nn.Module):
    def __init__(self, dim, vocab_size, classes, lstm_units=100, num_layers=3, bidirectional=True):
        super(Embedding_Model, self).__init__()

        self.word_embeddings = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, lstm_units, num_layers=num_layers, bidirectional=bidirectional, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(lstm_units * (1 + int(bidirectional)), classes),
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.word_embeddings(x)   # (batch, seq_len) -> (batch, seq_len, dim)
        x, (h, c) = self.lstm(x)      # (batch, seq_len, lstm_units * num_directions)
        x = x.transpose(0, 1)         # (seq_len, batch, lstm_units * num_directions)
        x = self.fc(x[-1])            # classify from the last time step
        return x

My training set contains roughly 5000 input sequences. The length of the input sequences is 1400 (padded with zeros). There are 150,000 different tokens; I've tried embedding dimensions between 10 and 200. I use 3 output classes (quite balanced in the training set) and cross-entropy loss.
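For reference, a minimal sketch of how this model is invoked with the dimensions described above (the batch size and embedding dimension here are assumptions for illustration):

import torch

# Hypothetical setup: 150,000-token vocabulary, embedding dim 100 (assumed),
# 3 output classes, sequences zero-padded to length 1400, batch size 32 (assumed).
model = Embedding_Model(dim=100, vocab_size=150_000, classes=3)
batch = torch.randint(0, 150_000, (32, 1400))  # LongTensor of token indices
out = model(batch)                             # shape: (32, 3)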

Did I mess up anything obvious? I understand that the training set is rather small, but at least I would expect some overfitting. Instead, the model does not seem to learn anything at all.

You say that you use nn.CrossEntropyLoss as the loss function, which applies log-softmax internally, but you are also applying nn.Softmax in the model.

nn.CrossEntropyLoss expects the raw logits, so you need to remove the nn.Softmax from your model:

self.fc = nn.Linear(lstm_units*(1+int(bidirectional)), classes)
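As a quick sanity check, here is a minimal sketch of the intended usage: nn.CrossEntropyLoss takes raw logits and integer class indices directly (the batch size here is an assumption):

import torch
import torch.nn as nn

# Hypothetical batch: 8 examples, 3 classes.
logits = torch.randn(8, 3, requires_grad=True)  # raw scores, no softmax applied
targets = torch.randint(0, 3, (8,))             # class indices, not one-hot
loss = nn.CrossEntropyLoss()(logits, targets)   # log-softmax happens inside the loss
loss.backward()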
