
Overfitting LSTM pytorch

I was following the tutorial on CoderzColumn to implement an LSTM for text classification using pytorch. I tried to apply the implementation to the bbc-news dataset from Kaggle; however, it heavily overfits, achieving a maximum accuracy of about 60%.

See the train/loss curve, for example:

[image: train/loss curve]

Is there any advice (I am quite new to RNNs/LSTMs) on how to adapt the model to prevent such heavy overfitting?

The model is taken from the above tutorial and looks like this:

 import torch
 import torch.nn as nn

 class LSTMClassifier(nn.Module):
     def __init__(self, vocab, target_classes, embed_len=50, hidden_dim=75, n_layers=1):
         super(LSTMClassifier, self).__init__()
         self.n_layers = n_layers
         self.embed_len = embed_len
         self.hidden_dim = hidden_dim
         self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_len)
         # self.lstm = nn.LSTM(input_size=embed_len, hidden_size=hidden_dim, dropout=0.2, num_layers=n_layers, batch_first=True)
         self.lstm = nn.LSTM(input_size=embed_len, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True)
         self.fc = nn.Linear(hidden_dim, len(target_classes))

     def forward(self, X_batch):
         embeddings = self.embedding_layer(X_batch)
         # initial hidden and cell states are drawn from a normal distribution
         hidden, carry = torch.randn(self.n_layers, len(X_batch), self.hidden_dim), torch.randn(self.n_layers, len(X_batch), self.hidden_dim)
         output, (hidden, carry) = self.lstm(embeddings, (hidden, carry))
         # classify from the output of the last time step
         return self.fc(output[:, -1])

I would be really thankful for any advice on how to adapt the version in the tutorial so it can be used more effectively on other datasets.

Have you tried adding an nn.Dropout layer before self.fc? Check what p = 0.1 / 0.2 / 0.3 will do.
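For reference, a minimal sketch of where such a dropout layer could go, assuming the LSTMClassifier structure from the question (constructor arguments are simplified to a vocabulary size and class count for illustration, and it relies on the LSTM's default zero initial states rather than the random ones above):

 import torch
 import torch.nn as nn

 class LSTMClassifierWithDropout(nn.Module):
     def __init__(self, vocab_size, n_classes, embed_len=50, hidden_dim=75, n_layers=1, p=0.2):
         super().__init__()
         self.embedding_layer = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_len)
         self.lstm = nn.LSTM(input_size=embed_len, hidden_size=hidden_dim,
                             num_layers=n_layers, batch_first=True)
         self.dropout = nn.Dropout(p=p)              # randomly zeroes activations during training only
         self.fc = nn.Linear(hidden_dim, n_classes)

     def forward(self, X_batch):
         embeddings = self.embedding_layer(X_batch)
         output, (hidden, carry) = self.lstm(embeddings)   # default zero initial states
         return self.fc(self.dropout(output[:, -1]))       # dropout applied right before self.fc

Note that dropout is only active in model.train() mode, so remember to call model.eval() when measuring validation accuracy.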

Another thing you can do is add regularisation to your training via the weight_decay parameter:

 optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)

Use small values first, then increase by a factor of 10, and see which gives you the best result.
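For instance, a coarse sweep over weight_decay values could look like the sketch below; train_and_evaluate is a hypothetical helper standing in for your own training and validation loop:

 import torch

 # Sketch of a coarse weight_decay sweep; `train_and_evaluate` is a hypothetical
 # helper that wraps your existing training loop and returns validation accuracy.
 for weight_decay in (1e-6, 1e-5, 1e-4, 1e-3):
     model = LSTMClassifier(vocab, target_classes)
     optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=weight_decay)
     val_accuracy = train_and_evaluate(model, optimizer)
     print(f"weight_decay={weight_decay:.0e} -> validation accuracy {val_accuracy:.3f}")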

Also, it goes without saying: make sure there are no test data points in the train set, and make sure you did not forget to shuffle your train set:

 train_loader = DataLoader(train_dataset, batch_size=1024, collate_fn=vectorize_batch, shuffle=True)
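If you build the split yourself, here is a sketch of keeping train and test disjoint with torch.utils.data.random_split; bbc_dataset stands in for however you load the full Kaggle dataset:

 import torch
 from torch.utils.data import DataLoader, random_split

 # `bbc_dataset` is a placeholder for the full dataset object built from the Kaggle CSV.
 n_total = len(bbc_dataset)
 n_train = int(0.8 * n_total)
 train_dataset, test_dataset = random_split(
     bbc_dataset, [n_train, n_total - n_train],
     generator=torch.Generator().manual_seed(42))   # reproducible, disjoint split

 train_loader = DataLoader(train_dataset, batch_size=1024, collate_fn=vectorize_batch, shuffle=True)
 test_loader = DataLoader(test_dataset, batch_size=1024, collate_fn=vectorize_batch, shuffle=False)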
