How to make LSTMClassifier Bidirectional?

Goal: make the LSTM's self.classifier() learn from bidirectional layers.

# ! = line of interest

Question: What changes to LSTMClassifier do I need to make in order to have this LSTM work bidirectionally?


When passing bidirectional=True to self.lstm = nn.LSTM(...), I get this traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-51-b94d572a1b68> in <module>()
     11     """.split()
     12 
---> 13 run_training(args)

3 frames
<ipython-input-8-bb0d8b014e32> in run_training(input)
     54     elif args.checkpointfile:
     55         file_path = os.path.join(args.traindir, args.checkpointfile)
---> 56         model = LSTMTaggerModel.load_from_checkpoint(file_path)
     57     else:
     58         model = LSTMTaggerModel(**vars(args), num_classes=dm.num_classes, class_map=dm.class_map)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/saving.py in load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
    155         checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
    156 
--> 157         model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
    158         return model
    159 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/saving.py in _load_model_state(cls, checkpoint, strict, **cls_kwargs_new)
    203 
    204         # load the state_dict on the model automatically
--> 205         model.load_state_dict(checkpoint['state_dict'], strict=strict)
    206 
    207         return model

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1405         if len(error_msgs) > 0:
   1406             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1407                                self.__class__.__name__, "\n\t".join(error_msgs)))
   1408         return _IncompatibleKeys(missing_keys, unexpected_keys)
   1409 

RuntimeError: Error(s) in loading state_dict for LSTMTaggerModel:
    Missing key(s) in state_dict: "model.lstm.weight_ih_l0_reverse", "model.lstm.weight_hh_l0_reverse", "model.lstm.bias_ih_l0_reverse", "model.lstm.bias_hh_l0_reverse".

I think the problem is with forward(). It learns from the last state of the LSTM by slicing:

tag_space = self.classifier(lstm_out[:,-1,:])

However, bidirectional=True changes the architecture and thus the output shape.

Do I need to sum or concatenate the values of the two layers/directions?
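
For reference, here is a minimal sketch of what I think the bidirectional case looks like, using the same default sizes as the code below (the tensor handling here is my assumption, not tested):

import torch
import torch.nn as nn

embedding_dim, hidden_dim, num_labels = 100, 50, 5
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim,
               batch_first=True, bidirectional=True)  # output feature dim doubles
classifier = nn.Linear(hidden_dim * 2, num_labels)    # so the classifier must widen

embeds = torch.randn(10, 20, embedding_dim)           # (batch, seq_len, embedding_dim)
lstm_out, (h_n, c_n) = lstm(embeds)                   # lstm_out: (10, 20, 2 * hidden_dim)

# One option: concatenate the final hidden state of each direction
# instead of slicing lstm_out[:, -1, :].
final = torch.cat((h_n[-2], h_n[-1]), dim=1)          # (10, 2 * hidden_dim)
tag_space = classifier(final)                         # (10, num_labels)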


Working Code:

from argparse import ArgumentParser

import torchmetrics
import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMClassifier(nn.Module):

    def __init__(self, 
        num_classes, 
        batch_size=10,
        embedding_dim=100, 
        hidden_dim=50, 
        vocab_size=128):

        super(LSTMClassifier, self).__init__()

        initrange = 0.1

        self.num_labels = num_classes
        n = len(self.num_labels)
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.word_embeddings.weight.data.uniform_(-initrange, initrange)
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)  # !
        self.classifier = nn.Linear(hidden_dim, self.num_labels[0])


    def repackage_hidden(self, h):
        """Wraps hidden states in new Tensors, to detach them from their history."""

        if isinstance(h, torch.Tensor):
            return h.detach()
        else:
            return tuple(self.repackage_hidden(v) for v in h)


    def forward(self, sentence, labels=None):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds)
        tag_space = self.classifier(lstm_out[:,-1,:])  # !
        logits = F.log_softmax(tag_space, dim=1)
        loss = None
        if labels:
            loss = F.cross_entropy(logits.view(-1, self.num_labels[0]), labels[0].view(-1))
        return loss, logits

It sounds like you're trying to load a pretrained model (which uses a unidirectional LSTM) into a model which has a bidirectional LSTM in its state dict. There are several things you can do here, as there are innate differences between your pretrained state dict and your bidirectional state dict:

  1. Definitely use model.load_state_dict(model_params, strict=False) (see this link, and the sketch after this list). This will stop the complaints when you use a model that's different from what you're trying to learn. It means that your forward pass will be pretrained but not your backward pass.
  2. If you do this, you will need to sum or otherwise condense the final time steps for the forward and backward directions, because the classifier will otherwise expect a different input shape. strict=False will ignore that mismatch, though, so only do this if you care about having a pretrained first layer in your classifier.
  3. If you don't want to do the above two, you can copy the weights for model.lstm.weight_ih_l0_reverse and the other missing parameters from the forward direction in the state dict, as it's just a Python dictionary. It is not ideal, because obviously the forward and backward passes will learn different things, but it will stop the error and leave you in a reasonably good initialisation space. You will still have the issue from point 2, though, where your LSTM output is twice as big as it was.
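
A rough sketch of points 1 and 3, assuming file_path is the unidirectional checkpoint from your traceback and model is a freshly constructed bidirectional LSTMTaggerModel (both names are borrowed from your code, not verified here):

import torch

# file_path: path to the unidirectional checkpoint (as in run_training)
# model: a new LSTMTaggerModel built with the bidirectional LSTM
checkpoint = torch.load(file_path, map_location="cpu")
state_dict = checkpoint["state_dict"]          # a plain Python dict of tensors

# Point 1: load everything that matches and skip the missing *_reverse keys.
model.load_state_dict(state_dict, strict=False)

# Point 3: seed the backward direction with the forward weights before loading.
# (The classifier input size still changes, per point 2, so keep strict=False.)
for key in list(state_dict.keys()):
    if ".lstm." in key and not key.endswith("_reverse"):
        state_dict[key + "_reverse"] = state_dict[key].clone()
model.load_state_dict(state_dict, strict=False)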
