
LSTM always predicts the same class

I'm trying to solve an NLP classification problem with an LSTM. The code for the model is defined here:

import torch
import torch.nn as nn

class LSTM(nn.Module):

    def __init__(self, hidden_size, embedding_size=66):
        super().__init__()
        self.lstm = nn.LSTM(embedding_size, hidden_size, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, 2)

    def forward(self, input_seq):
        output, (hidden_state, cell_state) = self.lstm(input_seq)

        # Concatenate the final hidden states of the two directions
        hidden_state = torch.cat((hidden_state[-1, :], hidden_state[-2, :]), -1)

        logits = self.fc(hidden_state)

        return nn.LogSoftmax(dim=1)(logits)

And the function I'm using to train this model is here:

def train_loop(dataloader, model, loss_fn, optimizer):
    
    loss_fn = loss_fn
    size = len(dataloader.dataset)
    model.train()
    zeros = 0
    for batch, (X, y) in enumerate(dataloader):

        # Transform string into tensor
        tensor = torch.zeros(1,len(X[0]),66)
        for i in range(len(X[0])):
            tensor[0][i][ctoi[X[0][i]]] = 1

        pred = model(tensor)

        target = torch.zeros(2, dtype=torch.long)
        target[y] = 1
        
        if batch % 100 == 0:
            print(pred.squeeze(), target)
        loss = loss_fn(pred.squeeze(), target)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if pred.squeeze().argmax() == 0:
            zeros += 1

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

    print(f'In training predicted {zeros} zeroes out of {size} samples')

The X's are still strings, which is why I need to convert them to tensors before running them through the model. The y's are either 0 or 1 (since it's a binary classification problem), which I need to convert to a tensor of shape (2,) to run through the loss function.

For some reason I keep getting the same class predicted for every input. The classes are not even that unbalanced (~45% to 55%), and I've tried changing the class weights in the loss function with no improvement; it either converges to always predicting a 0 or always predicting a 1. Most of the time it converges to always predicting a 0, which makes even less sense because class 0 usually has fewer samples than class 1.

Since you're training a binary classification model, your output dim should be 1 (corresponding to a single probability P(y|x)). This means that the y you retrieve from your dataloader should be used directly as the target in your loss function (assuming a binary cross-entropy loss). The predicted class is therefore y_hat = round(pred) (i.e., whether the prediction is >= 0.5).
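For concreteness, here is a minimal sketch of that setup, assuming the same bidirectional LSTM encoder and 66-dimensional one-hot inputs as in the question; the class name BinaryLSTM and the dummy tensors below are illustrative only, not the original code:

import torch
import torch.nn as nn

class BinaryLSTM(nn.Module):
    def __init__(self, hidden_size, embedding_size=66):
        super().__init__()
        self.lstm = nn.LSTM(embedding_size, hidden_size,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, 1)   # single output for P(y|x)

    def forward(self, input_seq):
        _, (hidden_state, _) = self.lstm(input_seq)
        hidden_state = torch.cat((hidden_state[-1, :], hidden_state[-2, :]), -1)
        return torch.sigmoid(self.fc(hidden_state)).squeeze(-1)  # shape: (batch,)

# The raw 0/1 label is used directly as the target of a binary cross-entropy loss:
model = BinaryLSTM(hidden_size=32)
loss_fn = nn.BCELoss()
x = torch.zeros(1, 10, 66)        # dummy one-hot encoded sequence (batch, seq, alphabet)
y = torch.tensor([1.0])           # raw 0/1 label, as a float
pred = model(x)                   # single probability per sample
loss = loss_fn(pred, y)
y_hat = (pred >= 0.5).long()      # predicted class = round(pred)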

As a point of clarity, it would be much easier to follow your logic if the one-hot encoding happened within your dataset (either in __getitem__ or __iter__ ). It's also worth noting that you don't use embeddings, so the code of your classifier is a bit misleading.
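As a rough illustration of that suggestion, a dataset along these lines could do the encoding in __getitem__. This is a hypothetical sketch, not your original code: CharDataset and its constructor arguments are made-up names, and it assumes the same ctoi character-to-index mapping and 66-character alphabet from your training loop:

import torch
from torch.utils.data import Dataset

class CharDataset(Dataset):
    def __init__(self, texts, labels, ctoi, alphabet_size=66):
        self.texts = texts
        self.labels = labels
        self.ctoi = ctoi
        self.alphabet_size = alphabet_size

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        # One-hot encode the string: (seq_len, alphabet_size)
        encoded = torch.zeros(len(text), self.alphabet_size)
        for i, ch in enumerate(text):
            encoded[i, self.ctoi[ch]] = 1.0
        label = torch.tensor(float(self.labels[idx]))
        return encoded, label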
