LSTM always predicts the same class
I'm trying to solve an NLP classification problem with an LSTM. The code for the model is defined here:
class LSTM(nn.Module):
    def __init__(self, hidden_size, embedding_size=66):
        super().__init__()
        self.lstm = nn.LSTM(embedding_size, hidden_size, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, 2)

    def forward(self, input_seq):
        output, (hidden_state, cell_state) = self.lstm(input_seq)
        hidden_state = torch.cat((hidden_state[-1, :], hidden_state[-2, :]), -1)
        logits = self.fc(hidden_state)
        return nn.LogSoftmax(dim=1)(logits)
And the function I'm using to train this model is here:
def train_loop(dataloader, model, loss_fn, optimizer):
    loss_fn = loss_fn
    size = len(dataloader.dataset)
    model.train()
    zeros = 0
    for batch, (X, y) in enumerate(dataloader):
        # Transform string into tensor
        tensor = torch.zeros(1, len(X[0]), 66)
        for i in range(len(X[0])):
            tensor[0][i][ctoi[X[0][i]]] = 1
        pred = model(tensor)
        target = torch.zeros(2, dtype=torch.long)
        target[y] = 1
        if batch % 100 == 0:
            print(pred.squeeze(), target)
        loss = loss_fn(pred.squeeze(), target)
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if pred.squeeze().argmax() == 0:
            zeros += 1
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
    print(f'In training predicted {zeros} zeroes out of {size} samples')
The X's are still strings, which is why I need to convert them to tensors before running them through the model. The y's are either 0 or 1 (since it's a binary classification problem), which I need to convert to a tensor of shape (2,) to run through the loss function.
For some reason I keep getting the same class predicted for every input. The classes are not even that unbalanced (~45% to 55%), and I've tried changing the class weights in the loss function with no improvement: it converges to predicting either always 0 or always 1. Most of the time it converges to always predicting 0, which makes even less sense because class 0 usually has fewer samples than class 1.
Since you're training a binary classification model, your output dimension should be 1 (corresponding to a single probability P(y|x)). This means that the y you're retrieving from your dataloader should be the y used in your loss function (assuming a cross-entropy loss), rather than a converted tensor of shape (2,). The predicted class is therefore y_hat = round(pred) (i.e., whether the prediction is >= 0.5).
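A minimal sketch of the single-output setup described above, not a drop-in fix. The class name BinaryLSTM, the parameter name input_size, the hidden size and the toy input are all made up for illustration. The answer only says "cross-entropy loss"; nn.BCEWithLogitsLoss is one assumed choice of binary cross-entropy, and it takes the raw logit, so the sigmoid is applied only when reading off a probability.

```python
import torch
import torch.nn as nn


class BinaryLSTM(nn.Module):
    """Bidirectional LSTM with a single output logit (hypothetical name)."""

    def __init__(self, hidden_size, input_size=66):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, 1)  # one logit instead of two

    def forward(self, input_seq):
        _, (hidden_state, _) = self.lstm(input_seq)
        # hidden_state has shape (2, batch, hidden): forward and backward directions
        features = torch.cat((hidden_state[-2], hidden_state[-1]), dim=-1)
        return self.fc(features).squeeze(-1)  # shape (batch,)


torch.manual_seed(0)
model = BinaryLSTM(hidden_size=8)
loss_fn = nn.BCEWithLogitsLoss()  # assumed loss; expects raw logits and float targets

# Toy one-hot input: batch of 1, sequence of 5 characters, 66 possible symbols
x = torch.zeros(1, 5, 66)
x[0, torch.arange(5), torch.tensor([3, 1, 4, 1, 5])] = 1.0
y = torch.tensor([1.0])  # the label from the dataloader, used directly as the target

logit = model(x)
loss = loss_fn(logit, y)
prob = torch.sigmoid(logit)    # P(y = 1 | x)
y_hat = (prob >= 0.5).long()   # predicted class, equivalent to rounding the probability
```

If you would rather keep two outputs with LogSoftmax, note that nn.NLLLoss expects the class index as the target (here, y itself), not a one-hot vector.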
As a point of clarity, it would be much easier to follow your logic if the one-hot encoding happened within your dataset (either in __getitem__ or __iter__). It's also worth noting that you don't use embeddings, so the code of your classifier (for example, the embedding_size parameter) is a bit misleading.
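As a rough illustration of moving the one-hot encoding into the dataset, here is a sketch under a few assumptions: the class name CharDataset, the tiny three-character ctoi vocabulary and the sample strings are hypothetical, and your real ctoi would map 66 characters. With batch_size=1 the default collate function only adds a batch dimension, so variable-length strings need no padding.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader


class CharDataset(Dataset):
    """Returns (one-hot encoded string, float label) pairs (hypothetical class)."""

    def __init__(self, texts, labels, ctoi):
        self.texts = texts
        self.labels = labels
        self.ctoi = ctoi  # character -> index mapping

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Map each character to its index, then one-hot encode: (seq_len, vocab_size)
        indices = torch.tensor([self.ctoi[c] for c in self.texts[idx]], dtype=torch.long)
        x = F.one_hot(indices, num_classes=len(self.ctoi)).float()
        y = torch.tensor(self.labels[idx], dtype=torch.float32)
        return x, y


# Toy vocabulary and data, for illustration only
ctoi = {c: i for i, c in enumerate("abc")}
dataset = CharDataset(["abca", "cb"], [1, 0], ctoi)
loader = DataLoader(dataset, batch_size=1, shuffle=False)

x, y = next(iter(loader))  # x: (1, 4, 3), y: (1,)
```

The training loop then reduces to pred = model(X) followed by loss_fn(pred, y), with no string handling inside it.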