
How to vectorize loss for a LSTM doing sequential Language modelling

So I have an assignment involving Language Modelling and I passed all the unit tests, but my code is too slow to run. I think it's because of the way I compute my loss. The formula we're given is the following:

L = (1/B) · Σ_{b=1}^{B} [ −(1/T_b) · Σ_{t=1}^{T_b} Σ_{n=1}^{N} 1[targets_{b,t} = n] · log p_{b,t,n} ]

where T_b is the unpadded length of sequence b and N is the vocabulary size.

My naive implementation is the following:

losses_batch_list = []
batch_size = log_probas.size(0)
for b in range(batch_size):

    # Effective (unpadded) length of sequence b, taken from the mask
    seq_length = max([i for i, e in enumerate(mask[b, :]) if e != 0]) + 1

    loss_batch = 0
    for t in range(seq_length):
        for n in range(self.vocabulary_size):
            # Only the log-probability of the target token contributes
            if targets[b, t] == n:
                loss_batch += log_probas[b, t, n].detach()

    # Average negative log-likelihood per token for this sequence
    loss_batch = -loss_batch / seq_length
    losses_batch_list.append(loss_batch)

# Mean over the batch
loss = torch.tensor(np.mean(losses_batch_list))

return loss

But that loop runs forever, since the vocabulary size is approximately the same as GPT-1's (~40 000) and the sequence length is up to 255 (sometimes shorter because of padding, hence the mask). Does anyone have any tips on how to vectorize/speed this up? I know it's correct, but I can't report any results with it... Thanks!

B = batch_size
T = sequence_length (padded)
N = vocab_size

if mask.dtype == torch.bool:
    mask = mask.view(-1)                 # (B, T) -> (B*T,)
else:
    mask = mask.bool().view(-1)          # (B, T) -> (B*T,)
log_probas = log_probas.view(-1, N)      # (B, T, N) -> (B*T, N)
targets = targets.view(-1, 1)            # (B, T) -> (B*T, 1)
loss = torch.gather(log_probas[mask], -1, targets[mask])  # log-probs of target tokens, padded positions dropped
loss = -loss.mean()                      # negative mean log-likelihood over non-padded tokens
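
For reference, here is a minimal self-contained sketch (with made-up toy shapes and random tensors, so the names and values are only illustrative) showing how that vectorized version is used, and that it matches torch.nn.functional.nll_loss computed over the non-padded tokens. Note that it averages over all valid tokens in the batch at once rather than averaging per sequence first, as the loop version does.

import torch
import torch.nn.functional as F

B, T, N = 4, 8, 100                                           # toy batch size, padded length, vocab size
log_probas = torch.log_softmax(torch.randn(B, T, N), dim=-1)  # fake model output, (B, T, N)
targets = torch.randint(0, N, (B, T))                         # fake target token ids, (B, T)
lengths = torch.tensor([8, 5, 7, 3])                          # true (unpadded) sequence lengths
mask = torch.arange(T).unsqueeze(0) < lengths.unsqueeze(1)    # (B, T) bool, True on real tokens

flat_mask = mask.view(-1)                                     # (B*T,)
flat_logp = log_probas.view(-1, N)                            # (B*T, N)
flat_tgt = targets.view(-1, 1)                                # (B*T, 1)

# Gather the log-probability of each non-padded target token, then negate and average
loss = -torch.gather(flat_logp[flat_mask], -1, flat_tgt[flat_mask]).mean()

# The same quantity via PyTorch's built-in negative log-likelihood loss
loss_ref = F.nll_loss(flat_logp[flat_mask], flat_tgt[flat_mask].squeeze(-1))

print(loss.item(), loss_ref.item())                           # the two values agree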

