
Pytorch LSTM grad only on last output

I'm working with sequences of different lengths, but I only want to compute the gradient from the output produced at the end of each sequence.

The samples are sorted by decreasing length and zero-padded. For 5 one-dimensional samples it looks like this (omitting the width dimension for readability):

array([[5, 7, 7, 4, 5, 8, 6, 9, 7, 9],
       [6, 4, 2, 2, 6, 5, 4, 2, 2, 0],
       [4, 6, 2, 4, 5, 1, 3, 1, 0, 0],
       [8, 8, 3, 7, 7, 7, 9, 0, 0, 0],
       [3, 2, 7, 5, 7, 0, 0, 0, 0, 0]])
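
For reference, the batch above could be built roughly like this (the random samples here are just placeholders for my real data):

import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.randint(1, 10, (n,)) for n in (10, 9, 8, 7, 5)]   # placeholder samples
seqs = sorted(seqs, key=len, reverse=True)                       # longest first
lengths = [len(s) for s in seqs]                                 # [10, 9, 8, 7, 5]
x = pad_sequence(seqs, batch_first=True)                         # zero-padded, shape (batch, max_len)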

For the LSTM I'm using nn.utils.rnn.pack_padded_sequence with the individual sequence lengths:

x = nn.utils.rnn.pack_padded_sequence(x, [10, 9, 8, 7, 5], batch_first=True)

The initialization of the LSTM in the Model constructor:

self.lstm = nn.LSTM(width, n_hidden, 2)

Then I call the LSTM and unpack the values:

x, _ = self.lstm(x)
x, _ = nn.utils.rnn.pad_packed_sequence(x, batch_first=True)

Then I'm applying a fully connected layer and a softmax:

x = x.contiguous()
x = x.view(-1, n_hidden)                                # flatten to (batch * seq_len, n_hidden)
x = self.linear(x)                                      # (batch * seq_len, n_labels)
x = x.view(batch_size, -1, n_labels).permute(0, 2, 1)   # (batch, n_labels, seq_len); seq_len is 10, the sample height
return F.softmax(x, dim=1)

This gives me an output of shape batch x n_labels x height (5x12x10).

For each sample, I only want to use a single score: the one for the last output, of shape batch x n_labels (5x12). My question is: how can I achieve this?

One idea is to apply tanh to the last hidden state returned from the model, but I'm not quite sure that would give the same result. Is it possible to efficiently extract the output computed at the end of each sequence, e.g. using the same lengths passed to pack_padded_sequence?
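
To make the goal concrete, this is roughly the selection I have in mind (a sketch with placeholder data, where out stands for the batch x n_labels x height output above):

import torch

batch_size, n_labels, height = 5, 12, 10
out = torch.randn(batch_size, n_labels, height)     # placeholder for the model output above
lengths = torch.tensor([10, 9, 8, 7, 5])            # same lengths as passed to pack_padded_sequence

idx = torch.arange(batch_size)
last_scores = out[idx, :, lengths - 1]              # (batch, n_labels): one score vector per sample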

As Neaabfi answered, hidden[-1] is correct. To be more specific to your question, as the docs say:

output, (h_n, c_n) = self.lstm(x_pack) # batch_first = True

# h_n is a tensor of shape (num_layers * num_directions, batch, hidden_size)

In your case, you have a stack of 2 LSTM layers with only the forward direction, so:

h_n shape is (num_layers, batch, hidden_size)

You probably want the hidden state h_n of the last layer, in which case here is what you should do:

output, (h_n, c_n) = self.lstm(x_pack)
h = h_n[-1] # h of shape (batch, hidden_size)
y = self.linear(h)
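
(If the LSTM were bidirectional, h_n[-1] would only be the backward direction of the last layer. A rough sketch of handling that case, not needed for the setup above:)

num_layers, num_directions = 2, 2                                      # hypothetical bidirectional setup
h_n = h_n.view(num_layers, num_directions, h_n.size(1), h_n.size(2))   # (layers, directions, batch, hidden)
h = torch.cat([h_n[-1, 0], h_n[-1, 1]], dim=1)                         # last layer, fwd + bwd -> (batch, 2 * hidden_size)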

Here is the code which wraps any recurrent layer (LSTM, RNN or GRU) into DynamicRNN. DynamicRNN can perform recurrent computations on sequences of varying lengths without caring about the order of the lengths.
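
The linked code is not reproduced here, but the idea behind such a wrapper is roughly the following sketch (my own illustration with a made-up class name, not the actual DynamicRNN):

import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class LengthAwareRNN(nn.Module):    # hypothetical name, not the linked DynamicRNN
    def __init__(self, rnn):
        super().__init__()
        self.rnn = rnn              # any nn.LSTM / nn.GRU / nn.RNN

    def forward(self, x, lengths):  # x: (batch, max_len, features), lengths in any order
        packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
        out, state = self.rnn(packed)
        out, _ = pad_packed_sequence(out, batch_first=True)
        return out, state           # out: (batch, max_len, hidden), padded back

With enforce_sorted=False (available since PyTorch 1.1), pack_padded_sequence reorders internally, so the batch does not need to be sorted by length beforehand.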

You can access the last hidden layer as follows:

output, (hidden, cell) = self.lstm(x_pack)
y = self.linear(hidden[-1])
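
As a sanity check (my own addition, assuming a unidirectional LSTM and continuing from the snippet above), hidden[-1] matches the output at each sequence's last valid time step:

out_padded, lens = nn.utils.rnn.pad_packed_sequence(output, batch_first=True)
idx = torch.arange(out_padded.size(0))
assert torch.allclose(hidden[-1], out_padded[idx, lens - 1])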
