
Concatenating an attention layer with decoder input in a seq2seq model in Keras

I am trying to implement a sequence-to-sequence (seq2seq) model with attention using the Keras library. The block diagram of the model is as follows:

[Block diagram of the model]

The model embeds the input sequence into 3D tensors. A bidirectional LSTM then creates the encoding layer. Next, the encoded sequences are sent to a custom attention layer that returns a 2D tensor holding an attention weight for each hidden node.
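The custom Attention layer used below is not shown in the question; as context, here is a minimal sketch of a layer with that interface (my assumption of a simple additive scoring scheme that collapses the time axis; the actual implementation may differ):

from keras import backend as K
from keras.layers import Layer

class Attention(Layer):
    """Collapses (batch, timesteps, features) into a (batch, features)
    context vector using learned per-timestep attention weights."""
    def __init__(self, timesteps, **kwargs):
        self.timesteps = timesteps
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # One scoring weight per feature dimension (an assumption)
        self.W = self.add_weight(name="att_weight",
                                 shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform",
                                 trainable=True)
        super(Attention, self).build(input_shape)

    def call(self, x):
        e = K.squeeze(K.tanh(K.dot(x, self.W)), axis=-1)  # (batch, timesteps)
        a = K.softmax(e)                                  # attention weights
        return K.sum(x * K.expand_dims(a), axis=1)        # (batch, features)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])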

The decoder input is injected into the model as one-hot vectors (see the sketch after the model code for how such inputs can be built). In the decoder (another bi-LSTM), both the decoder input and the attention weights are passed as input. The output of the decoder is sent to a time-distributed dense layer with a softmax activation function to get a probability for every time step. The code of the model is as follows:

from keras.layers import (Input, Embedding, Bidirectional, LSTM,
                          concatenate, TimeDistributed, Dense)
# Attention is the custom attention layer described above

encoder_input = Input(shape=(MAX_LENGTH_Input,))

# Frozen embedding lookup: (batch, MAX_LENGTH_Input) -> (batch, MAX_LENGTH_Input, embedding_width)
embedded = Embedding(input_dim=vocab_size_input, output_dim=embedding_width, trainable=False)(encoder_input)

# Encoder: bi-LSTM returning the full sequence of hidden states
encoder = Bidirectional(LSTM(units=hidden_size, return_sequences=True, dropout=0.25, recurrent_dropout=0.25))(embedded)

# Attention collapses the encoded sequence into a 2D tensor: (batch, 2*hidden_size)
attention = Attention(MAX_LENGTH_Input)(encoder)

# One-hot decoder input: (batch, MAX_LENGTH_Output, vocab_size_output)
decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))

merge = concatenate([attention, decoder_input])  # raises the ValueError below: 2D vs 3D

# Decoder: bi-LSTM over the merged tensor
decoder = Bidirectional(LSTM(units=hidden_size, return_sequences=True))(merge)

# Per-timestep softmax over the output
output = TimeDistributed(Dense(MAX_LENGTH_Output, activation="softmax"))(decoder)
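For completeness, here is a minimal sketch of how the one-hot decoder inputs mentioned above could be built (a hypothetical helper, not part of the original question; the names are illustrative):

import numpy as np

def one_hot_sequences(token_ids, max_len, vocab_size):
    """Turn a list of integer token sequences into one-hot decoder
    inputs of shape (batch, max_len, vocab_size)."""
    out = np.zeros((len(token_ids), max_len, vocab_size), dtype="float32")
    for i, seq in enumerate(token_ids):
        for t, token in enumerate(seq[:max_len]):
            out[i, t, token] = 1.0
    return out

# e.g. decoder_input_data = one_hot_sequences(target_ids, MAX_LENGTH_Output, vocab_size_output)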

The problem arises when I concatenate the attention layer and the decoder input. Since the decoder input is a 3D tensor whereas the attention output is a 2D tensor, the following error is raised:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1024), (None, 10, 8281)]

How can I convert the 2D attention tensor into a 3D tensor?

Based on your block diagram, it looks like you pass the same attention vector to the decoder at every timestep. In that case you need RepeatVector to copy the attention vector at every timestep, converting the 2D attention tensor into a 3D tensor:

# ...
attention = Attention(MAX_LENGTH_Input)(encoder)
attention = RepeatVector(MAX_LENGTH_Output)(attention) # (?, 10, 1024)
decoder_input = Input(shape=(MAX_LENGTH_Output,vocab_size_output))
merge = concatenate([attention, decoder_input]) # (?, 10, 1024+8281)
# ...

Take note that this will repeat the same attention vector for every timestep.
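As a quick sanity check, a standalone sketch (with the shapes from the error message hard-coded) shows what RepeatVector does:

import numpy as np
from keras.layers import Input, RepeatVector
from keras.models import Model

inp = Input(shape=(1024,))        # 2D attention vector: (batch, 1024)
out = RepeatVector(10)(inp)       # repeated 10 times: (batch, 10, 1024)
model = Model(inp, out)

x = np.zeros((2, 1024), dtype="float32")
print(model.predict(x).shape)     # (2, 10, 1024)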
