
Add attention layer to Seq2Seq model

I have built an encoder-decoder Seq2Seq model. I want to add an attention layer to it. I tried adding an attention layer through this, but it didn't help.

Here is my initial code without attention:

# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb =  Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
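
For context, here is a minimal sketch of how this training model is typically compiled and fitted. The optimizer, batch size and epoch count are placeholders of my own, and decoder_target_data is assumed to be the one-hot-encoded decoder sequence shifted by one step:

# Minimal training sketch (assumed, not part of the original post).
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64, epochs=10, validation_split=0.2)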

And this is the code after I added the attention layer in the decoder (the encoder layer is the same as in the initial code):

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
attention = dot([decoder_lstm, encoder_lstm], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_lstm], axes=[2,1])
decoder_combined_context = concatenate([context, decoder_lstm])
decoder_outputs, _, _ = decoder_combined_context(dec_emb,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

While doing this, I got an error:

 Layer dot_1 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.layers.recurrent.LSTM'>. Full input: [<keras.layers.recurrent.LSTM object at 0x7f8f77e2f3c8>, <keras.layers.recurrent.LSTM object at 0x7f8f770beb70>]. All inputs to the layer should be tensors.

Can someone please help in fitting an attention layer in this architecture?

The dot products need to be computed on tensor outputs, not on the layer objects. In the encoder you correctly define encoder_outputs (note that the encoder LSTM must use return_sequences=True so this is the full sequence of hidden states, as in the full example below); in the decoder you have to add decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)

The dot products now are:

attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_outputs], axes=[2,1])

The concatenation doesn't need initial_state; you have to pass it to your RNN layer instead: decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)

Here is the full example:

ENCODER + DECODER

# imports assumed (not shown in the original post; standalone Keras 2.x)
from keras.layers import Input, Embedding, LSTM, Dense, Activation, dot, Dot, concatenate
from keras.models import Model

# dummy variables
num_encoder_tokens = 30
num_decoder_tokens = 10
latent_dim = 100

encoder_inputs = Input(shape=(None,))
enc_emb =  Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# Unlike the initial code, we keep `encoder_outputs` (one hidden state per time step,
# thanks to return_sequences=True) because the attention decoder below needs them;
# the final states are still used to initialise the decoder.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

DECODER w/ ATTENTION

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)
attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_outputs], axes=[2,1])
decoder_outputs = concatenate([context, decoder_outputs])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_dense)
model.summary()
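
For reference, a rough shape walkthrough of the attention block above, assuming batch size B, encoder length Te and decoder length Td (these symbols are my own, not part of the answer):

# encoder_outputs: (B, Te, latent_dim)  - all encoder hidden states (return_sequences=True)
# decoder_outputs: (B, Td, latent_dim)  - all decoder hidden states
# attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])  -> (B, Td, Te)
#             one alignment score per (decoder step, encoder step) pair
# attention = Activation('softmax')(attention)                      -> (B, Td, Te)
#             softmax over the last axis, i.e. over encoder steps
# context   = dot([attention, encoder_outputs], axes=[2, 1])        -> (B, Td, latent_dim)
#             attention-weighted sum of encoder states for each decoder step
# concatenate([context, decoder_outputs])                           -> (B, Td, 2 * latent_dim)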

Marco's answer above works, but one has to change the lines that involve the dot function in the second chunk. It takes one positional argument, as in tensorflow's example here.
Finally, the chunk below includes the correction and will work:

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)
attention = Dot(axes=[2, 2])([decoder_outputs, encoder_outputs])
attention = Activation('softmax')(attention)
context = Dot(axes=[2,1])([attention, encoder_outputs])
decoder_outputs = concatenate([context, decoder_outputs])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_dense)
model.summary()
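
As a quick sanity check (my own addition, not part of the answer), the finished model can be called on random integer sequences to confirm that it builds and returns a probability distribution over num_decoder_tokens for every decoder step:

import numpy as np

# random token ids (avoiding 0, which is reserved for masking via mask_zero=True)
enc_in = np.random.randint(1, num_encoder_tokens, size=(4, 12))  # 4 samples, encoder length 12
dec_in = np.random.randint(1, num_decoder_tokens, size=(4, 7))   # 4 samples, decoder length 7

preds = model.predict([enc_in, dec_in])
print(preds.shape)  # expected: (4, 7, num_decoder_tokens)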
