Add attention layer to Seq2Seq model
I have built an encoder-decoder Seq2Seq model and I want to add an attention layer to it. I tried adding an attention layer through this, but it didn't help.
Here is my initial code without attention:
# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
And this is the code after I added an attention layer to the decoder (the encoder is the same as in the initial code):
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
attention = dot([decoder_lstm, encoder_lstm], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_lstm], axes=[2,1])
decoder_combined_context = concatenate([context, decoder_lstm])
decoder_outputs, _, _ = decoder_combined_context(dec_emb,
initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
While doing this, I got an error:
Layer dot_1 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.layers.recurrent.LSTM'>. Full input: [<keras.layers.recurrent.LSTM object at 0x7f8f77e2f3c8>, <keras.layers.recurrent.LSTM object at 0x7f8f770beb70>]. All inputs to the layer should be tensors.
Can someone please help me fit an attention layer into this architecture?
The dot products need to be computed on tensor outputs, not on the LSTM layer objects themselves. In the encoder you correctly define encoder_outputs (the encoder LSTM also needs return_sequences=True so that encoder_outputs keeps its time axis); in the decoder you have to add decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)
The dot products are now:
attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_outputs], axes=[2,1])
The concatenation doesn't take an initial_state. You have to pass it to your RNN layer instead: decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)
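To make the shapes in the two dot products above concrete, here is a minimal NumPy sketch of the same computation. It is only an illustration under stated assumptions: the tensor sizes are made up, the helper names softmax and dot_attention are hypothetical, and masking of padded positions (mask_zero=True) is ignored.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def dot_attention(decoder_outputs, encoder_outputs):
    # scores[b, t, s] = <decoder_outputs[b, t], encoder_outputs[b, s]>
    # (the analogue of dot([...], axes=[2, 2]))
    scores = np.einsum("btd,bsd->bts", decoder_outputs, encoder_outputs)
    # Normalise over the encoder time axis (the last axis).
    weights = softmax(scores, axis=-1)
    # context[b, t] = sum over s of weights[b, t, s] * encoder_outputs[b, s]
    # (the analogue of dot([...], axes=[2, 1]))
    context = np.einsum("bts,bsd->btd", weights, encoder_outputs)
    return weights, context

# Made-up sizes, for illustration only.
rng = np.random.default_rng(0)
batch, enc_steps, dec_steps, latent_dim = 2, 5, 3, 4
encoder_outputs = rng.normal(size=(batch, enc_steps, latent_dim))
decoder_outputs = rng.normal(size=(batch, dec_steps, latent_dim))

weights, context = dot_attention(decoder_outputs, encoder_outputs)
print(weights.shape)  # (2, 3, 5): one weight per (decoder step, encoder step)
print(context.shape)  # (2, 3, 4): one context vector per decoder step
```

Each decoder step gets a weight distribution over all encoder steps, which is why the encoder has to return its full output sequence rather than only the final state.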
Here is the full example:
ENCODER + DECODER
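# imports (assuming standalone Keras; use tensorflow.keras if that is what you have installed)
from keras.layers import Input, Embedding, LSTM, Dense, Activation, dot, concatenate
from keras.models import Model
# dummy variables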
num_encoder_tokens = 30
num_decoder_tokens = 10
latent_dim = 100
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We keep `encoder_outputs` (needed for attention) as well as the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
DECODER WITH ATTENTION
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)
attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_outputs], axes=[2,1])
decoder_outputs = concatenate([context, decoder_outputs])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_dense)
model.summary()
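As a rough check of what the concatenation and the final Dense layer do to the shapes, the following NumPy sketch mimics those two steps. It rests on assumptions: the sizes are made up, and kernel and bias are hypothetical stand-ins for the weights the Dense layer would learn.

```python
import numpy as np

# Made-up sizes, for illustration only.
batch, dec_steps, latent_dim, num_decoder_tokens = 2, 3, 4, 10
rng = np.random.default_rng(0)
context = rng.normal(size=(batch, dec_steps, latent_dim))
decoder_outputs = rng.normal(size=(batch, dec_steps, latent_dim))

# concatenate([...]) joins along the last axis by default,
# so the feature size doubles to 2 * latent_dim.
combined = np.concatenate([context, decoder_outputs], axis=-1)

# Hypothetical weights standing in for Dense(num_decoder_tokens).
kernel = rng.normal(size=(2 * latent_dim, num_decoder_tokens))
bias = np.zeros(num_decoder_tokens)

# The projection is applied independently at every decoder time step.
logits = combined @ kernel + bias
logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

print(combined.shape)  # (2, 3, 8)
print(probs.shape)     # (2, 3, 10)
```

The output keeps one probability distribution over the decoder vocabulary per time step, the same as in the model without attention; only the input width of the last layer changes.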
Marco's answer above works, but the lines that involve the dot function in the second chunk have to be changed. Use the Dot layer instead: it is constructed with axes and then called with one positional argument (the list of input tensors), as in TensorFlow's example here.
Finally, the chunk below includes the correction and will work:
from keras.layers import Dot  # assuming standalone Keras; or tensorflow.keras.layers
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)
attention = Dot(axes=[2, 2])([decoder_outputs, encoder_outputs])
attention = Activation('softmax')(attention)
context = Dot(axes=[2,1])([attention, encoder_outputs])
decoder_outputs = concatenate([context, decoder_outputs])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_dense)
model.summary()