Layer shape error when concatenating attention in a Keras sequence-to-sequence model

Question

I am trying to implement a simple word-level sequence-to-sequence model with Keras in Colab. I am using the Keras Attention layer. Here is the model definition:

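from tensorflow.keras.layers import (Input, Embedding, LSTM, Attention,
                                     Concatenate, TimeDistributed, Dense)
from tensorflow.keras.models import Model

embedding_size=200
UNITS=128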

encoder_inputs = Input(shape=(None,), name="encoder_inputs")

encoder_embs=Embedding(num_encoder_tokens, embedding_size, name="encoder_embs")(encoder_inputs)

# Encoder LSTM
encoder = LSTM(UNITS, return_state=True, name="encoder_LSTM")
encoder_outputs, state_h, state_c = encoder(encoder_embs)

encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,), name="decoder_inputs")
decoder_embs = Embedding(num_decoder_tokens, embedding_size, name="decoder_embs")(decoder_inputs)

#decoder lstm
decoder_lstm = LSTM(UNITS, return_sequences=True, return_state=True, name="decoder_LSTM")
decoder_outputs, _, _ = decoder_lstm(decoder_embs, initial_state=encoder_states)

attention=Attention(name="attention_layer")
attention_out=attention([encoder_outputs, decoder_outputs])

decoder_concatenate=Concatenate(axis=-1, name="concat_layer")([decoder_outputs, attention_out])
decoder_outputs = TimeDistributed(Dense(units=num_decoder_tokens, 
                                  activation='softmax', name="decoder_denseoutput"))(decoder_concatenate)

model=Model([encoder_inputs, decoder_inputs], decoder_outputs, name="s2s_model")
model.compile(optimizer='RMSprop', loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

The model compiles without any problems. The input and output shapes for the encoder and decoder are:

Encoder training input shape:  (4000, 21)
Decoder training input shape:  (4000, 12)
Decoder training target shape:  (4000, 12, 3106)
--
Encoder test input shape:  (385, 21)

Here is the model.fit code:

model.fit([encoder_training_input, decoder_training_input], decoder_training_target,
      epochs=100,
      batch_size=32,
      validation_split=0.2,)

When I run the fit step, I get this error from the Concatenate layer:

ValueError: Dimension 1 in both shapes must be equal, but are 12 and 32. 
Shapes are [32,12] and [32,32]. for '{{node s2s_model/concat_layer/concat}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](s2s_model/decoder_LSTM/PartitionedCall:1,
s2s_model/attention_layer/MatMul_1, s2s_model/concat_layer/concat/axis)' with input shapes: [32,12,128], [32,32,128], [] and with computed input tensors: input[2] = <2>.

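So the first 32 is the batch_size, 128 is the output dimension of decoder_outputs and attention_out, and 12 is the number of tokens in the decoder input. I don't understand how to solve this error, and I don't think I can change the number of input tokens. Any suggestions?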

Answer 1

Solved, thanks to @Majitsima. I swapped the inputs to the attention layer, so instead of

attention=Attention(name="attention_layer")
attention_out=attention([encoder_outputs, decoder_outputs])

the input is now

attention=Attention(name="attention_layer")
attention_out=attention([decoder_outputs, encoder_outputs])

and

decoder_concatenate=Concatenate(axis=-1, name="concat_layer")([decoder_outputs, attention_out])

Everything seems to work now. Thanks again, @Majitsima. I hope this helps!
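The effect of the input order can be checked on random tensors. The following is only a minimal sketch, not the asker's exact model: it uses the shapes from the question (batch 32, 21 encoder tokens, 12 decoder tokens, 128 units) and assumes the encoder LSTM is created with return_sequences=True, which the code in the question does not set, so that there is one encoder vector per input token to attend over. The names encoder_seq, decoder_seq, context, and merged are illustrative only.

```python
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Attention, Concatenate

BATCH, ENC_STEPS, DEC_STEPS, EMB, UNITS = 32, 21, 12, 200, 128

# Random stand-ins for the embedded encoder and decoder inputs.
encoder_embs = tf.random.normal((BATCH, ENC_STEPS, EMB))
decoder_embs = tf.random.normal((BATCH, DEC_STEPS, EMB))

# Assumption: return_sequences=True, so the encoder emits one vector per token.
encoder_seq, state_h, state_c = LSTM(
    UNITS, return_sequences=True, return_state=True)(encoder_embs)
decoder_seq, _, _ = LSTM(
    UNITS, return_sequences=True, return_state=True)(
        decoder_embs, initial_state=[state_h, state_c])

# Attention expects [query, value]; the output has one row per query timestep,
# so using the decoder as the query keeps the 12 decoder steps.
context = Attention()([decoder_seq, encoder_seq])

# Both tensors now agree on every axis except the last, so axis=-1 works.
merged = Concatenate(axis=-1)([decoder_seq, context])

print(encoder_seq.shape)  # (32, 21, 128)
print(context.shape)      # (32, 12, 128)
print(merged.shape)       # (32, 12, 256)
```

Because the attention output follows the time dimension of the query, passing decoder_outputs first is what makes its shape line up with decoder_outputs in the Concatenate layer.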

Answer 2

Replace axis=-1 with axis=1 in the Concatenate layer. The examples in the documentation should make the reason clear.

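Your problem lies in the inputs passed to the concatenation. You need to specify the correct axis to concatenate two matrices of different shapes, or tensors, as they are called in TensorFlow. The shapes [32, 12, 128] and [32, 32, 128] differ in the second dimension, which is referenced by passing 1 (because dimensions are counted from 0 upward). This results in the shape [32, (12+32), 128], increasing the number of elements in the second dimension.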

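When you specify the axis as -1 (the default), the Concatenate layer joins the inputs along the last dimension and requires all other dimensions to match. In your case that fails because the second dimension differs (12 versus 32).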
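The axis behaviour described above can be sketched with zero tensors of the shapes from the error message. This is an illustration only; the variable names are made up for the example. Note that concatenating along axis=1 yields 44 timesteps, which would no longer line up with the 12-step decoder target shown in the question, so this may remove the error at the Concatenate layer without matching the rest of the model.

```python
import tensorflow as tf
from tensorflow.keras.layers import Concatenate

decoder_like = tf.zeros((32, 12, 128))    # shape of decoder_outputs in the error
attention_like = tf.zeros((32, 32, 128))  # shape of attention_out in the error

# axis=1 joins along the time dimension: 12 + 32 = 44 steps.
along_time = Concatenate(axis=1)([decoder_like, attention_like])
print(along_time.shape)  # (32, 44, 128)

# axis=-1 joins along the feature dimension and needs matching time steps.
same_steps = tf.zeros((32, 12, 128))
along_features = Concatenate(axis=-1)([decoder_like, same_steps])
print(along_features.shape)  # (32, 12, 256)
```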
