How to add an attention layer to a TensorFlow GRU model?

Question

I created a language translation model using the TensorFlow Functional API.

Here is the model:

import tensorflow as tf

# vocab_train, vocab_label, embedding_dim and units are defined elsewhere

# encoder
encoder = tf.keras.Input(shape=(200, ))
enc_embd = tf.keras.layers.Embedding(vocab_train, embedding_dim)(encoder)
encoder_gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
output_e, hidden_e = encoder_gru(enc_embd)

# decoder
decoder = tf.keras.Input(shape=(200, ))
dec_embd = tf.keras.layers.Embedding(vocab_label, embedding_dim)(decoder)
decoder_gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
output_d, hidden_d = decoder_gru(dec_embd, initial_state = hidden_e)
final_output = tf.keras.layers.Dense(vocab_label, activation='softmax')
output_f = final_output(output_d)

I would like to know how to add a fully connected tf.keras.layers.Attention layer between the encoder and the decoder.

Answer 1

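You can put an Attention layer between output_e and output_d. Below is a complete example: we create an autoencoder by building separate models for the encoder and the decoder and then merging them.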
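To make clear what tf.keras.layers.Attention computes between the two sequences, here is a minimal NumPy re-implementation of its default scoring (unscaled dot-product, Luong-style attention). It is an illustrative sketch that assumes no masking, no dropout and the default key = value; the helper name dot_product_attention is hypothetical and not part of Keras. It shows that the layer, called as Attention()([query, value]), returns one context vector per query step, so the output takes its time dimension from the query.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dot_product_attention(query, value):
    """Hypothetical NumPy sketch of Keras Attention()([query, value]) with defaults.

    query: (batch, Tq, dim), value: (batch, Tv, dim); the key defaults to value.
    Returns (context, weights) with shapes (batch, Tq, dim) and (batch, Tq, Tv).
    """
    scores = query @ value.transpose(0, 2, 1)  # (batch, Tq, Tv) dot-product scores
    weights = softmax(scores, axis=-1)          # each query step sums to 1 over Tv
    context = weights @ value                   # (batch, Tq, dim) weighted sum of values
    return context, weights


rng = np.random.default_rng(0)
decoder_states = rng.normal(size=(2, 5, 4))  # plays the role of output_d (query)
encoder_states = rng.normal(size=(2, 7, 4))  # plays the role of output_e (value)
context, weights = dot_product_attention(decoder_states, encoder_states)
print(context.shape, weights.shape)  # (2, 5, 4) (2, 5, 7)
```

Because the query and value lengths deliberately differ here (5 vs 7), the shapes make it easy to see which argument controls the output length.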

Define the parameters and dummy data:

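import numpy as np
import tensorflow as tf
from tensorflow.keras import Model

vocab_train = 111
vocab_label = 123
embedding_dim = 64
units = 32
n_sample = 10
seq_length = 200

# Dummy integer token sequences for the encoder and decoder inputs
X_enc = np.random.randint(0, vocab_train, (n_sample, seq_length))
X_dec = np.random.randint(0, vocab_label, (n_sample, seq_length))
# One-hot targets of shape (n_sample, seq_length, vocab_label), as categorical_crossentropy expects
y = tf.keras.utils.to_categorical(np.random.randint(0, vocab_label, (n_sample, seq_length)), vocab_label)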
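The loss categorical_crossentropy expects every target time step to be a probability distribution over the vocabulary, typically a one-hot vector. As a NumPy-only alternative to tf.keras.utils.to_categorical, the sketch below builds such targets by indexing rows of an identity matrix; the names token_ids and y_onehot are illustrative, not from the original answer.

```python
import numpy as np

vocab_label = 123
n_sample = 10
seq_length = 200

rng = np.random.default_rng(0)
# Integer token ids for every target position: shape (n_sample, seq_length).
token_ids = rng.integers(0, vocab_label, size=(n_sample, seq_length))

# Indexing rows of an identity matrix turns each id into a one-hot vector,
# giving the (n_sample, seq_length, vocab_label) layout the softmax Dense layer outputs.
y_onehot = np.eye(vocab_label, dtype="float32")[token_ids]

print(y_onehot.shape)  # (10, 200, 123)
```

Alternatively, keeping the integer ids as targets and compiling with the loss 'sparse_categorical_crossentropy' avoids materialising the one-hot tensor at all.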

Define the encoder (it must also return hidden_e, because the decoder uses it):

encoder = tf.keras.Input(shape=(seq_length, ))
enc_embd = tf.keras.layers.Embedding(vocab_train, embedding_dim)(encoder)
encoder_gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
output_e, hidden_e = encoder_gru(enc_embd)

enc = Model(encoder, [hidden_e, output_e])
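The encoder model exposes both hidden_e (used as the decoder's initial state) and output_e (used by the attention layer). With return_sequences=True and return_state=True, a GRU's returned state is the same vector as the last time step of its output sequence, because a GRU's output at each step is its hidden state. The sketch below checks this with a simplified, hypothetical NumPy GRU (gru_forward, random weights, no biases); it is not Keras's exact cell, which also has biases and a reset_after option.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_forward(x, params):
    """Simplified GRU over x of shape (batch, T, in_dim); returns (outputs, final_state)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    batch, steps, _ = x.shape
    units = Uz.shape[0]
    h = np.zeros((batch, units))
    outputs = []
    for t in range(steps):
        xt = x[:, t, :]
        z = sigmoid(xt @ Wz + h @ Uz)              # update gate
        r = sigmoid(xt @ Wr + h @ Ur)              # reset gate
        h_tilde = np.tanh(xt @ Wh + (r * h) @ Uh)  # candidate state
        h = z * h + (1.0 - z) * h_tilde            # new hidden state
        outputs.append(h)                          # the output at step t is the state
    return np.stack(outputs, axis=1), h


rng = np.random.default_rng(0)
in_dim, units = 8, 4
params = []
for _ in range(3):  # gates z, r and the candidate h, in that order
    params.append(rng.normal(scale=0.5, size=(in_dim, units)))  # input weights W
    params.append(rng.normal(scale=0.5, size=(units, units)))   # recurrent weights U

x = rng.normal(size=(3, 6, in_dim))  # stands in for enc_embd
output_e, hidden_e = gru_forward(x, params)
print(output_e.shape, hidden_e.shape)             # (3, 6, 4) (3, 4)
print(np.allclose(output_e[:, -1, :], hidden_e))  # True
```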

Define the decoder with Attention (it also receives output_e and hidden_e as inputs):

decoder = tf.keras.Input(shape=(seq_length, ))
hidden_e_input = tf.keras.Input(shape=(units, ))
output_e_input = tf.keras.Input(shape=(seq_length, units))
dec_embd = tf.keras.layers.Embedding(vocab_label, embedding_dim)(decoder)
decoder_gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
output_d, hidden_d = decoder_gru(dec_embd, initial_state = hidden_e_input)
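# Attention takes [query, value]: the decoder outputs are the query and the encoder outputs the value,
# so the context has one vector per decoder step and can be concatenated with output_d
att = tf.keras.layers.Attention()([output_d, output_e_input])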
concat = tf.keras.layers.Concatenate()([att, output_d])
final_output = tf.keras.layers.Dense(vocab_label, activation='softmax')(concat)

dec = Model([decoder, hidden_e_input, output_e_input], final_output)
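The head of the decoder (attention, Concatenate, then Dense with softmax) can be traced in NumPy to check the shapes. This sketch assumes the decoder outputs are the query and the encoder outputs the value, as in Luong-style attention, and uses random weights W and b as a stand-in for the trained Dense layer; it is illustrative only. With that ordering the result has one row per decoder step, so the encoder and decoder sequences could even have different lengths.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(1)
batch, enc_len, dec_len, units, vocab_label = 2, 7, 5, 4, 11

output_e = rng.normal(size=(batch, enc_len, units))  # encoder GRU outputs (value)
output_d = rng.normal(size=(batch, dec_len, units))  # decoder GRU outputs (query)

# Dot-product attention: one weight distribution over encoder steps per decoder step.
weights = softmax(output_d @ output_e.transpose(0, 2, 1), axis=-1)  # (batch, dec_len, enc_len)
att = weights @ output_e                                             # (batch, dec_len, units)

# Concatenate the context with the decoder output along the last axis, like Concatenate().
concat = np.concatenate([att, output_d], axis=-1)  # (batch, dec_len, 2 * units)

# Random stand-in for Dense(vocab_label, activation='softmax').
W = rng.normal(size=(2 * units, vocab_label))
b = np.zeros(vocab_label)
probs = softmax(concat @ W + b, axis=-1)  # (batch, dec_len, vocab_label)

print(probs.shape)  # (2, 5, 11)
```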

Combine the encoder and the decoder:

inp_e = tf.keras.Input(shape=(seq_length, ))
h_e, o_e = enc(inp_e)
inp_d = tf.keras.Input(shape=(seq_length, ))
out = dec([inp_d, h_e, o_e])

ae = Model([inp_e, inp_d], out)
ae.compile('adam', 'categorical_crossentropy')
ae.fit([X_enc, X_dec], y, epochs=3)


Solution 1
1 2021-07-07 07:34:16
