How to build an attention model with Keras?

Question

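I am trying to understand attention models and build one myself. After many searches I came across this website, which has an attention model coded in Keras that also looks simple. But when I tried to build the same model on my machine, it raised a multiple-argument error. The error is caused by a mismatch in the arguments passed to the Attention class. The website's Attention class takes one argument in its constructor, but the code instantiates the Attention object with two arguments.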

import tensorflow as tf

max_len = 200
rnn_cell_size = 128
vocab_size=250

class Attention(tf.keras.Model):
    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)
    def call(self, features, hidden):
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))
        attention_weights = tf.nn.softmax(self.V(score), axis=1)
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights

sequence_input = tf.keras.layers.Input(shape=(max_len,), dtype='int32')

embedded_sequences = tf.keras.layers.Embedding(vocab_size, 128, input_length=max_len)(sequence_input)

lstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM
                                     (rnn_cell_size,
                                      dropout=0.3,
                                      return_sequences=True,
                                      return_state=True,
                                      recurrent_activation='relu',
                                      recurrent_initializer='glorot_uniform'), name="bi_lstm_0")(embedded_sequences)

lstm, forward_h, forward_c, backward_h, backward_c = tf.keras.layers.Bidirectional \
    (tf.keras.layers.LSTM
     (rnn_cell_size,
      dropout=0.2,
      return_sequences=True,
      return_state=True,
      recurrent_activation='relu',
      recurrent_initializer='glorot_uniform'))(lstm)

state_h = tf.keras.layers.Concatenate()([forward_h, backward_h])
state_c = tf.keras.layers.Concatenate()([forward_c, backward_c])

#  PROBLEM IN THIS LINE
context_vector, attention_weights = Attention(lstm, state_h)

output = tf.keras.layers.Dense(1, activation='sigmoid')(context_vector)

model = tf.keras.Model(inputs=sequence_input, outputs=output)

# summarize layers
print(model.summary())

How can I make this model work?

Answer 1

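There is a problem with the way you initialize the attention layer and pass its arguments. You should specify the number of attention layer units at this point and change the way the arguments are passed in: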

context_vector, attention_weights = Attention(32)(lstm, state_h)

Result:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 200)          0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 200, 128)     32000       input_1[0][0]                    
__________________________________________________________________________________________________
bi_lstm_0 (Bidirectional)       [(None, 200, 256), ( 263168      embedding[0][0]                  
__________________________________________________________________________________________________
bidirectional (Bidirectional)   [(None, 200, 256), ( 394240      bi_lstm_0[0][0]                  
                                                                 bi_lstm_0[0][1]                  
                                                                 bi_lstm_0[0][2]                  
                                                                 bi_lstm_0[0][3]                  
                                                                 bi_lstm_0[0][4]                  
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 256)          0           bidirectional[0][1]              
                                                                 bidirectional[0][3]              
__________________________________________________________________________________________________
attention (Attention)           [(None, 256), (None, 16481       bidirectional[0][0]              
                                                                 concatenate[0][0]                
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 1)            257         attention[0][0]                  
==================================================================================================
Total params: 706,146
Trainable params: 706,146
Non-trainable params: 0
__________________________________________________________________________________________________
None
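
The fix works because a Keras layer or model is used in two steps: the constructor takes configuration (here units), and the resulting object is then called on the tensors. Here is a minimal plain-Python sketch of that pattern. ToyAttention is a hypothetical stand-in, not the Keras class; instead of tensors it receives the last-axis sizes (256 and 256, taken from the summary above) and only counts the parameters that the three Dense layers would hold.

```python
class ToyAttention:
    """Hypothetical stand-in for the Attention class in the question (no TensorFlow)."""

    def __init__(self, units):
        # Step 1: configuration only, exactly one argument like the original class.
        self.units = units

    def __call__(self, features_dim, hidden_dim):
        # Step 2: called on the inputs; here we pass last-axis sizes, not tensors.
        w1 = features_dim * self.units + self.units  # Dense(units) on the LSTM outputs
        w2 = hidden_dim * self.units + self.units    # Dense(units) on state_h
        v = self.units * 1 + 1                       # Dense(1) on the tanh score
        return w1 + w2 + v


def construct(*args):
    """Return the built object, or the name of the exception that was raised."""
    try:
        return ToyAttention(*args)
    except TypeError as exc:
        return type(exc).__name__


wrong = construct(256, 256)               # mirrors Attention(lstm, state_h)
param_count = ToyAttention(32)(256, 256)  # mirrors Attention(32)(lstm, state_h)
print(wrong, param_count)
```

The count this sketch produces for units=32 agrees with the 16481 shown in the attention (Attention) row of the summary.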

Answer 2

Attention layers are now part of the Keras API in TensorFlow (2.1). But they output a tensor of the same size as your "query" tensor.

Here is how to use Luong-style attention:

query_attention = tf.keras.layers.Attention()([query, value])

And Bahdanau-style attention:

query_attention = tf.keras.layers.AdditiveAttention()([query, value])

The adapted version:

attention_weights = tf.keras.layers.Attention()([lstm, state_h])

See the original documentation for more information:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention
https://www.tensorflow.org/api_docs/python/tf/keras/layers/AdditiveAttention
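
To see why the output has the same size as the "query" tensor, here is a rough plain-Python sketch of dot-product (Luong-style) attention for a single example. It illustrates the math only and is not the tf.keras.layers.Attention implementation; the function name and the toy numbers are made up for illustration, and the value list also serves as the key.

```python
import math


def dot_product_attention(query, value):
    """Dot-product attention for one example; `value` doubles as the key."""
    dim = len(value[0])
    output = []
    for q in query:
        # One score per value step: dot product of the query step with each value.
        scores = [sum(qi * vi for qi, vi in zip(q, v)) for v in value]
        # Numerically stable softmax over the value steps.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors gives one output row per query step.
        output.append([sum(w * v[j] for w, v in zip(weights, value))
                       for j in range(dim)])
    return output


query = [[1.0, 0.0], [0.0, 1.0]]              # 2 query steps, dimension 2
value = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 value steps, dimension 2
out = dot_product_attention(query, value)
print(len(out), len(out[0]))  # 2 2, the same shape as query
```

Each output row is a weighted average of the value rows, so there is one row per query step regardless of how many value steps there are.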

Answer 3

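To answer Arman's specific query: these libraries use the post-2018 semantics of queries, values and keys. To map these semantics back to Bahdanau's or Luong's paper, you can think of the "query" as the last decoder hidden state. The "values" are the set of encoder outputs, that is, all the hidden states of the encoder. The "query" "attends" to all the "values".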

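Whichever version of the code or library you are using, always note that the "query" is expanded over the time axis to prepare it for the addition that follows. The value being expanded will always be the last hidden state of the RNN. The other will always be the values to be attended to: all the hidden states on the encoder side. You can apply this simple check to any code, regardless of the library, to determine what "query" and "values" map to.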
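
As a rough illustration of that check, here is a plain-Python sketch of additive (Bahdanau-style) scoring for a single example. The function names, the toy weights and the omission of bias terms are all assumptions made for brevity; this is not code from any library. The single query vector is projected once and then added at every time step, which is what expanding it over the time axis achieves in the tensor version.

```python
import math


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def additive_attention(values, query, W1, W2, v):
    """Additive attention for one example, biases omitted for brevity."""
    # The query (last hidden state) is projected once and reused at every step.
    projected_query = matvec(W2, query)
    scores = []
    for h in values:  # values: all encoder hidden states, one per time step
        projected_value = matvec(W1, h)
        hidden = [math.tanh(a + b)
                  for a, b in zip(projected_value, projected_query)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted sum of the values.
    context = [sum(w * h[j] for w, h in zip(weights, values))
               for j in range(len(values[0]))]
    return context, weights


values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, dimension 2
query = [0.5, -0.5]                            # one vector, no time axis
W1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]      # toy weights, 3 units
W2 = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5]]
v = [1.0, -1.0, 0.5]
context, weights = additive_attention(values, query, W1, W2, v)
print(len(context), len(weights))  # 2 3
```

The argument without a time axis is the "query", and the argument with one weight per step is the "values".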

You can refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e to write your own custom attention layer in fewer than 6 lines of code.

3 answers

Answer 1
9 accepted 2019-07-09 11:35:52

Answer 2
9 2020-02-29 20:32:33

Answer 3
3 2020-11-19 08:32:38
