How to build an attention model with Keras?

I am trying to understand attention models and also to build one myself. After many searches I came across this website, which has an attention model coded in Keras and also looks simple. But when I tried to build that same model on my machine, it gave a multiple-argument error. The error was due to a mismatched argument being passed to the class Attention: in the website's attention class it asks for one argument, but the attention object is initiated with two arguments.

import tensorflow as tf

max_len = 200
rnn_cell_size = 128
vocab_size=250

class Attention(tf.keras.Model):
    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)
    def call(self, features, hidden):
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))
        attention_weights = tf.nn.softmax(self.V(score), axis=1)
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights

sequence_input = tf.keras.layers.Input(shape=(max_len,), dtype='int32')

embedded_sequences = tf.keras.layers.Embedding(vocab_size, 128, input_length=max_len)(sequence_input)

lstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(rnn_cell_size,
                         dropout=0.3,
                         return_sequences=True,
                         return_state=True,
                         recurrent_activation='relu',
                         recurrent_initializer='glorot_uniform'),
    name="bi_lstm_0")(embedded_sequences)

lstm, forward_h, forward_c, backward_h, backward_c = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(rnn_cell_size,
                         dropout=0.2,
                         return_sequences=True,
                         return_state=True,
                         recurrent_activation='relu',
                         recurrent_initializer='glorot_uniform'))(lstm)

state_h = tf.keras.layers.Concatenate()([forward_h, backward_h])
state_c = tf.keras.layers.Concatenate()([forward_c, backward_c])

#  PROBLEM IN THIS LINE
context_vector, attention_weights = Attention(lstm, state_h)

output = tf.keras.layers.Dense(1, activation='sigmoid')(context_vector)

model = tf.keras.Model(inputs=sequence_input, outputs=output)

# summarize layers
print(model.summary())

How can I make this model work?

There is a problem with the way you initialize the attention layer and pass parameters. You should specify the number of attention layer units here and change the way the parameters are passed in:

context_vector, attention_weights = Attention(32)(lstm, state_h)
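
The reason is that Attention(lstm, state_h) passes the two tensors to __init__, which only accepts units, hence the multiple-argument error. With Attention(32)(lstm, state_h), the constructor builds the three Dense sublayers with 32 units, and the tensors are routed to call(features, hidden) when the layer instance is invoked. A minimal sketch of the corrected tail of the model (assuming the rest of the question's code is unchanged and tf.keras is used throughout):

attention_layer = Attention(32)                                      # units=32 goes to __init__
context_vector, attention_weights = attention_layer(lstm, state_h)   # tensors go to call()

output = tf.keras.layers.Dense(1, activation='sigmoid')(context_vector)
model = tf.keras.Model(inputs=sequence_input, outputs=output)
model.summary()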

The result:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 200)          0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 200, 128)     32000       input_1[0][0]                    
__________________________________________________________________________________________________
bi_lstm_0 (Bidirectional)       [(None, 200, 256), ( 263168      embedding[0][0]                  
__________________________________________________________________________________________________
bidirectional (Bidirectional)   [(None, 200, 256), ( 394240      bi_lstm_0[0][0]                  
                                                                 bi_lstm_0[0][1]                  
                                                                 bi_lstm_0[0][2]                  
                                                                 bi_lstm_0[0][3]                  
                                                                 bi_lstm_0[0][4]                  
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 256)          0           bidirectional[0][1]              
                                                                 bidirectional[0][3]              
__________________________________________________________________________________________________
attention (Attention)           [(None, 256), (None, 16481       bidirectional[0][0]              
                                                                 concatenate[0][0]                
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 1)            257         attention[0][0]                  
==================================================================================================
Total params: 706,146
Trainable params: 706,146
Non-trainable params: 0
__________________________________________________________________________________________________
None

Attention layers are part of the Keras API of TensorFlow (2.1) now. But they output a tensor of the same size as your "query" tensor.

This is how to use Luong-style attention:

query_attention = tf.keras.layers.Attention()([query, value])

And Bahdanau-style attention:

query_attention = tf.keras.layers.AdditiveAttention()([query, value])

The adapted version:

attention_weights = tf.keras.layers.Attention()([lstm, state_h])
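
Note that tf.keras.layers.Attention expects both the query and the value to be 3-D tensors of shape [batch, time, dim], so the pooled state_h may first need an explicit time step. Below is a sketch of one way to wire the built-in layer into the question's model, under that assumption (the Reshape/Flatten plumbing and the sigmoid head are illustrative choices, not part of the original answer):

query = tf.keras.layers.Reshape((1, -1))(state_h)       # [batch, 256] -> [batch, 1, 256]: the 'query'
context = tf.keras.layers.Attention()([query, lstm])    # Luong-style; lstm holds all encoder states ('values')
# tf.keras.layers.AdditiveAttention() can be swapped in for Bahdanau-style scoring
context = tf.keras.layers.Flatten()(context)             # [batch, 1, 256] -> [batch, 256]
output = tf.keras.layers.Dense(1, activation='sigmoid')(context)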

Check out the original documentation for more information: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention and https://www.tensorflow.org/api_docs/python/tf/keras/layers/AdditiveAttention

To answer Arman's specific query: these libraries use post-2018 semantics of queries, values and keys. To map the semantics back to Bahdanau's or Luong's paper, you can consider the 'query' to be the last decoder hidden state. The 'values' will be the set of encoder outputs, that is, all the hidden states of the encoder. The 'query' 'attends' to all the 'values'.

Whichever version of code or library you are using, always note that the 'query' is the tensor that gets expanded over the time axis to prepare it for the addition that follows. This value (the one being expanded) will always be the last hidden state of the RNN. The other input will always be the values that need to be attended to: all the hidden states at the encoder end. This simple check of the code can be done to determine what 'query' and 'values' map to, irrespective of the library or code you are using.
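
As a quick, standalone shape check of that rule (the batch size of 4 below is just illustrative; the 200 time steps and 256-wide bidirectional state match the question's settings):

import tensorflow as tf

values = tf.zeros([4, 200, 256])                 # all encoder hidden states ('values')
query = tf.zeros([4, 256])                       # last hidden state ('query')
query_with_time_axis = tf.expand_dims(query, 1)  # expanded over the time axis -> [4, 1, 256]
score_input = tf.keras.layers.Dense(10)(values) + tf.keras.layers.Dense(10)(query_with_time_axis)
print(score_input.shape)                         # (4, 200, 10): the query broadcasts against every time step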

You can refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e to write your own custom attention layer in fewer than six lines of code.
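
For reference, a minimal Bahdanau-style layer along those lines could look like the sketch below (written for this answer, not copied from the linked article; the name SimpleAttention and the single-output design are choices made here). Returning only the context vector lets it drop straight into a functional model:

class SimpleAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, values, query):
        # score each time step of 'values' against the time-axis-expanded 'query'
        score = tf.nn.tanh(self.W1(values) + self.W2(tf.expand_dims(query, 1)))
        weights = tf.nn.softmax(self.V(score), axis=1)   # [batch, time, 1]
        return tf.reduce_sum(weights * values, axis=1)   # context vector: [batch, dim]

# usage with the question's tensors:
# context_vector = SimpleAttention(32)(lstm, state_h)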
