
Keras: How to display attention weights in LSTM model

I built a text classification model using an LSTM with an attention layer. The model trains and performs well, but I can't display the attention weights, i.e. the importance/attention of each word in a review (the input text). The code used for this model is:

from keras import backend as K
from keras import initializers, regularizers, constraints
from keras.engine.topology import Layer  # imports needed by this snippet (Keras 2.x)


def dot_product(x, kernel):
    # backend-agnostic dot product between a 3D tensor and a 1D/2D kernel
    if K.backend() == 'tensorflow':
        return K.squeeze(K.dot(x, K.expand_dims(kernel)), axis=-1)
    else:
        return K.dot(x, kernel)

class AttentionWithContext(Layer):
    """
Attention operation, with a context/query vector, for temporal data.

"Hierarchical Attention Networks for Document Classification"
by using a context vector to assist the attention
# Input shape
    3D tensor with shape: (samples, steps, features).
# Output shape
    2D tensor with shape: (samples, features).
How to use:
Just put it on top of an RNN Layer (GRU/LSTM/SimpleRNN) with return_sequences=True.
The dimensions are inferred based on the output shape of the RNN.
Note: The layer has been tested with Keras 2.0.6
Example:
    model.add(LSTM(64, return_sequences=True))
    model.add(AttentionWithContext())
    # next add a Dense layer (for classification/regression) or whatever
     """

    def __init__(self,
                 W_regularizer=None, u_regularizer=None, b_regularizer=None,
                 W_constraint=None, u_constraint=None, b_constraint=None,
                 bias=True, **kwargs):

        self.supports_masking = True
        self.init = initializers.get('glorot_uniform')

        self.W_regularizer = regularizers.get(W_regularizer)
        self.u_regularizer = regularizers.get(u_regularizer)
        self.b_regularizer = regularizers.get(b_regularizer)

        self.W_constraint = constraints.get(W_constraint)
        self.u_constraint = constraints.get(u_constraint)
        self.b_constraint = constraints.get(b_constraint)

        self.bias = bias
        super(AttentionWithContext, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3

        self.W = self.add_weight((input_shape[-1], input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer,
                                 constraint=self.W_constraint)
        if self.bias:
            self.b = self.add_weight((input_shape[-1],),
                                     initializer='zero',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)

        self.u = self.add_weight((input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_u'.format(self.name),
                                 regularizer=self.u_regularizer,
                                 constraint=self.u_constraint)

        super(AttentionWithContext, self).build(input_shape)

    def compute_mask(self, input, input_mask=None):
        # do not pass the mask to the next layers
        return None

    def call(self, x, mask=None):
        uit = dot_product(x, self.W)

        if self.bias:
            uit += self.b

        uit = K.tanh(uit)
        ait = dot_product(uit, self.u)

        a = K.exp(ait)

        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            a *= K.cast(mask, K.floatx())

        # in some cases, especially in the early stages of training, the sum may be almost zero
        # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
        # a /= K.cast(K.sum(a, axis=1, keepdims=True), K.floatx())
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())

        a = K.expand_dims(a)
        weighted_input = x * a
        return K.sum(weighted_input, axis=1)

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[-1]


EMBEDDING_DIM = 100
max_seq_len = 118
batch_size = 256
num_epochs = 50

from keras.models import Model, Sequential
from keras.layers import Dense, Embedding, Input, Activation, TimeDistributed
from keras.layers import LSTM, Bidirectional, Dropout


def BidLstm():
    #inp = Input(shape=(118,100))
    #x = Embedding(max_features, embed_size, weights=[embedding_matrix],
              #trainable=False)(inp)
    model1 = Sequential()
    model1.add(Dense(512, input_shape=(118, 100)))
    model1.add(Activation('relu'))
    #model1.add(Flatten())
    #model1.add(BatchNormalization(input_shape=(100,)))
    model1.add(Bidirectional(LSTM(100, activation="relu", return_sequences=True)))
    model1.add(Dropout(0.1))
    model1.add(TimeDistributed(Dense(200)))
    model1.add(AttentionWithContext())
    model1.add(Dropout(0.25))
    model1.add(Dense(4, activation="softmax"))
    model1.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                   metrics=['accuracy'])
    model1.summary()
    return model1
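
For reference, a minimal smoke test of this architecture might look like the following sketch; the random arrays are placeholders for real embedded reviews of shape (samples, 118, 100) and integer class labels in {0, 1, 2, 3}, and only serve to confirm that the shapes and the sparse_categorical_crossentropy loss line up.

import numpy as np

# placeholder data: 32 "reviews", 118 time steps, 100-dimensional embeddings
x_dummy = np.random.random((32, 118, 100)).astype('float32')
y_dummy = np.random.randint(0, 4, size=(32, 1))   # integer labels for 4 classes

model1 = BidLstm()
model1.fit(x_dummy, y_dummy, batch_size=batch_size, epochs=1)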

Please see the github repository here: https://github.com/FlorisHoogenboom/keras-han-for-docla

First, define the weight computation in the attention layer explicitly; second, extract the previous layer's output and the attention layer's weights; then multiply them to obtain the attention weight of each word.
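
A minimal sketch of that idea, assuming a trained model (here called model) whose AttentionWithContext layer sits at index 3 and a batch of padded inputs x_sample; the layer index and variable names are placeholders you would adapt to your own network:

import numpy as np
from keras.models import Model

att_idx = 3                                        # assumed position of AttentionWithContext
W, b, u = model.layers[att_idx].get_weights()      # (W, b, u) when bias=True

# output of the layer feeding the attention layer, shape (batch, steps, features)
sub_model = Model(inputs=model.input, outputs=model.layers[att_idx - 1].output)
h = sub_model.predict(x_sample)

uit = np.tanh(np.dot(h, W) + b)    # same computation as AttentionWithContext.call
ait = np.dot(uit, u)
a = np.exp(ait)
a /= a.sum(axis=1, keepdims=True)  # a[i, t] = attention weight of word t in sample i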

After reading the above comprehensive answers, I finally understand how to extract the weights of the attention layer. Overall, the ideas of @Li Xiang and @Okorimi Manoury are both correct. The code segment from @Okorimi Manoury comes from the following link: Textual attention visualization.

Now, let me explain the procedure step by step:

(1). You should have a well-trained model; you need to load the model and extract the attention layer's weights. To find the right layer, you can use model.summary() to check the model architecture. Then, you can use:

layer_weights = model.layers[3].get_weights() #suppose your attention layer is the third layer

layer_weights is a list. For example, for the word-level attention of HAN attention, the layer_weights list has three elements: W, b, and u. In other words, layer_weights[0] = W, layer_weights[1] = b, and layer_weights[2] = u.
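
Since the later computation relies on the order and shapes of W, b, and u, a quick sanity check is to print them after loading the model (the layer index 3 is only an example; pick the index of your attention layer from model.summary()):

layer_weights = model.layers[3].get_weights()
for name, w in zip(['W', 'b', 'u'], layer_weights):
    print(name, w.shape)
# for the AttentionWithContext layer above you would expect roughly
# W: (features, features), b: (features,), u: (features,)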

(2). You also need to get the layer output before the attention layer. In this example, we need to get the second layer's output. You can use the following code:

new_model = Model(inputs=model.input, outputs=model.layers[2].output)
output_before_att = new_model.predict(x_test_sample) #extract layer output

(3). Check the details: suppose your input is a text segment with 100 words and 300 dimensions (the input is [100, 300]), and after the second layer the dimension is 128. Then the shape of output_before_att is [100, 128]. Correspondingly, layer_weights[0] (W) is [128, 128], layer_weights[1] (b) is [1, 128], and layer_weights[2] (u) is [1, 128]. Then, we need the following code:

import numpy as np

eij = np.tanh(np.dot(output_before_att, layer_weights[0]) + layer_weights[1]) #Eq.(5) in the paper

eij = np.dot(eij, layer_weights[2]) #Eq.(6)

eij = eij.reshape((eij.shape[0], eij.shape[1])) # reshape the vector

ai = np.exp(eij) #Eq.(6)

weights = ai / np.sum(ai) # Eq.(6)

weights is a 100-dimensional array: each element is the attention weight (importance) of one of the 100 input words. After that, you can visualize the attention weights.
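
One possible way to visualize them is to pair each weight with its word and plot a bar chart; this is only a sketch, where tokens is assumed to be the list of words for the time steps of one sample, padded at the end:

import numpy as np
import matplotlib.pyplot as plt

tokens = ["this", "movie", "was", "surprisingly", "good"]   # illustrative only
word_weights = np.ravel(weights)[:len(tokens)]              # drop the padding positions

plt.bar(range(len(tokens)), word_weights)
plt.xticks(range(len(tokens)), tokens, rotation=45)
plt.ylabel("attention weight")
plt.tight_layout()
plt.show()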

Hope my explanation can help you.

You can use the get_weights() method of your custom layer to get a list of all weights. You can find more info here.

You need to make these modifications to your code during model creation:

model1.add(TimeDistributed(Dense(200)))
atn = AttentionWithContext()
model1.add(atn)

and then, after training, just use:

atn.get_weights()[index]

to extract the weight matrix W as a NumPy array (I think index should be set to 0, but you have to try this out on your own). Then you can use pyplot's imshow/matshow methods to display the matrix.
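
A minimal example of displaying that matrix, assuming atn is the AttentionWithContext instance kept from model creation as shown above:

import matplotlib.pyplot as plt

W = atn.get_weights()[0]      # assumed to be the projection matrix W
plt.matshow(W, cmap='viridis')
plt.colorbar()
plt.show()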

Thank you for your edit. Your solution returns the weights of the attention layer, but I'm looking for the per-word weights.

I found another solution for this problem:

1. Define a function to compute the attention weights:

import numpy as np
from keras import backend as K

def cal_att_weights(output, att_w):
    #if model_name == 'HAN':
    eij = np.tanh(np.dot(output[0], att_w[0]) + att_w[1])
    eij = np.dot(eij, att_w[2])
    eij = eij.reshape((eij.shape[0], eij.shape[1]))
    ai = np.exp(eij)
    weights = ai / np.sum(ai)
    return weights

sent_before_att = K.function([model1.layers[0].input, K.learning_phase()],
                             [model1.layers[2].output])
sent_att_w = model1.layers[5].get_weights()
test_seq = np.array(userinp)
test_seq = np.array(test_seq).reshape(1, 118, 100)
out = sent_before_att([test_seq, 0])
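
The snippet above stops right after computing out; under the same assumptions (118 padded time steps, the extracted layer output in out and the attention-layer weights in sent_att_w), the remaining step would be roughly:

att_weights = cal_att_weights(out, sent_att_w)   # one weight per time step
att_weights = np.squeeze(att_weights)            # shape (118,)

# inspect the weight assigned to each position of the padded sequence
for position, weight in enumerate(att_weights):
    print(position, weight)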
