
Keras: How to display attention weights in LSTM model

I made a text classification model using an LSTM with an attention layer. The model trains and works well, but I can't display the attention weights, i.e. the importance/attention of each word in a review (the input text). The code used for this model is:

from keras import backend as K
from keras import initializers, regularizers, constraints
from keras.engine.topology import Layer


def dot_product(x, kernel):
    if K.backend() == 'tensorflow':
        return K.squeeze(K.dot(x, K.expand_dims(kernel)), axis=-1)
    else:
        return K.dot(x, kernel)

class AttentionWithContext(Layer):
    """
    Attention operation, with a context/query vector, for temporal data.

    Follows "Hierarchical Attention Networks for Document Classification"
    by using a context vector to assist the attention.

    # Input shape
        3D tensor with shape: (samples, steps, features).
    # Output shape
        2D tensor with shape: (samples, features).

    How to use:
    Just put it on top of an RNN layer (GRU/LSTM/SimpleRNN) with return_sequences=True.
    The dimensions are inferred based on the output shape of the RNN.
    Note: The layer has been tested with Keras 2.0.6.

    Example:
        model.add(LSTM(64, return_sequences=True))
        model.add(AttentionWithContext())
        # next add a Dense layer (for classification/regression) or whatever
    """

    def __init__(self,
                 W_regularizer=None, u_regularizer=None, b_regularizer=None,
                 W_constraint=None, u_constraint=None, b_constraint=None,
                 bias=True, **kwargs):

        self.supports_masking = True
        self.init = initializers.get('glorot_uniform')

        self.W_regularizer = regularizers.get(W_regularizer)
        self.u_regularizer = regularizers.get(u_regularizer)
        self.b_regularizer = regularizers.get(b_regularizer)

        self.W_constraint = constraints.get(W_constraint)
        self.u_constraint = constraints.get(u_constraint)
        self.b_constraint = constraints.get(b_constraint)

        self.bias = bias
        super(AttentionWithContext, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3

        self.W = self.add_weight((input_shape[-1], input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer,
                                 constraint=self.W_constraint)
        if self.bias:
            self.b = self.add_weight((input_shape[-1],),
                                     initializer='zero',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)

        self.u = self.add_weight((input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_u'.format(self.name),
                                 regularizer=self.u_regularizer,
                                 constraint=self.u_constraint)

        super(AttentionWithContext, self).build(input_shape)

    def compute_mask(self, input, input_mask=None):
        # do not pass the mask to the next layers
        return None

    def call(self, x, mask=None):
        uit = dot_product(x, self.W)

        if self.bias:
            uit += self.b

        uit = K.tanh(uit)
        ait = dot_product(uit, self.u)

        a = K.exp(ait)

        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            a *= K.cast(mask, K.floatx())

        # in some cases especially in the early stages of training the sum may be almost zero
        # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
        # a /= K.cast(K.sum(a, axis=1, keepdims=True), K.floatx())
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())

        a = K.expand_dims(a)
        weighted_input = x * a
        return K.sum(weighted_input, axis=1)

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[-1]


EMBEDDING_DIM = 100
max_seq_len = 118
batch_size = 256
num_epochs = 50

from keras.models import Model, Sequential
from keras.layers import Dense, Embedding, Input, Activation, TimeDistributed
from keras.layers import LSTM, Bidirectional, Dropout


def BidLstm():
    #inp = Input(shape=(118, 100))
    #x = Embedding(max_features, embed_size, weights=[embedding_matrix],
    #              trainable=False)(inp)
    model1 = Sequential()
    model1.add(Dense(512, input_shape=(118, 100)))
    model1.add(Activation('relu'))
    #model1.add(Flatten())
    #model1.add(BatchNormalization(input_shape=(100,)))
    model1.add(Bidirectional(LSTM(100, activation="relu", return_sequences=True)))
    model1.add(Dropout(0.1))
    model1.add(TimeDistributed(Dense(200)))
    model1.add(AttentionWithContext())
    model1.add(Dropout(0.25))
    model1.add(Dense(4, activation="softmax"))
    model1.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                   metrics=['accuracy'])
    model1.summary()
    return model1

Please see the github repository here: https://github.com/FlorisHoogenboom/keras-han-for-docla

First, define the weights computation in the attention layer explicitly; second, extract the previous layer's output and the attention layer's weights; then multiply them to obtain the per-word attention weights.

After reading the above comprehensive answers, I finally understand how to extract the weights of attention layers. Overall, the ideas of @Li Xiang and @Okorimi Manoury are both correct. The code segment of @Okorimi Manoury is from the following link: Textual attention visualization.

Now, let me explain the procedure step by step:

(1). You should have a well-trained model; load the model and extract the attention layer's weights. To locate the right layer, you can use model.summary() to check the model architecture. Then, you can use:

layer_weights = model.layers[3].get_weights()  # suppose your attention layer is at index 3 in model.layers

layer_weights is a list. For example, for the word-level attention of HAN attention, layer_weights has three elements: W, b, and u. In other words, layer_weights[0] = W, layer_weights[1] = b, and layer_weights[2] = u.
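As a quick sanity check (a small sketch, not part of the original answer; it assumes layer_weights was extracted as above), you can unpack the list and print the shapes:

W, b, u = layer_weights  # W: weight matrix, b: bias, u: context vector
print(W.shape, b.shape, u.shape)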

(2). You also need to get the output of the layer before the attention layer. In this example, we need the second layer's output. You can use the following code to do this:

new_model = Model(inputs=model.input, outputs=model.layers[2].output)
output_before_att = new_model.predict(x_test_sample)  # extract the layer output

(3). Check the details: suppose your input is a text segment with 100 words of dimension 300 (the input is [100, 300]), and after the second layer the dimension is 128. Then the shape of output_before_att is [100, 128]. Correspondingly, layer_weights[0] (W) is [128, 128], layer_weights[1] (b) is [1, 128], and layer_weights[2] (u) is [1, 128]. Then, we need the following code:

eij = np.tanh(np.dot(output_before_att, layer_weights[0]) + layer_weights[1]) #Eq.(5) in the paper

eij = np.dot(eij, layer_weights[2]) #Eq.(6)

eij = eij.reshape((eij.shape[0], eij.shape[1])) # reshape the vector

ai = np.exp(eij) #Eq.(6)

weights = ai / np.sum(ai) # Eq.(6)
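For reference, these lines reproduce the word-attention equations of the HAN paper (my reading of Eq. (5)-(6)):

$$u_{it} = \tanh(W_w h_{it} + b_w), \qquad \alpha_{it} = \frac{\exp(u_{it}^\top u_w)}{\sum_t \exp(u_{it}^\top u_w)}$$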

The result weights is a 100-dimensional list; each element is the attention weight (importance) of one of the 100 input words. After that, you can visualize the attention weights, for example as sketched below.
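A minimal bar-plot sketch (matplotlib is not part of the original answer; it assumes words is a list of the 100 input tokens, in the same order as the weights computed above):

import matplotlib.pyplot as plt
import numpy as np

w = np.asarray(weights).flatten()   # one attention weight per input word
plt.figure(figsize=(12, 2))
plt.bar(range(len(w)), w)
plt.xticks(range(len(words)), words, rotation=90)  # words: assumed list of input tokens
plt.ylabel("attention weight")
plt.tight_layout()
plt.show()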

Hope my explanation can help you.

You can use the get_weights() method of your custom layer to get a list of all weights. You can find more info here.

You need to make these modifications to your code during the model creation:

model1.add(TimeDistributed(Dense(200)))
atn = AttentionWithContext()
model1.add(atn)

and then, after training, just use:

atn.get_weights()[index]

to extract the weight matrix W as a NumPy array (I think index should be set to 0, but you have to try this out on your own). Then you can use pyplot's imshow / matshow method to display the matrix.
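A minimal sketch of that idea (index 0 is the assumption mentioned above):

import matplotlib.pyplot as plt

W = atn.get_weights()[0]   # assumed: index 0 holds the W matrix of the attention layer
plt.matshow(W)
plt.colorbar()
plt.show()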

Thank you for your edit. Your solution returns the weights of the attention layer, but I'm looking for the word-level weights.

I found another solution for this problem:

1. Define a function to compute the attention weights:

import numpy as np
from keras import backend as K

def cal_att_weights(output, att_w):
    # if model_name == 'HAN':
    eij = np.tanh(np.dot(output[0], att_w[0]) + att_w[1])
    eij = np.dot(eij, att_w[2])
    eij = eij.reshape((eij.shape[0], eij.shape[1]))
    ai = np.exp(eij)
    weights = ai / np.sum(ai)
    return weights

# function that returns the output of model1.layers[2] for a given input
sent_before_att = K.function([model1.layers[0].input, K.learning_phase()],
                             [model1.layers[2].output])
sent_att_w = model1.layers[5].get_weights()  # attention layer weights: W, b, u
test_seq = np.array(userinp)
test_seq = np.array(test_seq).reshape(1, 118, 100)
out = sent_before_att([test_seq, 0])  # 0 = test/inference phase
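The snippet stops after computing out; a sketch of the presumably intended final step (not part of the original answer) is to feed it, together with the extracted layer weights, into the function defined above:

att_weights = cal_att_weights(out, sent_att_w)  # shape (1, 118): one weight per input time step
print(att_weights)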
