
Keras: How to display attention weights in LSTM model

I built a text classification model using an LSTM with an attention layer. The model trains and performs well, but I cannot display the attention weight (the importance) of each word in a review (the input text). The code used for the model is:

from keras import backend as K
from keras import initializers, regularizers, constraints
from keras.engine.topology import Layer  # Layer import path for Keras 2.0.x

def dot_product(x, kernel):
    # Backend-agnostic dot product (TensorFlow needs the kernel expanded, then the result squeezed).
    if K.backend() == 'tensorflow':
        return K.squeeze(K.dot(x, K.expand_dims(kernel)), axis=-1)
    else:
        return K.dot(x, kernel)

class AttentionWithContext(Layer):
    """
    Attention operation, with a context/query vector, for temporal data.

    "Hierarchical Attention Networks for Document Classification",
    by using a context vector to assist the attention.

    # Input shape
        3D tensor with shape: (samples, steps, features).
    # Output shape
        2D tensor with shape: (samples, features).

    How to use:
    Just put it on top of an RNN layer (GRU/LSTM/SimpleRNN) with return_sequences=True.
    The dimensions are inferred based on the output shape of the RNN.
    Note: The layer has been tested with Keras 2.0.6.

    Example:
        model.add(LSTM(64, return_sequences=True))
        model.add(AttentionWithContext())
        # next add a Dense layer (for classification/regression) or whatever
    """

    def __init__(self,
                 W_regularizer=None, u_regularizer=None, b_regularizer=None,
                 W_constraint=None, u_constraint=None, b_constraint=None,
                 bias=True, **kwargs):

        self.supports_masking = True
        self.init = initializers.get('glorot_uniform')

        self.W_regularizer = regularizers.get(W_regularizer)
        self.u_regularizer = regularizers.get(u_regularizer)
        self.b_regularizer = regularizers.get(b_regularizer)

        self.W_constraint = constraints.get(W_constraint)
        self.u_constraint = constraints.get(u_constraint)
        self.b_constraint = constraints.get(b_constraint)

        self.bias = bias
        super(AttentionWithContext, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3

        self.W = self.add_weight((input_shape[-1], input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer,
                                 constraint=self.W_constraint)
        if self.bias:
            self.b = self.add_weight((input_shape[-1],),
                                     initializer='zero',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)

        self.u = self.add_weight((input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_u'.format(self.name),
                                 regularizer=self.u_regularizer,
                                 constraint=self.u_constraint)

        super(AttentionWithContext, self).build(input_shape)

    def compute_mask(self, input, input_mask=None):
        # do not pass the mask to the next layers
        return None

    def call(self, x, mask=None):
        uit = dot_product(x, self.W)

        if self.bias:
            uit += self.b

        uit = K.tanh(uit)
        ait = dot_product(uit, self.u)

        a = K.exp(ait)

        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            a *= K.cast(mask, K.floatx())

        # in some cases, especially in the early stages of training, the sum may be almost zero
        # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
        # a /= K.cast(K.sum(a, axis=1, keepdims=True), K.floatx())
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())

        a = K.expand_dims(a)
        weighted_input = x * a
        return K.sum(weighted_input, axis=1)

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[-1]


EMBEDDING_DIM = 100
max_seq_len = 118
batch_size = 256
num_epochs = 50

from keras.models import Model, Sequential
from keras.layers import Dense, Embedding, Input, Activation, TimeDistributed
from keras.layers import LSTM, Bidirectional, Dropout


def BidLstm():
    #inp = Input(shape=(118,100))
    #x = Embedding(max_features, embed_size, weights=[embedding_matrix],
    #              trainable=False)(inp)
    model1 = Sequential()
    model1.add(Dense(512, input_shape=(118, 100)))
    model1.add(Activation('relu'))
    #model1.add(Flatten())
    #model1.add(BatchNormalization(input_shape=(100,)))
    model1.add(Bidirectional(LSTM(100, activation="relu", return_sequences=True)))
    model1.add(Dropout(0.1))
    model1.add(TimeDistributed(Dense(200)))
    model1.add(AttentionWithContext())
    model1.add(Dropout(0.25))
    model1.add(Dense(4, activation="softmax"))
    model1.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                   metrics=['accuracy'])
    model1.summary()
    return model1
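The question does not show how the model is trained or saved. A minimal sketch, assuming placeholder data arrays (x_train, y_train), a hypothetical file name, and integer class labels (as required by sparse_categorical_crossentropy), might look like this; the custom_objects argument is what allows load_model() to deserialize the custom attention layer later:

import numpy as np
from keras.models import load_model

# Hypothetical data: 1000 samples of 118 time steps x 100-dim embeddings,
# with integer labels in {0, 1, 2, 3}.
x_train = np.random.rand(1000, 118, 100).astype('float32')
y_train = np.random.randint(0, 4, size=(1000, 1))

model1 = BidLstm()
model1.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs,
           validation_split=0.1)

# Save and reload; the custom layer must be registered via custom_objects.
model1.save('han_attention.h5')  # hypothetical file name
model1 = load_model('han_attention.h5',
                    custom_objects={'AttentionWithContext': AttentionWithContext})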

Please see the GitHub repository here: https://github.com/FlorisHoogenboom/keras-han-for-docla

First, define the weight computation in the attention layer explicitly; then extract the output of the layer before the attention layer together with the attention layer's weights, and multiply them to obtain the per-word attention weights.

After reading the comprehensive answers above, I finally understood how to extract the weights of the attention layer. Overall, the ideas of both @李翔 and @Okorimi Manoury are correct. @Okorimi Manoury's code segment comes from the following link: Textual attention visualization.

Now, let me explain the procedure step by step:

(1). You should have a trained model; you need to load the model and extract the attention layer's weights. To extract the weights of a particular layer, you can use model.summary() to check the model architecture. Then you can use:

layer_weights = model.layers[3].get_weights() # suppose your attention layer is at index 3

layer_weights is a list; for example, for the word-level attention of HAN attention, the layer_weights list has three elements: W, b, and u. In other words, layer_weights[0] = W, layer_weights[1] = b, and layer_weights[2] = u.
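As a small sketch of that step (the layer index 3 is carried over from the snippet above and depends on your architecture), unpacking and sanity-checking the three arrays might look like this:

# Assumes `model` is the trained model and the attention layer is model.layers[3].
att_layer_weights = model.layers[3].get_weights()
W, b, u = att_layer_weights

# Check the shapes before reusing them in the numpy re-computation below.
for name, arr in zip(['W', 'b', 'u'], att_layer_weights):
    print(name, arr.shape)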

(2). You also need the output of the layer just before the attention layer. In this example, we need the output of the second layer. You can do this with the following code:

new_model = Model(inputs=model.input, outputs=model.layers[2].output)
output_before_att = new_model.predict(x_test_sample) # extract layer output

(3). Details: suppose your input is a text segment of 100 words with 300-dimensional embeddings (the input is [100, 300]) and the dimension after the second layer is 128; then the shape of output_before_att is [100, 128]. Correspondingly, layer_weights[0] (W) is [128, 128], layer_weights[1] (b) is [1, 128], and layer_weights[2] (u) is [1, 128]. Then we need the following code:

eij = np.tanh(np.dot(output_before_att, layer_weights[0]) + layer_weights[1]) #Eq.(5) in the paper

eij = np.dot(eij, layer_weights[2]) #Eq.(6)

eij = eij.reshape((eij.shape[0], eij.shape[1])) # reshape the vector

ai = np.exp(eij) #Eq.(6)

weights = ai / np.sum(ai) # Eq.(6)

weights is a list (100-dimensional); each element is the attention weight (importance) of one of the 100 input words. After that, you can visualize the attention weights.
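For example, a minimal visualization sketch (matplotlib only; the token list is a placeholder for your own tokenized input words) could plot one bar per word:

import numpy as np
import matplotlib.pyplot as plt

w = np.ravel(weights)  # one attention weight per input word
tokens = ['word_{}'.format(i) for i in range(len(w))]  # placeholder tokens

plt.figure(figsize=(12, 3))
plt.bar(range(len(w)), w)
plt.xticks(range(len(w)), tokens, rotation=90, fontsize=6)
plt.ylabel('attention weight')
plt.tight_layout()
plt.show()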

I hope my explanation helps you.

You can use the get_weights() method of your custom layer to get a list of all its weights. You can find more information here.

You need to make the following modification to your code during model creation:

model1.add(TimeDistributed(Dense(200)))
atn = AttentionWithContext()
model1.add(atn)

Then, after training, simply use:

atn.get_weights()[index]

to extract the weight matrix W as a numpy array (I think index should be set to 0, but you will have to try this yourself). Then you can use pyplot's imshow/matshow methods to display the matrix.
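A minimal sketch of that display step (index 0 is the same guess as above, not something confirmed here) might be:

import matplotlib.pyplot as plt

W = atn.get_weights()[0]  # assumed to be the W matrix; verify against the layer's weight order
plt.matshow(W, cmap='viridis')
plt.colorbar()
plt.title('AttentionWithContext W')
plt.show()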

Thank you for your edit. Your solution returns the attention layer's weights, but I am looking for the word weights.

I found another solution for this problem:

1. Define a function to compute the attention weights:

import numpy as np
from keras import backend as K

def cal_att_weights(output, att_w):
    # if model_name == 'HAN':
    eij = np.tanh(np.dot(output[0], att_w[0]) + att_w[1])
    eij = np.dot(eij, att_w[2])
    eij = eij.reshape((eij.shape[0], eij.shape[1]))
    ai = np.exp(eij)
    weights = ai / np.sum(ai)
    return weights

sent_before_att = K.function([model1.layers[0].input, K.learning_phase()], [model1.layers[2].output])
sent_att_w = model1.layers[5].get_weights()
test_seq = np.array(userinp)
test_seq = np.array(test_seq).reshape(1, 118, 100)
out = sent_before_att([test_seq, 0])
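Presumably the next step is to apply cal_att_weights to the extracted output and weights; a minimal sketch (the layer indices 2 and 5 come from the snippet above and depend on your exact architecture):

# Apply the function above to the extracted layer output and attention weights.
word_weights = cal_att_weights(out, sent_att_w)

# For the BidLstm architecture above this gives shape (1, 118): one weight per
# (padded) time step; pair the weights with your tokens for inspection.
print(word_weights.shape)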
