How to add an attention mechanism in Keras?

Question

I am currently using this code, which I got from a discussion on GitHub. Here is the code for the attention mechanism:

_input = Input(shape=[max_length], dtype='int32')

# get the embedding layer
embedded = Embedding(
        input_dim=vocab_size,
        output_dim=embedding_size,
        input_length=max_length,
        trainable=False,
        mask_zero=False
    )(_input)

activations = LSTM(units, return_sequences=True)(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)


sent_representation = merge([activations, attention], mode='mul')
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

probabilities = Dense(3, activation='softmax')(sent_representation)

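Is this the correct way to do it? I was sort of expecting a time distributed layer, since the attention mechanism is distributed over every time step of the RNN. I need someone to confirm that this implementation (the code) is a correct implementation of the attention mechanism. Thank you.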

Answer 1

If you want attention along the time dimension, then this part of your code seems correct to me:

activations = LSTM(units, return_sequences=True)(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)

sent_representation = merge([activations, attention], mode='mul')

You have worked out the attention vector of shape (batch_size, max_length):

attention = Activation('softmax')(attention)

I have never seen this code before, so I can't say whether this part is actually correct:

K.sum(xin, axis=-2)
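
To make the step in question concrete, here is a minimal framework-free sketch of what multiplying by the attention weights and then summing over axis -2 computes for a single sample. It assumes the product has shape (batch_size, max_length, units), so axis -2 is the time axis; the function name `attention_pool` and the numbers are hypothetical and chosen only for illustration.

```python
def attention_pool(activations, weights):
    """Weighted sum of hidden states over time steps, for one sample.

    activations: list of max_length hidden states, each a list of `units` floats
    weights:     list of max_length attention weights, one per time step
    """
    units = len(activations[0])
    # Multiply step: scale every hidden state by the weight of its time step.
    # RepeatVector + Permute in the question only copy each weight `units`
    # times so that the element-wise multiply has matching shapes.
    weighted = [[w * h for h in state] for state, w in zip(activations, weights)]
    # Sum step: collapse the time axis, leaving one value per unit.
    return [sum(step[u] for step in weighted) for u in range(units)]


# Hypothetical sample with max_length=3 and units=2.
activations = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [0.2, 0.3, 0.5]

sent_representation = attention_pool(activations, weights)
print(sent_representation)  # one value per unit; the time axis is gone
```

Under these assumptions the result is a single vector of length `units`, i.e. a weighted average of the LSTM hidden states, which matches the `output_shape=(units,)` given in the question.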

Further reading (you may want to have a look):

Answer 2

An attention mechanism pays attention to different parts of the sentence:

activations = LSTM(units, return_sequences=True)(embedded)

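It does so by determining the contribution of each hidden state of that sentence:

Computing an aggregate score for each hidden state: attention = Dense(1, activation='tanh')(activations)
Assigning weights to the different states: attention = Activation('softmax')(attention)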
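
The two steps above can be sketched without Keras as follows. This is only an illustration for one sample: the `kernel` and `bias` values stand in for the trained parameters of `Dense(1, activation='tanh')` and are hypothetical, as is the function name `attention_weights`.

```python
import math


def attention_weights(activations, kernel, bias=0.0):
    """Return one attention weight per time step for a single sample."""
    # Score step: a dense layer with one output unit and tanh activation
    # maps each hidden state to a single scalar score.
    scores = [
        math.tanh(sum(k * h for k, h in zip(kernel, state)) + bias)
        for state in activations
    ]
    # Weight step: softmax over the time steps turns the scores into
    # positive weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical hidden states (max_length=3, units=2) and parameters.
activations = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.9]]
kernel = [0.7, -0.3]

weights = attention_weights(activations, kernel)
print(weights, sum(weights))
```

Because tanh keeps every score between -1 and 1, the softmax here cannot overflow; a larger score simply yields a larger share of the total weight.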

And finally, attend to the different states:

sent_representation = merge([activations, attention], mode='mul')

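I do not quite understand this part: sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

To understand more, you can refer to this and this; this one also gives a good implementation, so see if you can learn more from it yourself.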

Answer 3

Recently I was working on applying an attention mechanism to a dense layer, and here is a sample implementation:

from keras import regularizers
from keras.layers import Input, Dense, multiply
from keras.models import Model

def build_model():
  input_dims = train_data_X.shape[1]
  inputs = Input(shape=(input_dims,))
  dense1800 = Dense(1800, activation='relu', kernel_regularizer=regularizers.l2(0.01))(inputs)
  # one sigmoid weight per hidden unit, multiplied element-wise with the activations
  attention_probs = Dense(1800, activation='sigmoid', name='attention_probs')(dense1800)
  attention_mul = multiply([dense1800, attention_probs], name='attention_mul')
  dense7 = Dense(7, kernel_regularizer=regularizers.l2(0.01), activation='softmax')(attention_mul)
  model = Model(inputs=[inputs], outputs=dense7)
  model.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
  return model

model = build_model()
model.summary()

model.fit(train_data_X, train_data_Y_, epochs=20, validation_split=0.2, batch_size=600, shuffle=True, verbose=1)
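
As a rough illustration of what the `attention_probs` and `attention_mul` layers do to a single feature vector, here is a small framework-free sketch. In the model the pre-activation values come from a trained dense layer; here `gate_logits` is a hypothetical stand-in for them, and `gate_features` is an illustrative name, not part of Keras.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_features(features, gate_logits):
    """Scale each feature by its own sigmoid weight (element-wise multiply)."""
    gates = [sigmoid(z) for z in gate_logits]
    gated = [f * g for f, g in zip(features, gates)]
    return gated, gates


# Hypothetical activations of the first dense layer and gate pre-activations.
features = [2.0, 1.0, 4.0]
gate_logits = [0.0, 3.0, -3.0]

gated, gates = gate_features(features, gate_logits)
print(gates)  # each weight lies strictly between 0 and 1
print(gated)
```

Note that, unlike a softmax, the sigmoid weights are computed independently for each unit and are not normalized to sum to 1, so this variant rescales features rather than distributing a fixed budget of attention across them.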

Answer 4

I think you can try the following code to add a Keras self-attention mechanism to an LSTM network:

    from keras.layers import Input, Embedding, LSTM, Flatten, Dense
    from keras.models import Model
    from keras.optimizers import Adam
    from keras_self_attention import SeqSelfAttention

    # the input length must match the Embedding's input_length
    inputs = Input(shape=(MAX_SEQUENCE_LENGTH,))
    embedding = Embedding(vocab_size, EMBEDDING_DIM, weights=[embedding_matrix], input_length=MAX_SEQUENCE_LENGTH, trainable=False)(inputs)
    lstm = LSTM(num_lstm, return_sequences=True)(embedding)
    attn = SeqSelfAttention(attention_activation='sigmoid')(lstm)
    Flat = Flatten()(attn)
    dense = Dense(32, activation='relu')(Flat)
    outputs = Dense(3, activation='sigmoid')(dense)
    model = Model(inputs=[inputs], outputs=outputs)
    model.compile(loss='binary_crossentropy', optimizer=Adam(0.001), metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), shuffle=True)

Answer 5

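While many good alternatives have been given, I have tried to modify the code you shared to make it work. I have also answered your other questions that have not been addressed so far:

Q1. Is this the correct way to do it? The attention layer itself looks good. No changes are needed. The way you use the output of the attention layer can be slightly simplified and modified to incorporate some recent framework upgrades.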

    sent_representation = merge.Multiply()([activations, attention])
    sent_representation = Lambda(lambda xin: K.sum(xin, axis=1))(sent_representation)

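You are now good to go!

Q2. I was sort of expecting a time distributed layer, since the attention mechanism is distributed over every time step of the RNN.

No, you don't need a time distributed layer, otherwise the weights would be shared across time steps, which is not what you want.

For other specific details, you can refer to: https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e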

5 solutions

Solution 1
18 Accepted 2017-06-06 10:28:12

Solution 2
2 2018-05-15 03:56:03

Solution 3
2 2019-07-03 22:49:52

Solution 4
0 2020-07-21 08:51:52

Solution 5
0 2020-12-09 14:25:59
