
Bi-LSTM Attention model in Keras

I am trying to make an attention model with Bi-LSTM using word embeddings. I came across How to add an attention mechanism in keras?, https://github.com/philipperemy/keras-attention-mechanism/blob/master/attention_lstm.py and https://github.com/keras-team/keras/issues/4962 .

However, I am confused about the implementation of Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. So,

_input = Input(shape=[max_length], dtype='int32')

# get the embedding layer
embedded = Embedding(
        input_dim=30000,
        output_dim=300,
        input_length=100,
        trainable=False,
        mask_zero=False
    )(_input)

activations = Bidirectional(LSTM(20, return_sequences=True))(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)

I am confused here as to which equation from the paper corresponds to which part of the code.

attention = Flatten()(attention)
attention = Activation('softmax')(attention)

What will RepeatVector do?

attention = RepeatVector(20)(attention)
attention = Permute([2, 1])(attention)


sent_representation = merge([activations, attention], mode='mul')

Now, again, I am not sure why this line is here.

sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

Since I have two classes, I will have the final softmax as:

probabilities = Dense(2, activation='softmax')(sent_representation)
attention = Flatten()(attention)  

transforms your tensor of attention weights into a vector (of size max_length if your sequence size is max_length).

attention = Activation('softmax')(attention)

constrains all the attention weights to lie between 0 and 1, with the sum of all the weights equal to one.
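
For reference, these two steps together with the earlier Dense(1, activation='tanh') layer roughly correspond to the scoring and normalisation equations of the paper (a hedged mapping, since the Dense layer applies the tanh after its projection rather than computing w^T tanh(H) exactly):

M = \tanh(H), \qquad \alpha = \mathrm{softmax}(w^{\top} M)

where H is the matrix of Bi-LSTM hidden states (activations) and w plays the role of the Dense(1) layer's weight vector.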

attention = RepeatVector(20)(attention)
attention = Permute([2, 1])(attention)


sent_representation = merge([activations, attention], mode='mul')

RepeatVector repeats the attention weights vector (which is of size max_len) as many times as the hidden state size, and Permute swaps the axes, so that the weights can be multiplied element-wise with the activations (the hidden states). Note that because the LSTM here is wrapped in Bidirectional with the default 'concat' merge mode, the hidden state size is 2 * 20 = 40: the tensor activations has size max_len*40, so the repeat count has to be 40 rather than 20 for the shapes to match.
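
As a side note, merge(..., mode='mul') is the old Keras 1 API; on Keras 2 the same element-wise product can be written with the Multiply layer. A minimal sketch, assuming the repeat count is 40 so both tensors have shape (batch, max_len, 40):

from keras.layers import Multiply

# activations: (batch, max_len, 40) from the Bidirectional LSTM
# attention:   (batch, max_len, 40) after RepeatVector(40) + Permute([2, 1])
sent_representation = Multiply()([activations, attention])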

sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

This Lambda layer sums the weighted hidden state vectors over the time axis in order to obtain the single vector that will be used at the end.
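
In other words, the multiply-then-sum pair computes a weighted average of the hidden states over time (writing r for the resulting sentence vector, alpha_t for the softmaxed weight of step t, and h_t for the Bi-LSTM output at step t, with T = max_length):

r = \sum_{t=1}^{T} \alpha_t h_t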

Hope this helped!
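
To tie everything together, here is a minimal end-to-end sketch of the same architecture rewritten for the Keras 2 functional API (merge(..., mode='mul') no longer exists there, so the Multiply layer is used instead). The vocabulary size, embedding size and sequence length are placeholders copied from the question, and the repeat count is set to 2 * 20 = 40 to match the Bidirectional output:

from keras.layers import (Input, Embedding, Bidirectional, LSTM, Dense,
                          Flatten, Activation, RepeatVector, Permute,
                          Multiply, Lambda)
from keras.models import Model
import keras.backend as K

max_length = 100               # sequence length (placeholder)
lstm_units = 20                # per-direction LSTM units
hidden_size = 2 * lstm_units   # Bidirectional 'concat' doubles the size

_input = Input(shape=(max_length,), dtype='int32')

# in practice you would pass pretrained vectors via weights=[embedding_matrix]
embedded = Embedding(input_dim=30000, output_dim=300,
                     input_length=max_length,
                     trainable=False, mask_zero=False)(_input)

# H: one hidden state per time step, shape (batch, max_length, 40)
activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(embedded)

# one scalar score per time step -> (batch, max_length, 1)
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)                  # (batch, max_length)
attention = Activation('softmax')(attention)      # weights sum to 1
attention = RepeatVector(hidden_size)(attention)  # (batch, 40, max_length)
attention = Permute([2, 1])(attention)            # (batch, max_length, 40)

# alpha_t * h_t element-wise, then sum over time -> sentence vector
weighted = Multiply()([activations, attention])
sent_representation = Lambda(lambda xin: K.sum(xin, axis=1),
                             output_shape=(hidden_size,))(weighted)

probabilities = Dense(2, activation='softmax')(sent_representation)

model = Model(inputs=_input, outputs=probabilities)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

Summing over axis=1 here is equivalent to the axis=-2 used above, since both refer to the time dimension of a (batch, time, features) tensor.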
