How is an attention layer implemented in Keras?

Question

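I am learning about attention models and how they are implemented in Keras. While searching, I came across the following two methods, either of which can be used to create an attention layer in Keras.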

# First method

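import tensorflow as tf

class Attention(tf.keras.Model):
    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects the features
        self.W2 = tf.keras.layers.Dense(units)  # projects the hidden state
        self.V = tf.keras.layers.Dense(1)       # reduces each time step to one score

    def call(self, features, hidden):
        # features: (batch, time_steps, feature_dim); hidden: (batch, hidden_dim)
        # Add a time axis so hidden broadcasts against every time step.
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        # (batch, time_steps, units)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))
        # (batch, time_steps, 1), normalized over the time axis
        attention_weights = tf.nn.softmax(self.V(score), axis=1)
        # Weighted sum of the features over time: (batch, feature_dim)
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights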

# Second method

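# `embedded` and `units` are not defined in this snippet.
activations = LSTM(units, return_sequences=True)(embedded)  # (batch, time_steps, units)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)  # (batch, time_steps, 1)
attention = Flatten()(attention)                       # (batch, time_steps)
attention = Activation('softmax')(attention)           # softmax over time steps
attention = RepeatVector(units)(attention)             # (batch, units, time_steps)
attention = Permute([2, 1])(attention)                 # (batch, time_steps, units)

# Keras 1 API; in Keras 2+ the equivalent is Multiply()([activations, attention]).
sent_representation = merge([activations, attention], mode='mul')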

The math behind the attention model is:
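The formula itself is not reproduced here. Reading it off the first method's code, and assuming it is the usual additive (Bahdanau-style) attention, it would look roughly like the following, where $h_s$ stands for the features at time step $s$, $h_t$ for the hidden state, and $W_1$, $W_2$, $v$ for the learned weights (these symbol names are my own assumption):

```latex
\begin{aligned}
\mathrm{score}(h_t, h_s) &= v^{\top} \tanh\left(W_1 h_s + W_2 h_t\right) \\
\alpha_{ts} &= \frac{\exp\left(\mathrm{score}(h_t, h_s)\right)}{\sum_{s'} \exp\left(\mathrm{score}(h_t, h_{s'})\right)} \\
c_t &= \sum_{s} \alpha_{ts} \, h_s
\end{aligned}
```

Note that the variable named `score` in the first method holds only the `tanh(...)` term; the product with $v$ happens in `self.V(score)`.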

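The first method is more or less a direct implementation of the attention math, whereas the second method turns up far more often on the internet.

My real doubt is about these lines in the second method: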

attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)
sent_representation = merge([activations, attention], mode='mul')

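Which is the correct implementation of attention?
What is the intuition behind the RepeatVector and Permute layers in the second method?
In the first method, W1 and W2 are weights; why is a dense layer treated as weights here?
Why is V treated as a single-unit dense layer?
What does V(score) do?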

Answer 1

Which is the correct implementation of attention?

I would recommend the following:

https://github.com/tensorflow/models/blob/master/official/transformer/model/attention_layer.py#L24

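The multi-head Attention layer above implements a nifty trick: it reshapes the matrix so that, instead of being shaped as (batch_size, time_steps, features), it is shaped as (batch_size, heads, time_steps, features / heads), and then performs the computation on the "features / heads" block.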
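The reshape described above can be sketched in plain NumPy. This is only an illustration of the shape trick, not the code from the linked file; the names `split_heads` and `combine_heads` and the example sizes are assumptions made for this sketch.

```python
import numpy as np

def split_heads(x, num_heads):
    """Reshape (batch, time_steps, features) -> (batch, heads, time_steps, features // heads)."""
    batch, time_steps, features = x.shape
    assert features % num_heads == 0, "features must divide evenly across heads"
    depth = features // num_heads
    # Cut the feature axis into (heads, depth), then move heads in front of time.
    x = x.reshape(batch, time_steps, num_heads, depth)
    return x.transpose(0, 2, 1, 3)

def combine_heads(x):
    """Inverse of split_heads: (batch, heads, time_steps, depth) -> (batch, time_steps, heads * depth)."""
    batch, heads, time_steps, depth = x.shape
    return x.transpose(0, 2, 1, 3).reshape(batch, time_steps, heads * depth)

# Hypothetical sizes: batch of 2, 5 time steps, 8 features, 4 heads.
x = np.arange(2 * 5 * 8, dtype=np.float32).reshape(2, 5, 8)
heads = split_heads(x, num_heads=4)
restored = combine_heads(heads)
print(heads.shape)     # (2, 4, 5, 2)
print(restored.shape)  # (2, 5, 8)
```

Each head then sees its own slice of `features / heads` values per time step, so the attention computation can run on all heads at once as a batched operation.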

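What is the intuition behind the RepeatVector and Permute layers in the second method?

Your code is incomplete... a matrix multiplication is missing from it (you do not show the Attention layer being used). That probably changes the shape of the result, and this code is trying to somehow recover the right shape. It is probably not the best approach.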
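Taken purely as shape bookkeeping, and leaving aside whether the snippet is complete, the two layers can be mimicked in NumPy as below. This is a sketch with made-up sizes and random values, not Keras code: it assumes the softmax output has shape (batch, time_steps), which is what `Flatten` followed by `Activation('softmax')` would give.

```python
import numpy as np

batch, time_steps, units = 2, 4, 3           # hypothetical sizes
rng = np.random.default_rng(0)
activations = rng.normal(size=(batch, time_steps, units))  # stand-in for the LSTM output
logits = rng.normal(size=(batch, time_steps))              # stand-in for Dense(1) + Flatten

# Softmax over the time axis: one weight per time step, summing to 1.
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

# RepeatVector(units): (batch, time_steps) -> (batch, units, time_steps)
repeated = np.repeat(weights[:, np.newaxis, :], units, axis=1)
# Permute([2, 1]): swap the last two axes -> (batch, time_steps, units)
permuted = repeated.transpose(0, 2, 1)

# Element-wise product now lines up with the activations.
weighted = activations * permuted

# The same result via broadcasting, without copying the weights.
weighted_broadcast = activations * weights[:, :, np.newaxis]
print(permuted.shape)  # (2, 4, 3)
```

In other words, the two layers only copy each time step's weight across all `units` so that an element-wise multiply has matching shapes.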

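In the first method, W1 and W2 are weights; why is a dense layer treated as weights here?

A dense layer is a set of weights... your question is a bit vague.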
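To make that concrete, here is a minimal NumPy sketch of what a dense layer with no activation computes. `DenseSketch` is a hypothetical stand-in written for this illustration, not the Keras class, and the sizes are made up.

```python
import numpy as np

class DenseSketch:
    """Hypothetical stand-in for a dense layer without activation: y = x @ kernel + bias."""

    def __init__(self, in_features, units, rng):
        self.kernel = rng.normal(scale=0.1, size=(in_features, units))  # the weight matrix
        self.bias = np.zeros(units)

    def __call__(self, x):
        # The matrix product acts on the last axis, so every time step
        # is multiplied by the same weight matrix.
        return x @ self.kernel + self.bias

rng = np.random.default_rng(0)
W1 = DenseSketch(in_features=6, units=4, rng=rng)
features = rng.normal(size=(2, 5, 6))   # (batch, time_steps, feature_dim)
projected = W1(features)
print(projected.shape)  # (2, 5, 4)
```

Calling `W1(features)` is therefore the same thing as multiplying by a learned matrix, which is why the layer can stand in for the weight matrix in the formula.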

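Why is V treated as a single-unit dense layer?

This is a very strange choice that matches neither my reading of the paper nor the implementations I have seen.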
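Whatever one makes of that choice, what `V(score)` does mechanically in the first method can be traced with a NumPy sketch. The sizes and random weights are made up, and biases are left out for brevity, so this is an approximation of the forward pass rather than the Keras model itself.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

batch, time_steps, feat_dim, hidden_dim, units = 2, 5, 6, 7, 4  # hypothetical sizes
rng = np.random.default_rng(0)
features = rng.normal(size=(batch, time_steps, feat_dim))
hidden = rng.normal(size=(batch, hidden_dim))

# Random stand-ins for the kernels of W1, W2 and V (biases omitted).
W1 = rng.normal(size=(feat_dim, units))
W2 = rng.normal(size=(hidden_dim, units))
V = rng.normal(size=(units, 1))

hidden_with_time_axis = hidden[:, np.newaxis, :]              # (batch, 1, hidden_dim)
score = np.tanh(features @ W1 + hidden_with_time_axis @ W2)   # (batch, time_steps, units)
logits = score @ V                                            # (batch, time_steps, 1)
attention_weights = softmax(logits, axis=1)                   # normalized over time
context_vector = (attention_weights * features).sum(axis=1)   # (batch, feat_dim)

print(logits.shape)          # (2, 5, 1)
print(context_vector.shape)  # (2, 6)
```

The single output unit collapses the `units`-dimensional vector at each time step into one scalar, which is what lets the softmax over `axis=1` produce one weight per time step.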

Answer 1 (accepted, score 2), 2019-07-11 12:15:55
