Mismatch between the computational complexity of additive attention and an RNN cell
According to the "Attention Is All You Need" paper: "Additive attention (the classic attention used in RNNs by Bahdanau) computes the compatibility function using a feed-forward network with a single hidden layer. While the two are similar in theoretical complexity, ..."
Indeed, we can see here that the computational complexity of both additive attention and dot-product attention (transformer attention) is n²·d.
However, if we look closer at additive attention, it is in fact an RNN cell, which has a computational complexity of n·d² (according to the same table).
Thus, shouldn't the computational complexity of additive attention be n·d² instead of n²·d?
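For a concrete sense of the two cost formulas from the paper's complexity table, here is a rough back-of-the-envelope sketch (illustrative only: constant factors and layer widths are ignored, and the specific n and d values are just example numbers):

```python
def self_attention_ops(n, d):
    # Each of n query positions is compared against all n key positions,
    # and each comparison costs on the order of d work -> n * n * d
    return n * n * d

def recurrent_layer_ops(n, d):
    # A recurrent layer runs n sequential steps, each multiplying a
    # d-dimensional hidden state by a d x d weight matrix -> n * d * d
    return n * d * d

# Typical regime discussed in the paper: sequence length n is much
# smaller than the representation dimension d, so n^2 * d < n * d^2.
print(self_attention_ops(50, 512))   # 1,280,000
print(recurrent_layer_ops(50, 512))  # 13,107,200
```

This is only about the shape of the two formulas, not about which mechanism the additive score function actually is; that is what the answer below addresses.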
Your claim that additive attention is in fact an RNN cell is what is leading you astray. Additive attention is implemented using a fully-connected shallow (one hidden layer) feedforward neural network "between" the encoder and decoder RNNs, as shown below and described in the original paper by Bahdanau et al. (pg. 3) [1]:
"... an alignment model which scores how well the inputs around position j and the output at position i match. The score is based on the RNN hidden state s_{i−1} (just before emitting y_i, Eq. (4)) and the j-th annotation h_j of the input sentence. We parametrize the alignment model a as a feedforward neural network which is jointly trained with all the other components of the proposed system ..."
Figure 1: Attention mechanism diagram from [2].
Thus, the alignment scores are calculated by adding a projection of the decoder hidden state to projections of the encoder outputs. So additive attention is not an RNN cell.
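A minimal NumPy sketch of the Bahdanau scoring network may make this concrete. The function and variable names here are my own (W_a, U_a, v_a follow the paper's notation); the point is that the score e_ij = v_a^T tanh(W_a s_{i−1} + U_a h_j) is a one-hidden-layer feedforward pass applied to every annotation h_j independently, with no recurrence over j:

```python
import numpy as np

def additive_scores(s_prev, H, W_a, U_a, v_a):
    """Bahdanau-style additive attention scores.

    e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j): a single-hidden-layer
    feedforward network. All n scores are computed independently; there
    is no hidden state carried from one position j to the next.
    """
    # hidden: (n, d_h), one row per encoder position j
    hidden = np.tanh(s_prev @ W_a.T + H @ U_a.T)
    return hidden @ v_a                      # (n,) alignment scores

def attention_context(s_prev, H, W_a, U_a, v_a):
    e = additive_scores(s_prev, H, W_a, U_a, v_a)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                     # softmax weights alpha_ij
    return alpha @ H                         # context vector c_i
```

Because the per-position score costs O(d) (for fixed hidden width) and it is evaluated for every (i, j) pair, the total cost over a sequence is n²·d, matching the paper's table, rather than the n·d² of a recurrent layer.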
[1] Bahdanau, D., Cho, K. and Bengio, Y., 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[2] Arbel, N., 2019. Attention in RNNs. Medium blog post.