Mismatch between the computational complexity of additive attention and an RNN cell
According to the "Attention Is All You Need" paper: "Additive attention (the classic attention used in RNNs by Bahdanau) computes the compatibility function using a feed-forward network with a single hidden layer. While the two are similar in theoretical complexity, ..."
Indeed, we can see here that the computational complexity of both additive attention and dot-product attention (transformer attention) is n²·d.
However, if we look closer at additive attention, it is in fact an RNN cell, which has a computational complexity of n·d² (according to the same table).
Thus, shouldn't the computational complexity of additive attention be n·d² instead of n²·d?
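For a concrete sense of the two cost formulas from the paper's complexity table, here is a rough back-of-the-envelope sketch (illustrative only: constant factors and layer widths are ignored, and the specific n and d values are just example numbers):

```python
def self_attention_ops(n, d):
    # Each of n query positions is compared against all n key positions,
    # and each comparison costs on the order of d work -> n * n * d
    return n * n * d

def recurrent_layer_ops(n, d):
    # A recurrent layer runs n sequential steps, each multiplying a
    # d-dimensional hidden state by a d x d weight matrix -> n * d * d
    return n * d * d

# Typical regime discussed in the paper: sequence length n is much
# smaller than the representation dimension d, so n^2 * d < n * d^2.
print(self_attention_ops(50, 512))   # 1,280,000
print(recurrent_layer_ops(50, 512))  # 13,107,200
```

This is only about the shape of the two formulas, not about which mechanism the additive score function actually is; that is what the answer below addresses.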
Your claim that additive attention is in fact an RNN cell is what is leading you astray. Additive attention is implemented using a fully-connected shallow (one hidden layer) feedforward neural network "between" the encoder and decoder RNNs, as shown below and described in the original paper by Bahdanau et al. (pg. 3) [1]:
"... an alignment model which scores how well the inputs around position j and the output at position i match. The score is based on the RNN hidden state s_{i−1} (just before emitting y_i, Eq. (4)) and the j-th annotation h_j of the input sentence. We parametrize the alignment model a as a feedforward neural network which is jointly trained with all the other components of the proposed system ..."
Figure 1: Attention mechanism diagram from [2].
Thus, the alignment scores are calculated by adding a projection of the decoder hidden state to projections of the encoder outputs. So additive attention is not an RNN cell.
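A minimal NumPy sketch of the Bahdanau scoring network may make this concrete. The function and variable names here are my own (W_a, U_a, v_a follow the paper's notation); the point is that the score e_ij = v_a^T tanh(W_a s_{i−1} + U_a h_j) is a one-hidden-layer feedforward pass applied to every annotation h_j independently, with no recurrence over j:

```python
import numpy as np

def additive_scores(s_prev, H, W_a, U_a, v_a):
    """Bahdanau-style additive attention scores.

    e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j): a single-hidden-layer
    feedforward network. All n scores are computed independently; there
    is no hidden state carried from one position j to the next.
    """
    # hidden: (n, d_h), one row per encoder position j
    hidden = np.tanh(s_prev @ W_a.T + H @ U_a.T)
    return hidden @ v_a                      # (n,) alignment scores

def attention_context(s_prev, H, W_a, U_a, v_a):
    e = additive_scores(s_prev, H, W_a, U_a, v_a)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                     # softmax weights alpha_ij
    return alpha @ H                         # context vector c_i
```

Because the per-position score costs O(d) (for fixed hidden width) and it is evaluated for every (i, j) pair, the total cost over a sequence is n²·d, matching the paper's table, rather than the n·d² of a recurrent layer.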
[1] Bahdanau, D., Cho, K. and Bengio, Y., 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[2] Arbel, N., 2019. Attention in RNNs. Medium blog post.