
Mismatch between computational complexity of Additive attention and RNN cell

According to the Attention Is All You Need paper: "Additive attention (the classic attention used in RNNs by Bahdanau) computes the compatibility function using a feed-forward network with a single hidden layer. While the two are similar in theoretical complexity, ..."

Indeed, we can see here that the computational complexities of additive attention and dot-product (Transformer) attention are both n²*d.
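(For reference, the n²*d entry for dot-product attention counts the n×n pairwise scores, each of which costs O(d). A minimal NumPy sketch, with shapes that are purely illustrative and not taken from the paper:)

```python
import numpy as np

# Illustrative sketch only: where the n^2 * d figure comes from for (scaled)
# dot-product attention.  The shapes below are made up for the example.
n, d = 6, 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(n, d))   # queries
K = rng.normal(size=(n, d))   # keys

# Q @ K.T is an (n, d) x (d, n) matrix product: about n * n * d multiply-adds,
# hence O(n^2 * d) for the score matrix (the softmax adds only O(n^2)).
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
print(weights.shape)  # (n, n): one score per (query, key) pair
```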

However, if we look closer at additive attention, it is in fact an RNN cell, which has a computational complexity of n*d² (according to the same table).

Thus, shouldn't the computational complexity of additive attention be n*d² instead of n²*d?

Your claim that additive attention is in fact an RNN cell is what is leading you astray. Additive attention is implemented using a fully connected, shallow (one hidden layer) feedforward neural network "between" the encoder and decoder RNNs, as shown below and described in the original paper by Bahdanau et al. (p. 3) [1]:

... an alignment model which scores how well the inputs around position j and the output at position i match. The score is based on the RNN hidden state s_{i−1} (just before emitting y_i, Eq. (4)) and the j-th annotation h_j of the input sentence.

We parametrize the alignment model a as a feedforward neural network which is jointly trained with all the other components of the proposed system...

Figure 1: Attention mechanism diagram from [2], credited to Nir Arbel.

Thus, the alignment scores are calculated by adding the outputs of the decoder hidden state to the encoder outputs. So additive attention is not an RNN cell.
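To make this concrete, here is a minimal NumPy sketch of the Bahdanau-style additive score e_ij = v_a^T tanh(W_a s_{i−1} + U_a h_j). The names (H, S, W_a, U_a, v_a) and shapes are illustrative assumptions, and both sequences are given the same length n to match the n²*d figure: the two projections can be precomputed once per position in O(n*d²), and the remaining pairwise combination costs O(d) per (i, j) pair, i.e. O(n²*d) overall, with no recurrence anywhere in the scoring function.

```python
import numpy as np

# Illustrative sketch of Bahdanau-style additive attention scoring.
n, d = 6, 4
rng = np.random.default_rng(0)
H = rng.normal(size=(n, d))    # encoder annotations h_j
S = rng.normal(size=(n, d))    # decoder hidden states s_{i-1}
W_a = rng.normal(size=(d, d))  # projection of the decoder state
U_a = rng.normal(size=(d, d))  # projection of the encoder annotation
v_a = rng.normal(size=(d,))    # output weights of the one-hidden-layer scorer

# Per-position projections: (n, d) x (d, d) products -> O(n * d^2) in total.
SW = S @ W_a
HU = H @ U_a

# Pairwise scores: broadcast add, tanh, dot with v_a -> O(d) per (i, j) pair,
# O(n^2 * d) in total.  This is a feedforward net applied independently to
# every (decoder state, encoder annotation) pair; no recurrent state is
# threaded through the computation.
e = np.tanh(SW[:, None, :] + HU[None, :, :]) @ v_a          # shape (n, n)
alpha = np.exp(e) / np.exp(e).sum(axis=-1, keepdims=True)   # attention weights
print(alpha.shape)  # (n, n)
```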

References

[1] Bahdanau, D., Cho, K. and Bengio, Y., 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

[2] Arbel, N., 2019. Attention in RNNs. Medium blog post.
