Self-Attention using transformer block keras
I'm trying to understand the newly implemented Keras transformer class: https://keras.io/examples/nlp/text_classification_with_transformer/
I see that the text is first embedded and then self-attention is applied. But what if I want to use an embedding other than TokenAndPositionEmbedding? In my case I have pre-embedded sentences and would like to apply self-attention to them.
What I don't understand is what self.pos_emb does. The class TokenAndPositionEmbedding returns x and positions, with x being the token_embedding and positions being the number of words to consider? So is it basically returning two things? I don't understand that.
import tensorflow as tf
from tensorflow.keras import layers

class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        # Embedding for the token ids
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        # Embedding for the position indices 0 .. maxlen-1
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        # Returns a single tensor: token embedding plus position embedding
        return x + positions
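For what it's worth, here is a minimal usage sketch (the maxlen, vocab_size and embed_dim values are assumptions, not from the example). It shows that the call returns one tensor, the element-wise sum of the token and position embeddings, not two separate outputs.

import tensorflow as tf

# Assumed toy dimensions for illustration only
emb = TokenAndPositionEmbedding(maxlen=200, vocab_size=20000, embed_dim=32)
token_ids = tf.zeros((4, 200), dtype=tf.int32)  # a batch of 4 padded sequences
out = emb(token_ids)
print(out.shape)  # (4, 200, 32) -- one tensor: token + position embeddings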
Or do I just feed my pre-embedded sentences to MultiHeadSelfAttention and put a Dense layer after it for classification purposes?
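As a hedged sketch of that idea: the example defines its own MultiHeadSelfAttention class, so the built-in keras.layers.MultiHeadAttention is used here as a stand-in (available in TF 2.4+). The seq_len, embed_dim and num_classes values are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

seq_len, embed_dim, num_classes = 50, 300, 2  # assumed toy dimensions

inputs = layers.Input(shape=(seq_len, embed_dim))           # pre-embedded sentences
attn = layers.MultiHeadAttention(num_heads=4, key_dim=embed_dim)
x = attn(inputs, inputs)                                    # self-attention: query = value = inputs
x = layers.GlobalAveragePooling1D()(x)                      # pool over the sequence dimension
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

Note that this skips positional information entirely; the answer below explains why you may still want it.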
As you know, the transformer is a structure built from little more than stacks of Dense layers with residual connections; however, this makes sequence data lose its time dependence. So for the transformer you need to encode the position, which you can think of as additional information given to the structure so that it does not miss the time dependence.
If you would like to understand it better using Keras, I suggest the official tutorial written by TensorFlow: https://www.tensorflow.org/tutorials/text/transformer which details the things you would like to know.
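If your sentences are already embedded, you can still keep the position information by adding a learned positional embedding on top before the attention layer. A minimal sketch, with the class name and dimensions being assumptions rather than part of the Keras example:

import tensorflow as tf
from tensorflow.keras import layers

class PositionEmbedding(layers.Layer):
    """Hypothetical layer: adds a learned positional embedding to
    already-embedded inputs of shape (batch, seq_len, embed_dim)."""
    def __init__(self, maxlen, embed_dim):
        super().__init__()
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        seq_len = tf.shape(x)[1]
        positions = tf.range(start=0, limit=seq_len, delta=1)
        return x + self.pos_emb(positions)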