Self-Attention using transformer block keras
I'm trying to understand the newly implemented Keras transformer class: https://keras.io/examples/nlp/text_classification_with_transformer/
I see that the text is first embedded and then self-attention is applied. But what if I want to use an embedding other than TokenAndPositionEmbedding? In my case I have pre-embedded sentences and would like to apply self-attention to them.
What I don't understand is what self.pos_emb does. The class TokenAndPositionEmbedding returns x and positions, with x being the token_embedding and positions being the number of words to consider? So is it basically returning two things? I don't understand that.
import tensorflow as tf
from tensorflow.keras import layers

class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        # Embedding for the token ids
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        # Embedding for the position indices 0 .. maxlen-1
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        # Returns a single tensor: token embedding plus position embedding
        return x + positions
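For what it's worth, here is a minimal usage sketch (the maxlen, vocab_size and embed_dim values are assumptions, not from the example). It shows that the call returns one tensor, the element-wise sum of the token and position embeddings, not two separate outputs.

import tensorflow as tf

# Assumed toy dimensions for illustration only
emb = TokenAndPositionEmbedding(maxlen=200, vocab_size=20000, embed_dim=32)
token_ids = tf.zeros((4, 200), dtype=tf.int32)  # a batch of 4 padded sequences
out = emb(token_ids)
print(out.shape)  # (4, 200, 32) -- one tensor: token + position embeddings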
Or do I just feed my pre-embedded sentences to MultiHeadSelfAttention and put a Dense layer after it for classification purposes?
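As a hedged sketch of that idea: the example defines its own MultiHeadSelfAttention class, so the built-in keras.layers.MultiHeadAttention is used here as a stand-in (available in TF 2.4+). The seq_len, embed_dim and num_classes values are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

seq_len, embed_dim, num_classes = 50, 300, 2  # assumed toy dimensions

inputs = layers.Input(shape=(seq_len, embed_dim))           # pre-embedded sentences
attn = layers.MultiHeadAttention(num_heads=4, key_dim=embed_dim)
x = attn(inputs, inputs)                                    # self-attention: query = value = inputs
x = layers.GlobalAveragePooling1D()(x)                      # pool over the sequence dimension
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

Note that this skips positional information entirely; the answer below explains why you may still want it.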
As you know, the transformer is a structure built from little more than stacks of Dense layers with residual connections; however, this makes sequence data lose its time dependence. So for the transformer you need to encode the position, which you can think of as additional information given to the structure so that it does not miss the time dependence.
If you would like to understand it better using Keras, I suggest the official tutorial written by TensorFlow: https://www.tensorflow.org/tutorials/text/transformer which details the things you would like to know.
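If your sentences are already embedded, you can still keep the position information by adding a learned positional embedding on top before the attention layer. A minimal sketch, with the class name and dimensions being assumptions rather than part of the Keras example:

import tensorflow as tf
from tensorflow.keras import layers

class PositionEmbedding(layers.Layer):
    """Hypothetical layer: adds a learned positional embedding to
    already-embedded inputs of shape (batch, seq_len, embed_dim)."""
    def __init__(self, maxlen, embed_dim):
        super().__init__()
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        seq_len = tf.shape(x)[1]
        positions = tf.range(start=0, limit=seq_len, delta=1)
        return x + self.pos_emb(positions)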