
How to apply Attention layer to LSTM model

I am training a machine learning model for speech emotion recognition.

I wish to apply an attention layer to the model. The documentation page is hard to understand.

import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

def bi_duo_LSTM_model(X_train, y_train, X_test, y_test, num_classes, batch_size=68,
                      units=128, learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2):

    class myCallback(tf.keras.callbacks.Callback):

        def on_epoch_end(self, epoch, logs=None):
            logs = logs or {}
            # the metric key is 'accuracy' when compiling with metrics=['accuracy']
            if logs.get('accuracy', 0) > 0.95:
                print("\nReached 95% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    RMSopt = tf.keras.optimizers.RMSprop(learning_rate=learning_rate, rho=0.9, epsilon=1e-6)  # unused alternative
    SGDopt = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9, decay=0.1, nesterov=False)  # unused alternative

    # categorical_crossentropy matches the softmax over num_classes (labels assumed one-hot)
    model.compile(loss='categorical_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)

    yhat = model.predict(X_test)

    return history, yhat

How can I apply it to my model?

And are use_scale, causal, and dropout all of its arguments?

If there is dropout in the attention layer, how do we deal with it, given that we already have dropout in the LSTM layers?

Attention can be interpreted as soft vector retrieval.

  • You have some query vectors. For each query, you want to retrieve some

  • values, such that you compute a weighted sum of them,

  • where the weights are obtained by comparing the query with keys (the number of keys must be the same as the number of values, and often keys and values are the same vectors); see the sketch just below.
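A tiny NumPy illustration of this retrieval view; every name and number here is made up purely for illustration:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# one query and three key/value pairs, all of dimension 4
q = np.array([1.0, 0.0, 1.0, 0.0])
K = np.random.randn(3, 4)
V = np.random.randn(3, 4)

scores = K @ q / np.sqrt(q.shape[0])   # compare the query with every key
weights = softmax(scores)              # soft "how much of each value to retrieve"
retrieved = weights @ V                # weighted sum of the values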

In sequence-to-sequence models, the query is the decoder state, and the keys and values are the encoder states.

In a classification task, you do not have such an explicit query. The easiest way to get around this is to train a "universal" query that is used to collect relevant information from the hidden states (something similar to what was originally described in this paper).
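A minimal Keras sketch of that "universal query" idea on top of the question's Bi-LSTM stack. The layer name AttentionPooling and the masking handling are my own assumptions, not the paper's exact formulation: a single trainable query vector scores every hidden state, and the softmax-weighted sum of the states is fed to the classifier.

import tensorflow as tf

class AttentionPooling(tf.keras.layers.Layer):
    """Pools a (batch, time, dim) sequence into (batch, dim) using a learned query."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True

    def build(self, input_shape):
        dim = int(input_shape[-1])
        # the "universal" query, trained together with the rest of the model
        self.query = self.add_weight(name="query", shape=(dim,),
                                     initializer="glorot_uniform", trainable=True)

    def call(self, hidden_states, mask=None):
        # (batch, time): similarity of the query to every hidden state
        scores = tf.tensordot(hidden_states, self.query, axes=[[2], [0]])
        if mask is not None:
            scores += (1.0 - tf.cast(mask, scores.dtype)) * -1e9  # ignore padded steps
        weights = tf.nn.softmax(scores, axis=-1)
        # weighted sum over the time axis
        return tf.reduce_sum(hidden_states * weights[..., tf.newaxis], axis=1)

    def compute_mask(self, inputs, mask=None):
        return None  # the time dimension has been pooled away

Inside bi_duo_LSTM_model, the second Bidirectional layer would keep return_sequences=True so the pooling layer sees all time steps, and the pooling layer would sit between it and the Dense classifier:

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(AttentionPooling())
    model.add(Dense(num_classes, activation='softmax'))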

If you approach the problem as sequence labeling, assigning a label not to the entire sequence but to individual time steps, you might want to use a self-attentive layer instead.
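A functional-API sketch of that self-attentive variant, using the built-in tf.keras.layers.Attention with the same sequence as query and value; time_steps, feat_dim, units, and num_labels are made-up placeholders:

import tensorflow as tf

time_steps, feat_dim, units, num_labels = 100, 40, 128, 8   # placeholder shapes

inputs = tf.keras.Input(shape=(time_steps, feat_dim))
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units, return_sequences=True))(x)
# self-attention: the sequence attends to itself (query = value)
x = tf.keras.layers.Attention(use_scale=True)([x, x])
# one label per time step
outputs = tf.keras.layers.Dense(num_labels, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)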
