
How to apply Attention layer to LSTM model

I am training a machine learning model for speech emotion recognition.

I wish to apply an attention layer to the model. The documentation page is hard to understand.

import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

def bi_duo_LSTM_model(X_train, y_train, X_test, y_test, num_classes, batch_size=68,
                      units=128, learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2):

    class myCallback(tf.keras.callbacks.Callback):

        def on_epoch_end(self, epoch, logs=None):
            logs = logs or {}
            # the metric key is 'accuracy' when compiling with metrics=['accuracy']
            if logs.get('accuracy', 0) > 0.95:
                print("\nReached 95% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    RMSopt = tf.keras.optimizers.RMSprop(learning_rate=learning_rate, rho=0.9, epsilon=1e-6)  # unused alternative
    SGDopt = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9, decay=0.1, nesterov=False)  # unused alternative

    # categorical_crossentropy matches the softmax over num_classes (labels assumed one-hot)
    model.compile(loss='categorical_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)

    yhat = model.predict(X_test)

    return history, yhat

How can I apply it to my model?

And are use_scale, causal, and dropout all of its arguments?

If there is dropout in the attention layer, how do we deal with it, given that we already have dropout in the LSTM layers?

Attention can be interpreted as soft vector retrieval.

  • You have some query vectors. For each query, you want to retrieve some

  • values, such that you compute a weighted sum of them,

  • where the weights are obtained by comparing the query with keys (the number of keys must be the same as the number of values, and often keys and values are the same vectors); see the sketch just below.
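A tiny NumPy illustration of this retrieval view; every name and number here is made up purely for illustration:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# one query and three key/value pairs, all of dimension 4
q = np.array([1.0, 0.0, 1.0, 0.0])
K = np.random.randn(3, 4)
V = np.random.randn(3, 4)

scores = K @ q / np.sqrt(q.shape[0])   # compare the query with every key
weights = softmax(scores)              # soft "how much of each value to retrieve"
retrieved = weights @ V                # weighted sum of the values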

In sequence-to-sequence models, the query is the decoder state, and the keys and values are the encoder states.

In a classification task, you do not have such an explicit query. The easiest way to get around this is to train a "universal" query that is used to collect relevant information from the hidden states (something similar to what was originally described in this paper).
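A minimal Keras sketch of that "universal query" idea on top of the question's Bi-LSTM stack. The layer name AttentionPooling and the masking handling are my own assumptions, not the paper's exact formulation: a single trainable query vector scores every hidden state, and the softmax-weighted sum of the states is fed to the classifier.

import tensorflow as tf

class AttentionPooling(tf.keras.layers.Layer):
    """Pools a (batch, time, dim) sequence into (batch, dim) using a learned query."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True

    def build(self, input_shape):
        dim = int(input_shape[-1])
        # the "universal" query, trained together with the rest of the model
        self.query = self.add_weight(name="query", shape=(dim,),
                                     initializer="glorot_uniform", trainable=True)

    def call(self, hidden_states, mask=None):
        # (batch, time): similarity of the query to every hidden state
        scores = tf.tensordot(hidden_states, self.query, axes=[[2], [0]])
        if mask is not None:
            scores += (1.0 - tf.cast(mask, scores.dtype)) * -1e9  # ignore padded steps
        weights = tf.nn.softmax(scores, axis=-1)
        # weighted sum over the time axis
        return tf.reduce_sum(hidden_states * weights[..., tf.newaxis], axis=1)

    def compute_mask(self, inputs, mask=None):
        return None  # the time dimension has been pooled away

Inside bi_duo_LSTM_model, the second Bidirectional layer would keep return_sequences=True so the pooling layer sees all time steps, and the pooling layer would sit between it and the Dense classifier:

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(AttentionPooling())
    model.add(Dense(num_classes, activation='softmax'))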

If you approach the problem as sequence labeling, assigning a label not to the entire sequence but to individual time steps, you might want to use a self-attentive layer instead.
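A functional-API sketch of that self-attentive variant, using the built-in tf.keras.layers.Attention with the same sequence as query and value; time_steps, feat_dim, units, and num_labels are made-up placeholders:

import tensorflow as tf

time_steps, feat_dim, units, num_labels = 100, 40, 128, 8   # placeholder shapes

inputs = tf.keras.Input(shape=(time_steps, feat_dim))
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units, return_sequences=True))(x)
# self-attention: the sequence attends to itself (query = value)
x = tf.keras.layers.Attention(use_scale=True)([x, x])
# one label per time step
outputs = tf.keras.layers.Dense(num_labels, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)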
