
The application of a self-attention layer raised an IndexError

So I am doing classification machine learning with input of shape (batch, step, features).

In order to improve the accuracy of this model, I intended to apply a self-attention layer to it.

I am unfamiliar with how to use it in my case, since most examples online are concerned with NLP models built around embeddings.

def opt_select(optimizer):
    
    if optimizer == 'Adam':
        adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
        return adamopt
    
    elif optimizer == 'RMS':
        
        RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
        return RMSopt
    
    else:
        print('undefined optimizer')

def LSTM_attention_model(X_train, y_train, X_test, y_test, num_classes, loss, batch_size=68, units=128,
                         learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2, optimizer='Adam'):

    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            if (logs.get('acc') > 0.90):
                print("\nReached 90% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
    model.add(SeqSelfAttention(attention_activation='sigmoid'))
    model.add(Dense(num_classes, activation='softmax'))
    
    opt = opt_select(optimizer)
    
    model.compile(loss=loss,
                  optimizer=opt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)

    yhat = model.predict(X_test)

    return history, yhat

This led to IndexError: list index out of range.

What is the correct way to apply this layer to my model?


As requested, one may use the following code to simulate a dataset.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Bidirectional, Masking, LSTM
from keras_self_attention import SeqSelfAttention


X_train = np.random.rand(700, 50,34)
y_train = np.random.choice([0, 1], 700)
X_test = np.random.rand(100, 50, 34)
y_test = np.random.choice([0, 1], 100)

batch_size= 217
epochs = 600
dropout = 0.6
Rdropout = 0.7
learning_rate = 0.00001
optimizer = 'RMS'
loss = 'categorical_crossentropy'
num_classes = y_train.shape[1]

LSTM_attention_his, yhat = LSTM_attention_model(X_train, y_train, X_test, y_test, loss=loss, num_classes=num_classes, batch_size=batch_size, units=32, learning_rate=learning_rate, epochs=epochs, dropout=0.5, recurrent_dropout=Rdropout, optimizer=optimizer)

Here is how I would rewrite the code -

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Bidirectional, Masking, LSTM, Reshape
from keras_self_attention import SeqSelfAttention
import numpy as np

def opt_select(optimizer):
    if optimizer == 'Adam':
        adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
        return adamopt

    elif optimizer == 'RMS':

        RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
        return RMSopt

    else:
        print('undefined optimizer')


def LSTM_attention_model(X_train, y_train, X_test, y_test, num_classes, loss, batch_size=68, units=128,
                         learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2, optimizer='Adam'):
    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            if (logs.get('accuracy') > 0.90):
                print("\nReached 90% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(SeqSelfAttention(attention_activation='sigmoid'))
    model.add(Reshape((-1, model.output.shape[1]*model.output.shape[2])))
    model.add(Dense(num_classes, activation='softmax'))

    opt = opt_select(optimizer)

    model.compile(loss=loss,
                  optimizer=opt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)

    yhat = model.predict(X_test)

    return history, yhat


X_train = np.random.rand(700, 50,34)
y_train = np.random.choice([0, 1], (700, 1))
X_test = np.random.rand(100, 50, 34)
y_test = np.random.choice([0, 1], (100, 1))

batch_size= 217
epochs = 600
dropout = 0.6
Rdropout = 0.7
learning_rate = 0.00001
optimizer = 'RMS'
loss = 'categorical_crossentropy'
num_classes = y_train.shape[1]

LSTM_attention_his, yhat = LSTM_attention_model(
    X_train, y_train, X_test, y_test,
    loss=loss, num_classes=num_classes, batch_size=batch_size, units=32,
    learning_rate=learning_rate, epochs=epochs, dropout=0.5, recurrent_dropout=Rdropout, optimizer=optimizer
)

These are the changes I had to make to get this to start training -

  • The original issue was caused by the LSTM layer outputting the wrong dimensions. The SeqSelfAttention layer needs a 3D input (one dimension corresponding to the sequence of the data), which was missing from the output of the LSTM layer. As mentioned by @today in the comments, this can be solved by adding return_sequences=True to the LSTM layer.
  • But even with that modification, the code still gives an error when trying to compute the cost function. The issue is that the output of the self-attention layer is (None, 50, 64); when this is passed directly into the Dense layer, the final output of the network becomes (None, 50, 1). That doesn't make sense for what we are trying to do, because the final output should contain just a single label for each data point (it should have the shape (None, 1)). The problem is that the output of the self-attention layer is 3-dimensional (each data point has a (50, 64) feature matrix), and it needs to be reshaped into a single-dimensional feature vector for the computation to make sense. So I added a Reshape layer, model.add(Reshape((-1, model.output.shape[1]*model.output.shape[2]))), between the attention layer and the Dense layer (see the shape-checking sketch after this list).
  • In addition, the myCallback class is testing whether logs.get('acc') is > 0.9, but I think it should be logs.get('accuracy').
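
For reference, the following is a minimal shape-checking sketch of essentially the same layer stack (dropout omitted for brevity), assuming TensorFlow 2.x and the keras-self-attention package; the sizes are the ones from the simulated data and are purely illustrative. It prints every layer's output shape, so the effect of return_sequences=True and of the Reshape layer can be verified directly:

import tensorflow as tf
from tensorflow.keras.layers import Dense, Bidirectional, Masking, LSTM, Reshape
from keras_self_attention import SeqSelfAttention

# Dummy sizes taken from the simulated data: 50 steps, 34 features, 32 LSTM units, 1 output class.
steps, features, units, num_classes = 50, 34, 32, 1

m = tf.keras.models.Sequential()
m.add(Masking(mask_value=0.0, input_shape=(steps, features)))
m.add(Bidirectional(LSTM(units, return_sequences=True)))      # keeps the step axis -> (None, 50, 64)
m.add(SeqSelfAttention(attention_activation='sigmoid'))       # still sequential    -> (None, 50, 64)
m.add(Reshape((-1, m.output.shape[1] * m.output.shape[2])))   # collapse the 50x64 block per sample
m.add(Dense(num_classes, activation='softmax'))

m.summary()  # shows the output shape of every layer, including the reshaped one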

To address the OP's question in the comments about what kind of column is being added: in this case, it was just a matter of extracting the full sequential data from the LSTM layer. Without the return_sequences flag, the output from the LSTM layer is (None, 64), which is simply the final feature vector of the LSTM without the intermediate sequential data.
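
As a small illustration of that flag (a minimal sketch using the same sizes as above, not part of the original post), compare the two output shapes:

import tensorflow as tf
from tensorflow.keras.layers import Input, Bidirectional, LSTM

x = Input(shape=(50, 34))                                           # (batch, steps, features)
last_state = Bidirectional(LSTM(32))(x)                             # final hidden state only
full_sequence = Bidirectional(LSTM(32, return_sequences=True))(x)   # one vector per time step

print(last_state.shape)     # (None, 64)
print(full_sequence.shape)  # (None, 50, 64)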
