

CNN-LSTM model for sentiment analysis has low validation accuracy

I am working on a project to implement CNN-LSTM sentiment analysis. Below is the code:

from keras.models import Sequential
from keras import regularizers
from keras import backend as K
from keras.callbacks import ModelCheckpoint
from keras.layers import Dense, Conv1D, MaxPool1D, Flatten, Dropout
from keras.layers import Embedding, Bidirectional, LSTM
from keras.layers import BatchNormalization

model7 = Sequential()
model7.add(Embedding(max_words, 40, input_length=max_len))  # the embedding layer
model7.add(Conv1D(20, 5, activation='relu',
                  kernel_regularizer=regularizers.l2(0.0001),
                  bias_regularizer=regularizers.l2(0.01)))
model7.add(Dropout(0.5))
model7.add(Bidirectional(LSTM(20, dropout=0.5,
                              kernel_regularizer=regularizers.l2(0.01),
                              recurrent_regularizer=regularizers.l2(0.01),
                              bias_regularizer=regularizers.l2(0.01))))
model7.add(Dense(1, activation='sigmoid'))


model7.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

checkpoint7 = ModelCheckpoint("best_model7.hdf5", monitor='val_accuracy', verbose=1,
                              save_best_only=True, mode='auto', period=1, save_weights_only=False)
history = model7.fit(X_train_padded, y_train, epochs=10,
                     validation_data=(X_test_padded, y_test), callbacks=[checkpoint7])

Even after adding regularizers and dropout, my model has very high validation loss and low accuracy:

Epoch 3: val_accuracy improved from 0.54517 to 0.57010, saving model to best_model7.hdf5
2188/2188 [==============================] - 290s 132ms/step - loss: 0.4241 - accuracy: 0.8301 - val_loss: 0.9713 - val_accuracy: 0.5701

My train and test data: train: (70000, 7), test: (30000, 7)

train['sentiment'].value_counts()
1    41044
0    28956

test['sentiment'].value_counts()
1    17591
0    12409

Can anyone please let me know how to reduce overfitting?

Since your code runs, I believe your network is failing silently by not learning much from the data. Here is a list of things you can generally check:

  • Is your textual data well transformed into numerical data? Is it well represented using TF-IDF, bag of words, or any other method that returns a numerical representation? (A minimal tokenization sketch follows the example code below.)

  • I see that you imported batch normalization but you do not apply it. Batch norm actually helps and, most importantly, does the job of the regularizers, since each input to each layer is normalized using the mini-batch the network has seen. So consider removing the L2 regularization from all layers and applying a simple batch norm instead, which should reduce overfitting (also, use it without dropout, since some empirical studies suggest the two should not be combined).

  • Your embedding output dimension is currently set to 40, that is, 40 numerical elements for each text vector drawn from a vocabulary that may contain more than 10,000 words. That seems a bit low. Try something more 'standard' such as 128 or 256 instead of 40.

  • Lastly, you set the Adam optimizer with all the default parameters. However, the learning rate can have a big impact on how your loss function is minimized. As I am sure you know, the gradient step uses this learning rate when updating each neuron's weights. The default is learning_rate=0.001. So try the following code and increase the learning rate a bit (for example 0.01 or even 0.1).

A simple example:


import keras
from keras.models import Sequential
from keras.layers import LSTM, BatchNormalization, Dense

# define model
model = Sequential()
model.add(LSTM(32))  # or a CNN; with padded integer sequences, put an Embedding layer before it
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))  # sigmoid output for binary classification

# define optimizer with a larger learning rate
optimizer = keras.optimizers.Adam(0.01)

# define loss function
loss = keras.losses.binary_crossentropy

# define metric to optimize (BinaryAccuracy matches the sigmoid output; you can add more)
metric = [keras.metrics.BinaryAccuracy(name='accuracy')]

# compile model
model.compile(optimizer=optimizer, loss=loss, metrics=metric)
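On the first point above: if your reviews are still raw strings, here is a minimal sketch of turning them into the padded integer sequences the Embedding layer expects. The column name train['review'] and the values of max_words and max_len are assumptions; replace them with whatever you actually use.

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

max_words = 10000   # vocabulary size (assumed value; keep whatever you already use)
max_len = 200       # sequence length (assumed value)

# fit the tokenizer on the training texts only, then apply it to both splits
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(train['review'])   # 'review' is an assumed column name

X_train_padded = pad_sequences(tokenizer.texts_to_sequences(train['review']), maxlen=max_len)
X_test_padded = pad_sequences(tokenizer.texts_to_sequences(test['review']), maxlen=max_len)

# with a larger embedding dimension (third point above), the first layer could then be:
# model7.add(Embedding(max_words, 128, input_length=max_len))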

Final thought: I see that you went for a combination of CNN and LSTM, which has great merit. However, it is always recommended to try a simple MLP network first to establish a baseline score that you can later try to beat. Does a simple MLP with 1 or 2 layers and not many units also produce a low accuracy score? If it performs better, then the problem may be in the implementation or in the hyperparameters you chose for the layers (or even in the theory).
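If you want to try that baseline, here is a minimal sketch (assuming the same padded sequences, with an Embedding layer in front, since a dense network also needs fixed-size numerical input):

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

baseline = Sequential()
baseline.add(Embedding(max_words, 128, input_length=max_len))
baseline.add(Flatten())                        # flatten the embedded sequence for the dense layers
baseline.add(Dense(32, activation='relu'))     # one small hidden layer
baseline.add(Dense(1, activation='sigmoid'))   # binary output

baseline.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# baseline.fit(X_train_padded, y_train, epochs=5, validation_data=(X_test_padded, y_test))

If this simple baseline already reaches a similar or better validation accuracy, the issue is more likely in the data preparation or hyperparameters than in the CNN-LSTM idea itself.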

I hope this answer helps, and cheers!

