
Text classification CNN overfits training

I am trying to use a CNN architecture to classify text sentences. The architecture of the network is as follows:

text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
pool1 = MaxPooling1D(pool_size=2)(drop21)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
drop22 = Dropout(0.5)(conv22)
pool2 = MaxPooling1D(pool_size=2)(drop22)
dense = Dense(16, activation='relu')(pool2)

flat = Flatten()(dense)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I have some callbacks, such as early_stopping and reduceLR, to stop the training and to reduce the learning rate when the validation loss is not improving (decreasing).

early_stopping = EarlyStopping(monitor='val_loss', 
                               patience=5)
model_checkpoint = ModelCheckpoint(filepath=checkpoint_filepath,
                                   save_weights_only=False,
                                   monitor='val_loss',
                                   mode="auto",
                                   save_best_only=True)
learning_rate_decay = ReduceLROnPlateau(monitor='val_loss', 
                                        factor=0.1, 
                                        patience=2, 
                                        verbose=1, 
                                        mode='auto',
                                        min_delta=0.0001, 
                                        cooldown=0,
                                        min_lr=0)

Once the model is trained, the training history looks as follows:

[image: training history]

We can observe here that the validation loss stops improving from epoch 5 onward, while the training loss keeps decreasing with each step, i.e. the model is overfitting the training set.

I would like to know whether I'm doing something wrong in the architecture of the CNN. Aren't the dropout layers enough to avoid overfitting? What other ways are there to reduce overfitting?

Any suggestions?

Thanks in advance.


Edit:

I have also tried regularization, and the result was even worse:

kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)

[image]


Edit 2:

I have tried applying BatchNormalization layers after each convolution, and the result is the following:

norm = BatchNormalization()(conv2)

[image]


Edit 3:

After applying the LSTM architecture:

text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(drop21)
drop22 = Dropout(0.5)(conv22)

lstm1 = Bidirectional(LSTM(128, return_sequences = True))(drop22)
lstm2 = Bidirectional(LSTM(64, return_sequences = True))(lstm1)

flat = Flatten()(lstm2)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

[image]

Overfitting can be caused by many factors; it happens when your model fits the training set too closely and fails to generalize to unseen data.

There are several ways to handle it:

  1. Add more data
  2. Use data augmentation
  3. Use architectures that generalize well
  4. Add regularization (mostly dropout; L1/L2 regularization are also possible)
  5. Reduce architecture complexity.

For more detail, see https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d
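
To make point 5 more concrete, it can help to see where the parameters of the question's first model actually sit. The following is only a rough sketch in plain Python: the input shape (100 time steps with 300 features) and the 5 classes are assumptions, since the question does not state them, and the helper functions are hypothetical names written for this illustration, not Keras API. It uses the standard counts for Conv1D with 'valid' padding (kernel_size * in_channels * filters + filters) and Dense (in_features * units + units).

```python
def conv1d(length, channels, filters, kernel_size):
    """Conv1D with 'valid' padding: return (new_length, new_channels, n_params)."""
    n_params = kernel_size * channels * filters + filters
    return length - kernel_size + 1, filters, n_params


def max_pool(length, pool_size=2):
    """MaxPooling1D with stride == pool_size: return the new sequence length."""
    return length // pool_size


def dense(in_features, units):
    """Dense layer on the last axis: return (units, n_params)."""
    return units, in_features * units + units


def count_question_model(seq_len, n_features, n_classes):
    """Per-layer parameter counts for the first model in the question."""
    counts = {}
    length, ch, counts["conv1d_1"] = conv1d(seq_len, n_features, 128, 5)
    length = max_pool(length)
    length, ch, counts["conv1d_2"] = conv1d(length, ch, 64, 5)
    length = max_pool(length)
    ch, counts["dense_16"] = dense(ch, 16)        # applied at every time step
    flat = length * ch                            # Flatten has no parameters
    units, counts["dense_128"] = dense(flat, 128)
    units, counts["dense_32"] = dense(units, 32)
    units, counts["output"] = dense(units, n_classes)
    return counts


# Assumed shapes, not taken from the question.
counts = count_question_model(seq_len=100, n_features=300, n_classes=5)
for name, n_params in counts.items():
    print(f"{name:10s} {n_params:>8,d}")
print(f"{'total':10s} {sum(counts.values()):>8,d}")
```

With these assumed shapes most of the weights sit in the first convolution, so reducing its `filters` or the input feature dimension would shrink the model more than trimming the dense layers. With other input shapes the balance may look different, so it is worth rerunning with your real values (or checking `model.summary()`).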

This is screaming Transfer Learning. google-universal-sentence-encoder is perfect for this use case. Replace your model with:

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops needed by the multilingual encoder
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# assumption: the encoder expects raw strings, so X_train_vec is taken to hold
# the text itself here (dtype=tf.string), not pre-computed vectors
text_input = Input(shape=X_train_vec.shape[1:], dtype=tf.string, name="Text_input")

# this next layer might need some tweaking dimension wise, to correctly fit
# X_train in the model
squeezed = tf.keras.layers.Lambda(lambda x: tf.squeeze(x))(text_input)
# conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
# drop21 = Dropout(0.5)(conv2)
# pool1 = MaxPooling1D(pool_size=2)(drop21)
# conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
# drop22 = Dropout(0.5)(conv22)
# pool2 = MaxPooling1D(pool_size=2)(drop22)

# 1) you might need `squeezed = tf.expand_dims(squeezed, axis=0)` here
# 2) If you're classifying English only, you can use the link to the normal `google-universal-sentence-encoder`, not the multilingual one
# 3) both the English and multilingual have a `-large` version. More accurate but slower to train and infer.
embedded = hub.KerasLayer('https://tfhub.dev/google/universal-sentence-encoder-multilingual/3')(squeezed)

# this layer seems out of place,
# dense = Dense(16, activation='relu')(embedded)

# you don't need to flatten after a dense layer (in your case) or a backbone (in my case (google-universal-sentence-encoder))
# flat = Flatten()(dense)

# feed the sentence embedding straight into the dense head (`flat` no longer exists)
dense = Dense(128, activation='relu')(embedded)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

# keep the original Input tensor as the model input
model = Model(inputs=text_input, outputs=outputs)

Since you are doing text classification, I think adding 1 or 2 LSTM layers might help the network learn better, because they let it take the context of each word into account. I suggest adding the following code before the Flatten layer.

# drop22 is the output of the last Dropout layer in the question's model
lstm1 = Bidirectional(LSTM(128, return_sequences=True))(drop22)
lstm2 = Bidirectional(LSTM(64))(lstm1)

LSTM layers can help a neural network learn associations between certain words and might improve the accuracy of your network.

I also suggest dropping the max pooling layers, since max pooling, especially in text classification, can lead the network to drop some useful features. Just keep the convolutional layers and the dropout. Also remove the Dense layer before Flatten and add the aforementioned LSTMs.

It is unclear how you feed the text into your model. I am assuming that you tokenize the text to represent it as a sequence of integers, but do you use any word embedding before feeding it into your model? If not, I suggest adding a trainable TensorFlow Embedding layer at the start of your model. There is a clever technique called Embedding Lookup to speed up its training, but you can save it for later. Try adding this layer to your model. Then your Conv1D layer would have a much easier time working on a sequence of floats. Also, I suggest adding BatchNormalization after each Conv1D; it should help to speed up convergence and training.
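
As a rough illustration of that suggestion, here is a minimal sketch of the question's CNN with a trainable Embedding layer in front and BatchNormalization after each Conv1D. The vocabulary size, sequence length, embedding dimension and number of classes are all made-up placeholders (the question does not give them), and the input is assumed to be padded integer token ids rather than whatever `X_train_vec` currently holds, so treat it as a starting point rather than a drop-in replacement.

```python
import numpy as np
from tensorflow.keras.layers import (Input, Embedding, Conv1D, BatchNormalization,
                                     Dropout, Flatten, Dense)
from tensorflow.keras.models import Model

# Hypothetical values, not taken from the question.
VOCAB_SIZE = 20000   # number of distinct token ids produced by the tokenizer
MAX_LEN = 100        # padded sequence length
EMBED_DIM = 64       # size of each learned word vector
NUM_CLASSES = 5      # stands in for y_train.shape[1]

# The model now takes integer token ids instead of pre-computed vectors.
token_ids = Input(shape=(MAX_LEN,), dtype="int32", name="Token_ids")

# Trainable embedding: maps every token id to a dense float vector.
x = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM)(token_ids)

# Same convolutional stack as in the question, with BatchNormalization
# added after each Conv1D.
x = Conv1D(filters=128, kernel_size=5, activation="relu")(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = Conv1D(filters=64, kernel_size=5, activation="relu")(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)

x = Flatten()(x)
x = Dense(128, activation="relu")(x)
x = Dense(32, activation="relu")(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs=token_ids, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Quick shape check on a random batch of two "sentences".
dummy_batch = np.random.randint(0, VOCAB_SIZE, size=(2, MAX_LEN)).astype("int32")
print(model(dummy_batch).shape)
```

Whether this actually reduces the overfitting depends on the data; the embedding adds VOCAB_SIZE * EMBED_DIM trainable weights of its own, so with a small dataset a smaller EMBED_DIM (or pre-trained vectors) may be worth trying.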
