Text classification CNN overfits on training data

Question

I am trying to classify text sentences using a CNN architecture. The network architecture is as follows:

text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
pool1 = MaxPooling1D(pool_size=2)(drop21)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
drop22 = Dropout(0.5)(conv22)
pool2 = MaxPooling1D(pool_size=2)(drop22)
dense = Dense(16, activation='relu')(pool2)

flat = Flatten()(dense)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I have some callbacks, early_stopping and reduceLR, to stop training and reduce the learning rate when the validation loss stops improving (decreasing).

early_stopping = EarlyStopping(monitor='val_loss', 
                               patience=5)
model_checkpoint = ModelCheckpoint(filepath=checkpoint_filepath,
                                   save_weights_only=False,
                                   monitor='val_loss',
                                   mode="auto",
                                   save_best_only=True)
learning_rate_decay = ReduceLROnPlateau(monitor='val_loss', 
                                        factor=0.1, 
                                        patience=2, 
                                        verbose=1, 
                                        mode='auto',
                                        min_delta=0.0001, 
                                        cooldown=0,
                                        min_lr=0)
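
To make the interplay of the two patience settings concrete, here is a minimal pure-Python sketch of the plateau logic. It is a simplification, not Keras's actual implementation: the function name `simulate_callbacks` and the validation losses are made up for illustration, and cooldown and `min_lr` are ignored.

```python
def simulate_callbacks(val_losses, lr=1e-3, lr_patience=2, factor=0.1,
                       min_delta=1e-4, stop_patience=5):
    """Simplified model of ReduceLROnPlateau plus EarlyStopping on val_loss.

    Returns (stop_epoch, lr_per_epoch); stop_epoch is None if training
    would run through every epoch. Epochs are 1-based.
    """
    best_for_lr = float("inf")
    best_for_stop = float("inf")
    lr_wait = 0
    stop_wait = 0
    lr_per_epoch = []

    for epoch, loss in enumerate(val_losses, start=1):
        # Learning-rate schedule: an epoch counts as an improvement only
        # if it beats the best loss by more than min_delta.
        if loss < best_for_lr - min_delta:
            best_for_lr = loss
            lr_wait = 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor
                lr_wait = 0
        lr_per_epoch.append(lr)

        # Early stopping: stop after stop_patience epochs without a new best.
        if loss < best_for_stop:
            best_for_stop = loss
            stop_wait = 0
        else:
            stop_wait += 1
            if stop_wait >= stop_patience:
                return epoch, lr_per_epoch

    return None, lr_per_epoch


# Made-up validation losses: best value at epoch 5, then steadily worse.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.50, 0.52, 0.55, 0.60, 0.66, 0.70, 0.80]
stop_epoch, lrs = simulate_callbacks(val_losses)
print("stopped at epoch:", stop_epoch)
print("learning rate per epoch:", lrs)
```

With these made-up losses the learning rate is cut twice (after epochs 7 and 9) before training stops at epoch 10, so with patience=2 and patience=5 the reduced learning rate only gets a few epochs to help.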

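After training the model, the training history looks like this:

We can see here that the validation loss stops improving from epoch 5 onward, while the training loss keeps decreasing, so the model overfits a little more with every step.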
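
The same observation can be read off the numbers instead of a plot. The sketch below assumes a dict shaped like the `history` attribute of the object returned by `model.fit` (keys `loss` and `val_loss`); the values are made up, and `summarize_history` is a hypothetical helper.

```python
def summarize_history(history):
    """Return the 1-based epoch with the lowest val_loss and the
    validation-minus-training loss gap for every epoch."""
    train = history["loss"]
    val = history["val_loss"]
    best_epoch = min(range(len(val)), key=val.__getitem__) + 1
    gaps = [v - t for t, v in zip(train, val)]
    return best_epoch, gaps


# Made-up numbers shaped like the situation described above.
history = {
    "loss":     [1.20, 0.90, 0.70, 0.55, 0.45, 0.35, 0.28, 0.22],
    "val_loss": [1.10, 0.95, 0.85, 0.80, 0.78, 0.80, 0.84, 0.90],
}

best_epoch, gaps = summarize_history(history)
print("best epoch:", best_epoch)
print("val - train gap per epoch:", [round(g, 2) for g in gaps])
```

A gap that keeps widening after the best epoch is the usual signature of overfitting.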

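I would like to know whether I am doing something wrong in the CNN architecture. Aren't the dropout layers enough to prevent overfitting? What other ways are there to reduce overfitting?

Any suggestions?

Thanks in advance.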

Edit:

I have also tried regularization, and the results were even worse:

kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)
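
For reference, an L2 regularizer adds the coefficient times the sum of squared weights to the loss. A minimal sketch of that term follows; `l2_penalty` is a hypothetical helper rather than a Keras function, and the weights are made up.

```python
def l2_penalty(weights, coeff=0.01):
    """L2 penalty term: coeff * sum of squared weights."""
    return coeff * sum(w * w for w in weights)


# A tiny example that is easy to check by hand: 0.01 * (1 + 4 + 9)
small = l2_penalty([1.0, -2.0, 3.0])

# A layer with many weights: the penalty grows with the number of weights.
many = l2_penalty([0.5] * 1000)

print("penalty for 3 weights:", small)
print("penalty for 1000 weights of 0.5:", many)
```

With thousands of weights per layer, a coefficient of 0.01 can make this term comparable to the cross-entropy itself. That is one possible reason for the worse results, not a confirmed one; a smaller coefficient may be worth trying.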

Edit 2:

I tried applying a BatchNormalization layer after each convolution, and the result is the following:

norm = BatchNormalization()(conv2)

Edit 3:

After applying the LSTM architecture:

text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(drop21)
drop22 = Dropout(0.5)(conv22)

lstm1 = Bidirectional(LSTM(128, return_sequences = True))(drop22)
lstm2 = Bidirectional(LSTM(64, return_sequences = True))(lstm1)

flat = Flatten()(lstm2)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Answer 1

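Overfitting can be caused by many factors; it happens when your model fits the training set too closely.

To deal with it, you can take several approaches:

Add more data

Use data augmentation

Use an architecture that generalizes well

Add regularization (mostly dropout; L1/L2 regularization is also possible)

Reduce the architecture's complexity.

For more detail, you can read https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d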
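
As an illustration of the data augmentation item for text, here is a small sketch of two simple token-level heuristics, random deletion and random swap. The answer does not name a specific technique, so these are an assumption chosen for illustration; the function names and the example sentence are made up.

```python
import random


def random_deletion(tokens, p=0.1, rng=None):
    """Drop each token with probability p, keeping at least one token."""
    if rng is None:
        rng = random.Random()
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps=1, rng=None):
    """Swap the positions of two randomly chosen tokens, n_swaps times."""
    if rng is None:
        rng = random.Random()
    tokens = list(tokens)  # work on a copy
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


rng = random.Random(0)  # fixed seed so runs are repeatable
tokens = "the movie was surprisingly good and well acted".split()
deleted = random_deletion(tokens, p=0.3, rng=rng)
swapped = random_swap(tokens, n_swaps=1, rng=rng)
print(deleted)
print(swapped)
```

Augmented copies keep the original label. Whether such perturbations preserve the meaning depends on the task, so it is worth spot-checking a few generated sentences.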

Answer 2

This is screaming transfer learning. google-universal-sentence-encoder is a great fit for this use case. Replace your model with

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops the multilingual encoder needs
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")

# this next layer might need some tweaking dimension wise, to correctly fit
# X_train in the model
squeezed = tf.keras.layers.Lambda(lambda x: tf.squeeze(x))(text_input)
# conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
# drop21 = Dropout(0.5)(conv2)
# pool1 = MaxPooling1D(pool_size=2)(drop21)
# conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
# drop22 = Dropout(0.5)(conv22)
# pool2 = MaxPooling1D(pool_size=2)(drop22)

# 1) you might need `squeezed = tf.expand_dims(squeezed, axis=0)` here
# 2) If you're classifying English only, you can use the link to the normal `google-universal-sentence-encoder`, not the multilingual one
# 3) both the English and multilingual have a `-large` version. More accurate but slower to train and infer. 
embedded = hub.KerasLayer('https://tfhub.dev/google/universal-sentence-encoder-multilingual/3')(squeezed) 

# this layer seems out of place, 
# dense = Dense(16, activation='relu')(embedded) 

# you don't need to flatten after a dense layer (in your case) or a backbone (in my case (google-universal-sentence-encoder))
# flat = Flatten()(dense)

# feed the encoder output straight into the classifier head
dense = Dense(128, activation='relu')(embedded)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

# keep the original Input tensor as the model input
model = Model(inputs=text_input, outputs=outputs)

Answer 3

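Since you are doing text classification, I think adding one or two LSTM layers might help the network learn better, because it will be able to relate better to the context of the data. I suggest adding the following code before the flatten layer.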

# drop22 is the output of the last convolution + dropout block
lstm1 = Bidirectional(LSTM(128, return_sequences=True))(drop22)
lstm2 = Bidirectional(LSTM(64))(lstm1)

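LSTM layers can help the neural network learn associations between certain words and may improve the network's accuracy.

I also suggest removing the max pooling layers, since max pooling, especially in text classification, can cause the network to discard some useful features. Keep only the convolutional layers and dropout. Also remove the Dense layer before the flatten and add the LSTMs above.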
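
To see what max pooling throws away, here is a pure-Python sketch of non-overlapping 1-D max pooling over a single channel, with the stride equal to the pool size (the default behaviour of MaxPooling1D). `max_pool_1d` is a hypothetical helper and the activations are made up.

```python
def max_pool_1d(values, pool_size=2):
    """Non-overlapping max pooling over a 1-D sequence.

    Each window of pool_size values is reduced to its maximum; a trailing
    window that is too short is dropped (as with 'valid' padding).
    """
    return [
        max(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


# Made-up activations of one filter along the token axis.
activations = [0.1, 0.9, 0.8, 0.2, 0.7]
pooled = max_pool_1d(activations, pool_size=2)
print(pooled)
```

Here the values 0.1 and 0.2 are discarded and the unpaired last value 0.7 is lost too, so the layers that follow see only about half of the positions.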

Answer 4

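It is not clear how you feed the text into the model. I assume you tokenize the text to represent it as a sequence of integers, but do you use any word embedding before feeding it into the model? If not, I suggest putting a trainable tensorflow Embedding layer at the start of the model. There is a neat technique called Embedding Lookup that can speed up its training, but you can save that for later. Try adding this layer to your model. Your Conv1D layers will then have a much easier time working with sequences of floats. I also suggest adding BatchNormalization after each Conv1D; it should help speed up convergence and training.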
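
A minimal sketch of that suggestion, assuming the input is a batch of padded integer token ids. The vocabulary size, sequence length, embedding width and class count below are hypothetical placeholders, and the layer sizes simply mirror the question; treat it as a starting point, not a tuned model.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical sizes; replace with values from your own tokenizer and labels.
VOCAB_SIZE = 10_000   # number of distinct token ids
MAX_LEN = 50          # padded sequence length
NUM_CLASSES = 4       # number of target classes


def build_model():
    tokens = layers.Input(shape=(MAX_LEN,), dtype="int32", name="Text_input")
    # Trainable lookup table: each integer id becomes a 64-dim float vector.
    x = layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64)(tokens)
    # BatchNormalization after each Conv1D, as suggested above.
    x = layers.Conv1D(filters=128, kernel_size=5, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Conv1D(filters=64, kernel_size=5, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs=tokens, outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


model = build_model()
# Random integer ids stand in for tokenized, padded sentences.
dummy_batch = np.random.randint(0, VOCAB_SIZE, size=(2, MAX_LEN))
probs = model.predict(dummy_batch, verbose=0)
print(probs.shape)
```

If X_train_vec already holds pretrained word vectors rather than integer ids, the Embedding layer is not needed and only the BatchNormalization part applies.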

4 answers

Answer 1
2 2020-06-15 15:04:46

Answer 2
1 2020-06-23 06:37:29

Answer 3
0 2020-06-22 20:25:52

Answer 4
0 2020-06-23 10:39:06
