Validation accuracy is much lower than training accuracy

Question

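I am working on multimodal sentiment analysis with the MOSI dataset, and for now I am training a model on the text data only. For the text, I use 300-dimensional GloVe embeddings. My total vocabulary size is 2173 and my padded sequence length is 30. My target array looks like [0,0,0,0,0,0,1], where the leftmost position means highly negative (-ve) and the rightmost means highly positive (+ve).

I split the dataset like this: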

X_train, X_test, y_train, y_test = train_test_split(WDatasetX, y7, test_size=0.20, random_state=42)

My tokenization process is:

MAX_NB_WORDS = 3000
tokenizer = Tokenizer(num_words=MAX_NB_WORDS,oov_token = "OOV")
tokenizer.fit_on_texts(Text_X_Train)
tokenized_X_train = tokenizer.texts_to_sequences(Text_X_Train)
tokenized_X_test = tokenizer.texts_to_sequences(Text_X_Test)

My embedding matrix:

import numpy as np

vocab_size = len(tokenizer.word_index) + 1

def embedding_matrix_filteration():
    # Stack all pre-trained GloVe vectors to estimate their overall mean and std
    all_embs = np.stack(list(embeddings_index.values()))
    print(all_embs.shape)
    emb_mean, emb_std = np.mean(all_embs), np.std(all_embs)
    print(emb_mean)
    # Matrix of the specified size filled with values from a Gaussian
    # distribution, so words missing from GloVe keep a random vector
    embedding_matrix = np.random.normal(emb_mean, emb_std, (vocab_size, embed_dim))
    print(embedding_matrix.shape)
    embeddedCount = 0
    not_found = []
    for word, idx in tokenizer.word_index.items():
        embedding_vector = embeddings_index.get(word.lower())
        if word == ' ':
            embedding_vector = np.zeros(embed_dim)
        if embedding_vector is not None:
            embedding_matrix[idx] = embedding_vector
            embeddedCount += 1
        else:
            not_found.append(word)
            print(word)
            print("$$$")
    # Words common to the GloVe vectors and the tokenizer vocabulary
    print('total embedded:', embeddedCount, 'common words')
    print("length of word_index:", len(tokenizer.word_index))
    print(len(embedding_matrix))
    return embedding_matrix

emb = embedding_matrix_filteration()

Model architecture:

Embedding layer:

embedding_layer = Embedding(
    vocab_size,
    300,
    weights=[emb],
    trainable=False,
    input_length=sequence_length
)

My model:

from keras import regularizers,layers

model = Sequential()
model.add(embedding_layer)
model.add(Bidirectional(layers.LSTM(512,return_sequences=True)))
model.add(Bidirectional(layers.LSTM(512,return_sequences=True)))
model.add(Bidirectional(layers.LSTM(256,return_sequences=True)))
model.add(Bidirectional(layers.LSTM(256)))#kernel_regularizer=regularizers.l2(0.001)
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(7, activation='softmax'))

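For some reason, even when my training accuracy reaches 80%, the validation accuracy stays very low. I have tried different regularization techniques, optimizers, and loss functions, but the result is the same. I don't know why.

Please help!

Edit: the total number of tokens is 2719, and the total number of sentences (test and training sets combined) is 2183.

Compile:

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])

Updated results:

I have reduced the label size from 7 to 3, i.e. [0,1,0] -> +ve, neutral, -ve.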

model = Sequential()
model.add(embedding_layer)
model.add(Bidirectional(layers.LSTM(16,activation='relu'))) 
model.add(Dropout(0.2))
model.add(Dense(3, activation='softmax'))

Compile:

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.00005),
    loss='categorical_crossentropy',
    metrics=['accuracy'])

Graph: [image not available]

Training: [image not available]

But the loss is still high. Also, I have stratified the dataset.

Answer 1

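Some suggestions:

Use categorical_crossentropy instead of mean_squared_error; it helps a lot for classification (the latter can work too, but the former does much better).
Are all your labels mutually exclusive? If so, use softmax + categorical_crossentropy; otherwise (e.g. a label looks like [1,0,0,0,0,0,1]), use sigmoid + binary_crossentropy.
Reduce the model size at first, and add Dropout() only if the overfitting problem persists. Use only one LSTM layer.
Reduce the number of units (even with a single LSTM layer, 64 / 128 units may be enough).
You can use a bidirectional LSTM (I would even opt for bidirectional GRUs, since they are simpler, to see how the performance behaves).
Make sure you do a stratified split, so that every class is represented in both the training set and the validation set in consistent proportions.
Start with a small(er) learning rate (0.0001 / 0.00005).
Establish an objective/correct baseline. If you have very little data, and especially since you are working with a multimodal dataset but using only the text, with 7 different classes, you are unlikely to reach a very high accuracy.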
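
To illustrate the first two suggestions, here is a minimal NumPy sketch comparing the two losses on a single 7-class one-hot target. The probability vectors are made-up values, not model output, and the helper functions are written out by hand rather than taken from Keras.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    # Negative log-probability assigned to the true class
    return float(-np.sum(y_true * np.log(y_prob + eps)))

def mean_squared_error(y_true, y_prob):
    # Average squared difference over the 7 output positions
    return float(np.mean((y_true - y_prob) ** 2))

# One-hot target: the rightmost class (highly positive) is the true one
y_true = np.zeros(7)
y_true[6] = 1.0

# Hypothetical softmax outputs (each sums to 1)
confident_wrong = np.array([0.94, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01])
confident_right = np.array([0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.94])

cce_wrong = categorical_crossentropy(y_true, confident_wrong)
mse_wrong = mean_squared_error(y_true, confident_wrong)
cce_right = categorical_crossentropy(y_true, confident_right)
mse_right = mean_squared_error(y_true, confident_right)

print(f"confident and wrong: cce={cce_wrong:.3f}  mse={mse_wrong:.3f}")
print(f"confident and right: cce={cce_right:.3f}  mse={mse_right:.3f}")
```

With a 7-way one-hot target and a probability vector as the prediction, the mean squared error can never exceed 2/7, so even a confidently wrong prediction is penalized only mildly. Cross-entropy grows without bound as the probability of the true class approaches zero, which gives a much stronger training signal.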

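Keep in mind that to get a reasonable final result in your case, you need a data-centric approach rather than a model-centric one. Regardless of any possible model improvements, if the data is scarce and not comprehensive, you will not be able to get very good results.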
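
A data-centric check in the spirit of the advice above could look like the following sketch. It assumes scikit-learn is available and uses synthetic stand-ins for WDatasetX and y7 with an invented class distribution, so the numbers it prints say nothing about MOSI itself. It stratifies the split on the class index of the one-hot labels and then computes the accuracy of always predicting the most frequent class, which is the number any model has to beat.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-ins (assumptions): 2183 padded sequences of length 30
# and one-hot labels over 7 classes with an invented, imbalanced distribution.
n_samples, seq_len, n_classes = 2183, 30, 7
class_probs = np.array([0.03, 0.12, 0.15, 0.20, 0.22, 0.20, 0.08])
class_ids = rng.choice(n_classes, size=n_samples, p=class_probs)
X = rng.integers(1, 2174, size=(n_samples, seq_len))
y = np.eye(n_classes)[class_ids]

# Stratify on the class index, since the labels are one-hot vectors
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y.argmax(axis=1)
)

# Share of each class in both splits; these should be nearly identical
train_shares = y_train.mean(axis=0)
test_shares = y_test.mean(axis=0)
print("train class shares:", np.round(train_shares, 3))
print("test class shares: ", np.round(test_shares, 3))

# Majority-class baseline: always predict the most frequent training class
majority_class = int(np.bincount(y_train.argmax(axis=1)).argmax())
baseline_acc = float(np.mean(y_test.argmax(axis=1) == majority_class))
print(f"majority-class baseline accuracy: {baseline_acc:.3f}")
```

If the validation accuracy of the trained model is close to this baseline, the model has learned little beyond the class frequencies, no matter how high the training accuracy is.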

Answer 2

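A large gap between training and validation statistics usually indicates that the model is overfitting the training data.

To minimize this, I do a few things:

Reduce the model size.
Add some dropout or similar layers to the model. I have had good success with these layers: layers.LeakyReLU(alpha=0.8)

See the guide here: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#strategies_to_prevent_overfitting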

Answer 3

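How big is your dataset (how many sentences)? 2179 tokens does not seem like much, and your model looks too big for the task to me. I would not add 4 LSTM layers; I would go with 1 or 2.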

from keras import regularizers,layers

model = Sequential()
model.add(embedding_layer)
model.add(Bidirectional(layers.LSTM(64,return_sequences=True)))
model.add(Bidirectional(layers.LSTM(32)))
model.add(Dense(16, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(7, activation='softmax'))
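
To put rough numbers on "too big", the sketch below counts trainable parameters for the original four-layer model and for the smaller one above. It assumes standard LSTM layers with biases (four gates, each with input weights, recurrent weights, and a bias), 300-dimensional frozen embeddings that contribute no trainable weights, and the 2183 sentences mentioned in the question. The helper names are made up for this example.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def bilstm_params(input_dim, units):
    # A bidirectional wrapper holds two independent LSTMs
    return 2 * lstm_params(input_dim, units)

def dense_params(input_dim, units):
    return input_dim * units + units

# Original model: BiLSTM 512, 512, 256, 256, then Dense 128, 128, 7.
# Each BiLSTM outputs 2 * units features to the next layer.
original = (
    bilstm_params(300, 512)
    + bilstm_params(1024, 512)
    + bilstm_params(1024, 256)
    + bilstm_params(512, 256)
    + dense_params(512, 128)
    + dense_params(128, 128)
    + dense_params(128, 7)
)

# Smaller model: BiLSTM 64, 32, then Dense 16, 7
smaller = (
    bilstm_params(300, 64)
    + bilstm_params(128, 32)
    + dense_params(64, 16)
    + dense_params(16, 7)
)

n_sentences = 2183
print(f"original: {original:,} trainable parameters")  # 13,907,079
print(f"smaller:  {smaller:,} trainable parameters")   # 229,255
print(f"parameters per sentence: {original / n_sentences:.0f} vs {smaller / n_sentences:.0f}")
```

Under these assumptions the original model has several thousand trainable parameters per sentence in the dataset, which makes memorizing the training set easy and generalizing from it hard.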

As for training, 200 epochs seems long. If your model does not seem to converge after 20 epochs, I would reset and try a simpler architecture.
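
One way to avoid fixing the epoch count by hand is patience-based early stopping on the validation loss. The sketch below shows only the stopping logic in plain Python; the function name and the loss values are invented for illustration. In Keras, the keras.callbacks.EarlyStopping callback (with its monitor, patience, and restore_best_weights arguments) offers this behaviour during fit().

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. Epochs are zero-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Patience never ran out: training ran through the last epoch
    return len(val_losses) - 1, best_epoch

# Invented validation losses: improvement stalls after epoch 3
val_losses = [1.90, 1.75, 1.68, 1.66, 1.67, 1.70, 1.74, 1.80, 1.88, 1.95]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stop at epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

With these made-up losses and a patience of 3, training stops at epoch 6 and the best validation loss was seen at epoch 3, so the remaining epochs would only have deepened the overfitting.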

3 answers

Answer 1
4, accepted, 2021-08-19 11:00:35

Answer 2
3, 2021-08-11 19:22:10

Answer 3
2, 2021-08-12 12:14:36