
Validation accuracy is much less than Training accuracy

I am using the MOSI dataset for multimodal sentiment analysis, where for now I am training the model on the text modality only. For text, I am using 300-dimensional GloVe embeddings. My total vocab size is 2173 and my padded sequence length is 30. My target array looks like [0,0,0,0,0,0,1], where the leftmost position is highly negative and the rightmost is highly positive.

I am splitting the dataset like this

X_train, X_test, y_train, y_test = train_test_split(WDatasetX, y7, test_size=0.20, random_state=42)
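
Since a stratified split comes up later in this thread (and the question's update mentions stratifying), here is a minimal sketch of the same split with stratification; the stratify argument and the argmax conversion are additions for illustration, not part of the original code:

from sklearn.model_selection import train_test_split
import numpy as np

# Stratify on the class index so every class keeps roughly the same
# proportion in the train and test splits (y7 is the one-hot label array).
X_train, X_test, y_train, y_test = train_test_split(
    WDatasetX, y7,
    test_size=0.20,
    random_state=42,
    stratify=np.argmax(y7, axis=1))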

My tokenization process is

MAX_NB_WORDS = 3000
tokenizer = Tokenizer(num_words=MAX_NB_WORDS,oov_token = "OOV")
tokenizer.fit_on_texts(Text_X_Train)
tokenized_X_train = tokenizer.texts_to_sequences(Text_X_Train)
tokenized_X_test = tokenizer.texts_to_sequences(Text_X_Test)
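
The padding step is not shown, but given the padded sequence length of 30 mentioned above, it presumably looks something like the sketch below (the variable names and the padding/truncating side are assumptions):

from keras.preprocessing.sequence import pad_sequences

sequence_length = 30
X_train_pad = pad_sequences(tokenized_X_train, maxlen=sequence_length,
                            padding='post', truncating='post')
X_test_pad = pad_sequences(tokenized_X_test, maxlen=sequence_length,
                           padding='post', truncating='post')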

My embedding matrix:

vocab_size = len(tokenizer.word_index) + 1
embed_dim = 300  # GloVe embedding dimension

def embedding_matrix_filteration():
    all_embs = np.stack(list(embeddings_index.values()))
    print(all_embs.shape)
    emb_mean, emb_std = np.mean(all_embs), np.std(all_embs)
    print(emb_mean)
    # Matrix of shape (vocab_size, embed_dim) filled with values drawn from a
    # Gaussian distribution; rows with no GloVe vector keep this random init.
    embedding_matrix = np.random.normal(emb_mean, emb_std, (vocab_size, embed_dim))
    print(embedding_matrix.shape)
    print("length of word2id:", len(word2id))
    embeddedCount = 0
    for word, idx in tokenizer.word_index.items():
        embedding_vector = embeddings_index.get(word.lower())
        if word == ' ':
            embedding_vector = np.zeros(embed_dim)
        if embedding_vector is not None:
            embedding_matrix[idx] = embedding_vector
            embeddedCount += 1
        else:
            print(word)
            print("$$$")
    # Words common to the GloVe vocabulary and the dataset vocabulary
    print('total embedded:', embeddedCount, 'common words')
    print("length of word2id:", len(word2id))
    print(len(embedding_matrix))
    return embedding_matrix

emb = embedding_matrix_filteration()

Model Architecture:

Embedding Layer:

embedding_layer = Embedding(
    vocab_size,
    300,
    weights=[emb],
    trainable=False,
    input_length=sequence_length
)

My model:

from keras import regularizers, layers
from keras.models import Sequential
from keras.layers import Bidirectional, Dense, Dropout

model = Sequential()
model.add(embedding_layer)
model.add(Bidirectional(layers.LSTM(512,return_sequences=True)))
model.add(Bidirectional(layers.LSTM(512,return_sequences=True)))
model.add(Bidirectional(layers.LSTM(256,return_sequences=True)))
model.add(Bidirectional(layers.LSTM(256)))#kernel_regularizer=regularizers.l2(0.001)
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(7, activation='softmax'))

For some reason, even when my training accuracy reaches 80%, the validation accuracy still remains very low. I have tried different regularization techniques, optimizers, and loss functions, but the result is the same. I don't know why.


Please Help!!

Edit: The total no. of tokens is 2719 and the total no. of sentences (including the test and train datasets) is 2183.

Compile:

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])

UPDATED STATS:

I have decreased the label size from 7 to 3, i.e. [0,1,0], where the positions correspond to +ve, neutral, -ve.

model = Sequential()
model.add(embedding_layer)
model.add(Bidirectional(layers.LSTM(16,activation='relu'))) 
model.add(Dropout(0.2))
model.add(Dense(3, activation='softmax'))

Compiled:

model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.00005),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Graphs: [image]

Training: [image]

But the loss is still high. Also, I have stratified the dataset.

A couple of recommendations:

  1. Use categorical_crossentropy instead of mean_squared_error; it helps a lot when doing classification (the latter can also work, but the former does the job better).
  2. Are all your labels mutually exclusive? If so, use softmax + categorical_crossentropy; otherwise (e.g. if a label can look like [1,0,0,0,0,0,1]) use sigmoid + binary_crossentropy.
  3. Decrease the size of the model first, and only add Dropout() if the overfitting problem persists. Use only one layer of LSTM.
  4. Reduce the number of units (even with a single LSTM layer, 64/128 units would probably suffice).
  5. You can use a bidirectional LSTM (I would even opt for bidirectional GRUs, since they are simpler, to see how the performance behaves); see the sketch after this list.
  6. Ensure that you do a stratified split (that way, every class appears in both the training set and the validation set, in roughly the same proportion).
  7. Start with a small(er) learning rate (0.0001 / 0.00005).
  8. Establish an objective/correct baseline. If you have very little data, particularly since you take only the "text" modality of a multimodal dataset and have 7 different classes, it is likely you will not reach a very high accuracy.
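
Putting recommendations 1-5 and 7 together, a minimal sketch (the layer size, the GRU choice and the learning rate are illustrative assumptions, not tuned values):

from keras.models import Sequential
from keras import layers
from keras.optimizers import Adam

small_model = Sequential([
    embedding_layer,                        # the frozen GloVe embedding layer from the question
    layers.Bidirectional(layers.GRU(64)),   # one small recurrent layer
    layers.Dense(7, activation='softmax'),  # mutually exclusive classes
])
small_model.compile(optimizer=Adam(learning_rate=1e-4),
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])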

Bear in mind that, in order to get a reasonable final result in your case, you need to employ a data-centric approach rather than a model-centric one. Regardless of the possible model improvements, if the data is scarce and not comprehensive, you will not be able to achieve great results.

A large difference between Train and Validation stats typically indicates overfitting of models to the Train data.

To minimize this I do a few things:

  1. Reduce the size of the model.
  2. Add a few Dropout or similar layers to the model. I have had good success using layers such as layers.LeakyReLU(alpha=0.8); see the sketch below.

See guidance here: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#strategies_to_prevent_overfitting
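
As an illustration of "smaller model + a few Dropout layers", a reduced architecture might look like the sketch below; the exact unit counts and dropout rates are guesses, not tuned values:

from keras.models import Sequential
from keras import layers

reduced_model = Sequential([
    embedding_layer,
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.3),              # randomly drop activations during training
    layers.Dense(32),
    layers.LeakyReLU(alpha=0.8),      # the LeakyReLU layer mentioned above
    layers.Dropout(0.3),
    layers.Dense(7, activation='softmax'),
])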

How long is your dataset (how many sentences)? 2179 tokens does not seem like much, and it seems to me like your model is way too big for the task. I wouldn't add 4 layers of LSTM; I would go with 1 or 2.

from keras import regularizers,layers

model = Sequential()
model.add(embedding_layer)
model.add(Bidirectional(layers.LSTM(64,return_sequences=True)))
model.add(Bidirectional(layers.LSTM(32)))
model.add(Dense(16, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(7, activation='softmax'))

As for the training, 200 epochs seems long; if your model doesn't seem to converge after 20, I would reset and try with a simpler architecture.
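
One way to avoid a fixed 200-epoch run is an EarlyStopping callback that stops training once the validation loss stops improving. A minimal sketch, assuming X_train / X_test hold the padded sequences and the one-hot labels from the question (the patience and batch size are arbitrary choices):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=200, batch_size=32,
                    callbacks=[early_stop])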
