
How to improve the validation and test accuracy of my Transfer Learning BERT model

I trained my BERT model and I get 99% accuracy on the training set, however on the validation set I get just 80%. How can I improve my validation accuracy?

Code:

import tensorflow as tf
from transformers import TFBertModel

def build_model(self, n_categories):
    input_word_ids = tf.keras.Input(shape=(self.MAX_LEN,), dtype=tf.int32, name='input_word_ids')
    input_mask = tf.keras.Input(shape=(self.MAX_LEN,), dtype=tf.int32, name='input_mask')
    input_type_ids = tf.keras.Input(shape=(self.MAX_LEN,), dtype=tf.int32, name='input_type_ids')

    # Load the pretrained BERT encoder from Hugging Face
    # (a RoBERTa encoder can be swapped in the same way)
    #roberta_model = TFRobertaModel.from_pretrained(self.MODEL_NAME, num_labels = n_categories, output_attentions = False, output_hidden_states = False)
    roberta_model = TFBertModel.from_pretrained(self.MODEL_NAME, num_labels=n_categories, output_attentions=True, output_hidden_states=True)
    
    # for layer in roberta_model.layers[:-15]:
    #   layer.trainable = False

    x = roberta_model(input_word_ids, attention_mask=input_mask, token_type_ids=input_type_ids)

    # Huggingface transformers have multiple outputs, embeddings are the first one,
    # so let's slice out the first position
    x = x[0]

    x = tf.keras.layers.Dropout(0.1)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(256, activation='relu')(x)
    x = tf.keras.layers.Dense(n_categories, activation='softmax')(x)

    model = tf.keras.Model(inputs=[input_word_ids, input_mask, input_type_ids], outputs=x)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    return model

Based on the information you've provided, it seems that your model is overfitting. Achieving 99% accuracy on the training set and a significantly lower accuracy on the validation set indicates that the model is memorizing the training data and therefore generalizing poorly to the validation set.

The first two hyper-parameters that I would consider tuning, in this case, are the number of epochs and the learning rate. Your initial goal should be to achieve a similar accuracy on both the training and validation set, even if it is only 80% or so. This generally means that you should lower the number of epochs until you're seeing roughly the same accuracy.
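Rather than tuning the epoch count by hand, Keras can stop training automatically once the validation metric stops improving. A minimal sketch (the `patience` value is an illustrative choice, not from the original code):

```python
import tensorflow as tf

# EarlyStopping watches the validation accuracy and restores the weights
# from the best epoch, so you don't have to guess the right epoch count.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy',    # metric to watch
    patience=2,                # tolerate 2 epochs without improvement
    restore_best_weights=True  # roll back to the best epoch's weights
)

# Passed to fit() alongside the validation data, e.g.:
# model.fit(train_inputs, train_labels,
#           validation_data=(val_inputs, val_labels),
#           epochs=10, callbacks=[early_stop])
```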

[Chart depicting overfitting: training and validation curves diverging over epochs]

In this chart, the blue line is the training loss, the red line is the validation loss, and the x-axis represents the number of epochs. You can see that the training loss continues to decrease even as the validation loss begins to increase (where the warning sign is); that divergence is the onset of overfitting. Ideally, you should stop training at the epoch just before the warning.

From there you can begin to tune other parameters of the model, such as the optimizer settings and regularization parameters (dropout rate, weight decay).
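For example, the classification head could be regularized more aggressively than in the original code. A sketch, assuming BERT's usual 768-dimensional hidden size and a hypothetical 5-class problem; the dropout rate is raised from 0.1 to 0.3 and L2 weight decay is added to the dense layer:

```python
import tensorflow as tf

# Stand-in for the pooled/flattened BERT output (768 is an assumption).
inputs = tf.keras.Input(shape=(768,))

# Stronger dropout than the original 0.1 to discourage memorization.
x = tf.keras.layers.Dropout(0.3)(inputs)

# L2 weight decay on the dense head penalizes large weights.
x = tf.keras.layers.Dense(
    256, activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(1e-4))(x)

outputs = tf.keras.layers.Dense(5, activation='softmax')(x)
head = tf.keras.Model(inputs, outputs)
```

Freezing some of the lower encoder layers (as in the commented-out loop in the question) is another option worth revisiting once the epoch count is under control.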

Also, it is not clear from your question whether or not you are using a test set. It's advisable to split your data into three partitions (train, validation, and test). The test set should not be used during training at all; evaluate on it once, after the model has been trained, to get an unbiased estimate of generalization.
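A common way to get the three partitions is two chained splits, e.g. an 80/10/10 division (the proportions here are illustrative):

```python
from sklearn.model_selection import train_test_split

# Toy data standing in for the real inputs and labels.
X = list(range(100))
y = [i % 2 for i in X]

# First split off 20% as a temporary holdout...
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, random_state=42)

# ...then split that holdout evenly into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42)
```

Train on `X_train`, tune hyper-parameters against `X_val`, and touch `X_test` only once at the very end.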
