I have a question regarding the usage of ALBERT with the SQuAD 2.0 huggingface-transformers script.
The GitHub page has no specific instructions on how to run the script with ALBERT, so I used the same settings that are given for BERT. However, the final results (exact_match = 30.632527583593028, f1 = 36.36948708435092) are far from the ones reported for BERT on the GitHub page (f1 = 88.52, exact_match = 81.22), so I think I may be doing something wrong.
This is the code that I ran in the command line:
python run_squad.py \
--model_type albert \
--model_name_or_path albert-base-v2 \
--do_train --do_eval \
--train_file train-v2.0.json \
--predict_file dev-v2.0.json \
--per_gpu_train_batch_size 5 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /aneves/teste2/output/
The only differences between this command and the one on the transformers page are the model name, where they use 'bert-base-uncased', and per_gpu_train_batch_size, which is 12 there; I had to use 5 due to memory constraints on my GPU.
Am I forgetting some option when I run the script or are the results achieved because of the per_gpu_train_batch_size being set to 5 instead of 12?
Thanks!
You can use gradient accumulation to compensate for the small batch size (the script exposes it as --gradient_accumulation_steps). Essentially, the parameter works like this:
Let's say you want a batch_size of 64, but your GPU can only fit a batch of size 32.
So you run two batches of 32 examples each, accumulate the gradients from both backward passes, and only take the optimizer step after the 2 batches.
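To make the idea concrete, here is a minimal sketch in plain Python, not the script's actual training loop. It assumes a toy one-parameter model y = w * x with a mean-squared-error loss; the names mse_grad and accumulated_grad are made up for this illustration. It shows that averaging the gradients of two micro-batches of 32 gives the same gradient as one batch of 64.

```python
# Toy illustration of gradient accumulation (no deep learning framework assumed).
# Model: y_hat = w * x, loss: mean squared error over the batch.

def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches."""
    micro_batches = [data[i:i + micro_batch_size]
                     for i in range(0, len(data), micro_batch_size)]
    steps = len(micro_batches)
    grad = 0.0
    for micro_batch in micro_batches:
        # Each micro-batch has its own forward and backward pass.
        # Dividing by `steps` keeps the scale of one large batch.
        grad += mse_grad(w, micro_batch) / steps
    # Only now would the optimizer update w with the accumulated gradient.
    return grad

# 64 synthetic examples; the "GPU" is assumed to fit only 32 at a time.
data = [(float(x), 3.0 * x + 1.0) for x in range(64)]
w = 0.5

full_batch_grad = mse_grad(w, data)
accum_grad = accumulated_grad(w, data, micro_batch_size=32)
print(full_batch_grad, accum_grad)
```

Applied to the question, a per-GPU batch size of 5 accumulated over 2 steps would give an effective batch of 10 examples per optimizer step.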
Secondly, hyperparameters play a huge role in deep learning models. You will have to try a few sets of parameters to get better accuracy. Reducing the learning rate to the order of 1e-6 might help here, though that is just speculation.
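Did you pass the flag
--version_2_with_negative
when running the script? Since SQuAD 2.0 contains some questions that do not have an answer, this option needs to be enabled. It is a switch rather than a key/value option, so adding it to the command line is enough.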
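As a rough illustration of why the flag has to appear on the command line, here is a small sketch. It assumes the script declares the option as an argparse store_true switch, which is worth checking against your copy of run_squad.py. The parser below is a hypothetical stand-in, not the script's real one.

```python
import argparse

# Hypothetical stand-in parser: it only mirrors how a store_true switch behaves.
parser = argparse.ArgumentParser()
parser.add_argument("--version_2_with_negative", action="store_true",
                    help="Data contains questions that have no answer.")

# Without the flag the option silently defaults to False.
without_flag = parser.parse_args([])

# Passing the bare flag (no value after it) turns it on.
with_flag = parser.parse_args(["--version_2_with_negative"])

print(without_flag.version_2_with_negative, with_flag.version_2_with_negative)
```

Under that assumption, the command in the question runs with the option off, so unanswerable questions would be handled as if every question had an answer.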