
Running SQuAD script using ALBERT (huggingface-transformers)

I have a question regarding the usage of ALBERT with the SQuAD 2.0 huggingface-transformers script.

The GitHub page has no specific instructions on how to run the script with ALBERT, so I used the same settings that are used to run the script with BERT. However, the final results (exact_match = 30.632527583593028, f1 = 36.36948708435092) are far from the (f1 = 88.52, exact_match = 81.22) achieved by BERT and reported on the GitHub page. So I think I may be doing something wrong.

This is the code that I ran in the command line:

python run_squad.py \
   --model_type albert \
   --model_name_or_path albert-base-v2 \
   --do_train   --do_eval \
   --train_file train-v2.0.json \
   --predict_file dev-v2.0.json \
   --per_gpu_train_batch_size 5 \
   --learning_rate 3e-5 \
   --num_train_epochs 2.0 \
   --max_seq_length 384 \
   --doc_stride 128 \
   --output_dir /aneves/teste2/output/

The only differences between this command and the one on the transformers page are the model name, for which they use 'bert-base-uncased', and per_gpu_train_batch_size, which is 12 there; I had to use 5 due to memory constraints on my GPU.

Am I forgetting some option when I run the script or are the results achieved because of the per_gpu_train_batch_size being set to 5 instead of 12?


You can use gradient accumulation steps to compensate for the small batch size. Essentially, the gradient accumulation steps parameter works like this:

Let's say you want a batch_size of 64, but your GPU can only fit a batch of size 32.

So you run two batches of 32 examples each, doing a forward and backward pass on each one and accumulating the gradients, and only then perform a single optimizer update after the 2 batches.
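As a rough illustration of the idea above, here is a minimal sketch in plain Python (no deep learning framework). The one-parameter model, the data, and the helper names `grad_mse` and `accumulated_grad` are all made up for this example; it only shows that averaging the gradients of two micro-batches of 32 gives the same gradient as one batch of 64. In run_squad.py the corresponding option is, as far as I know, `--gradient_accumulation_steps`.

```python
import random


def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches."""
    starts = range(0, len(xs), micro_batch_size)
    steps = len(starts)
    total = 0.0
    for start in starts:
        mb_x = xs[start:start + micro_batch_size]
        mb_y = ys[start:start + micro_batch_size]
        # One forward/backward pass per micro-batch; dividing by `steps`
        # makes the accumulated sum equal to the full-batch mean gradient.
        total += grad_mse(w, mb_x, mb_y) / steps
    return total


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(64)]
ys = [3.0 * x + random.gauss(0.0, 0.1) for x in xs]
w = 0.5

full = grad_mse(w, xs, ys)                                 # one batch of 64
accum = accumulated_grad(w, xs, ys, micro_batch_size=32)   # two batches of 32

# A single optimizer update is applied only after both micro-batches.
learning_rate = 0.1
w_new = w - learning_rate * accum
```

Note that with a per-GPU batch size of 5, no whole number of accumulation steps gives exactly 12; two steps give an effective batch of 10 and three give 15.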

Secondly, hyperparameters play a major role in deep learning models. You will have to try a few sets of hyperparameters to get better accuracy. I think reducing the learning rate to the order of 1e-6 might help here, though that is just speculation.

Did you set the flag


to True? Since SQuAD 2.0 contains some questions that do not have an answer, you need to set it to True.
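To see how many questions are affected, here is a minimal sketch that counts the unanswerable questions in a SQuAD 2.0 style file. It assumes the official v2.0 JSON layout, in which each question carries an `is_impossible` field; the `sample` dictionary and the helper name `count_unanswerable` are made up for illustration.

```python
import json


def count_unanswerable(squad):
    """Return (total questions, questions marked as having no answer)."""
    total = 0
    unanswerable = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                total += 1
                # SQuAD 1.1 files lack this field, so default to False.
                if qa.get("is_impossible", False):
                    unanswerable += 1
    return total, unanswerable


# Tiny hand-written sample in the SQuAD 2.0 layout (not real data).
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "ALBERT is a lite version of BERT.",
            "qas": [
                {"id": "q1", "question": "What is ALBERT?",
                 "is_impossible": False,
                 "answers": [{"text": "a lite version of BERT",
                              "answer_start": 10}]},
                {"id": "q2", "question": "Who founded ALBERT Inc.?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

counts = count_unanswerable(sample)

# For a real file, e.g. the dev set used in the question:
# with open("dev-v2.0.json", encoding="utf-8") as f:
#     counts = count_unanswerable(json.load(f))
```

If a large share of the questions comes back as unanswerable, a model trained without handling for them will be forced to predict a span for every question, which would be consistent with the low scores reported above.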

