I'm fine-tuning ResNet50 on the CIFAR10 dataset using tf.slim's train_image_classifier.py script:
```bash
python train_image_classifier.py \
    --train_dir=${TRAIN_DIR}/all \
    --dataset_name=cifar10 \
    --dataset_split_name=train \
    --dataset_dir=${DATASET_DIR} \
    --checkpoint_path=${TRAIN_DIR} \
    --model_name=resnet_v1_50 \
    --max_number_of_steps=3000 \
    --batch_size=32 \
    --num_clones=4 \
    --learning_rate=0.0001 \
    --save_interval_secs=10 \
    --save_summaries_secs=10 \
    --log_every_n_steps=10 \
    --optimizer=sgd
```
For 3k steps, training takes around 30 minutes on a single GPU (Tesla M40) but more than 50 minutes on 4 GPUs. (Accuracy is similar in both cases: ~75% and ~78%.)
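Comparing wall-clock time for the same number of steps may be misleading. My assumption (not verified against the script's source here) is that each clone dequeues its own batch of `--batch_size` images, so with `--num_clones=4` one global step would process 4 × 32 = 128 images instead of 32. The sketch below converts the timings above into images per second under that assumption. The helper names are hypothetical, and "50+ minutes" is treated as exactly 50.

```python
# Hypothetical helpers: compare training throughput (images/second) for the two runs.
# ASSUMPTION: every clone gets a full batch of --batch_size images per global step.

def images_per_step(batch_size: int, num_clones: int) -> int:
    """Images consumed by one global step when each clone dequeues a full batch."""
    return batch_size * num_clones


def throughput(steps: int, batch_size: int, num_clones: int, minutes: float) -> float:
    """Average images per second over the whole run."""
    total_images = steps * images_per_step(batch_size, num_clones)
    return total_images / (minutes * 60.0)


if __name__ == "__main__":
    steps, batch_size = 3000, 32
    single = throughput(steps, batch_size, num_clones=1, minutes=30)
    multi = throughput(steps, batch_size, num_clones=4, minutes=50)  # "50+" taken as 50
    print(f"1 GPU : {single:.1f} images/s")
    print(f"4 GPUs: {multi:.1f} images/s")
    print(f"speed-up: {multi / single:.2f}x")
```

If the assumption holds, the 4-GPU run processes 4x as many images in 3k steps, so its throughput is still higher than the single-GPU run's, even though it takes longer in wall-clock time. It is, however, well short of a linear 4x speed-up.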
I know that image loading is one common bottleneck in multi-GPU setups, but tf.slim already runs the input pipeline on the CPU. Any ideas what the issue could be? Thank you!