
TensorFlow Multi-GPU: reusing vs. duplicating variables?

To train a model on multiple GPUs, one can create one set of variables on the first GPU and reuse them on the other GPUs (via tf.variable_scope(tf.get_variable_scope(), reuse=device_num != 0)), as in cifar10_multi_gpu_train.
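A minimal TF 1.x-style sketch of that reuse pattern (the tower_loss model, shapes, and placeholders here are hypothetical stand-ins, not the CIFAR-10 code itself):

```python
import tensorflow as tf  # TF 1.x API (tf.compat.v1 in TF 2.x)

def tower_loss(images, labels):
    # Hypothetical per-tower model; any graph built with tf.get_variable works.
    w = tf.get_variable('w', [784, 10], initializer=tf.glorot_uniform_initializer())
    b = tf.get_variable('b', [10], initializer=tf.zeros_initializer())
    logits = tf.matmul(images, w) + b
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

num_gpus = 2
opt = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []

for device_num in range(num_gpus):
    with tf.device('/gpu:%d' % device_num):
        # Tower 0 creates the variables; the later towers reuse the same ones.
        with tf.variable_scope(tf.get_variable_scope(), reuse=device_num != 0):
            images = tf.placeholder(tf.float32, [None, 784])
            labels = tf.placeholder(tf.int32, [None])
            tower_grads.append(opt.compute_gradients(tower_loss(images, labels)))

# tower_grads now holds one list of (grad, var) pairs per GPU, all computed
# against the single shared set of variables.
```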

But I came across the official CNN benchmarks, where in the local replicated setting a new variable scope is used for each GPU (via tf.variable_scope('v%s' % device_num)). Since all variables are initialized randomly, a post-init op is used to copy the values from GPU:0 to the others.
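Something along these lines, assuming the benchmarks' 'v0', 'v1', … naming; this is a simplified sketch in the spirit of their post-init copy, not the actual VariableMgr code:

```python
import tensorflow as tf  # TF 1.x API

num_gpus = 2

for device_num in range(num_gpus):
    with tf.device('/gpu:%d' % device_num):
        # Each GPU gets its own private copy of every variable.
        with tf.variable_scope('v%s' % device_num):
            tf.get_variable('w', [784, 10], initializer=tf.glorot_uniform_initializer())

# After random init, copy GPU:0's values into the other replicas.
master = {v.name.split('/', 1)[1]: v
          for v in tf.global_variables() if v.name.startswith('v0/')}
post_init_ops = []
for v in tf.global_variables():
    scope, _, suffix = v.name.partition('/')
    if scope != 'v0' and suffix in master:
        post_init_ops.append(v.assign(master[suffix]))
post_init_op = tf.group(*post_init_ops)

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(post_init_op)  # all replicas now start from identical weights
```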

Both implementations then average the gradients on the CPU and back-propagate the result (at least that is what I think, since the benchmarks code is cryptic :)), probably producing the same outcome.
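The averaging step in cifar10_multi_gpu_train looks roughly as follows (a condensed version of its average_gradients helper):

```python
import tensorflow as tf  # TF 1.x API

def average_gradients(tower_grads):
    """Average per-GPU gradients; tower_grads is a list (one entry per GPU)
    of lists of (gradient, variable) pairs from opt.compute_gradients."""
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        # The variable is shared (or mirrored), so take it from the first tower.
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads

# Typically wrapped in `with tf.device('/cpu:0'):` and followed by a single
# opt.apply_gradients(average_gradients(tower_grads)) call.
```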

What, then, is the difference between these two approaches, and more importantly, which is faster?

Thank you.

The difference is that if you're reusing variables, every iteration starts with a broadcast of the variables from their original location to all GPUs, while if you're copying variables this broadcast is unnecessary, so not sharing should be faster.

The one downside of not sharing is that it's easier for a bug or numerical instability somewhere to lead to different GPUs ending up with different values for each variable.
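If you go the replicated route, a cheap sanity check along these lines can catch that kind of drift (a hypothetical helper, reusing the 'v0'/'v1' scope naming from the question's sketch):

```python
import tensorflow as tf  # TF 1.x API

def max_replica_drift():
    # Largest absolute difference between GPU:0's variables and any other
    # replica; it should stay at (or very near) zero if the GPUs remain in sync.
    master = {v.name.split('/', 1)[1]: v
              for v in tf.global_variables() if v.name.startswith('v0/')}
    drifts = [tf.constant(0.0)]
    for v in tf.global_variables():
        scope, _, suffix = v.name.partition('/')
        if scope != 'v0' and suffix in master:
            drifts.append(tf.reduce_max(tf.abs(v - master[suffix])))
    return tf.reduce_max(tf.stack(drifts))

# e.g. sess.run(max_replica_drift()) every few hundred steps.
```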
