
TensorFlow Multi-GPU: reusing vs. duplicating variables?

To train a model on multiple GPUs, one can create one set of variables on the first GPU and reuse them on the other GPUs (via tf.variable_scope(tf.get_variable_scope(), reuse=device_num != 0)), as in cifar10_multi_gpu_train.
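A minimal TF 1.x-style sketch of that reuse pattern (the tower_loss model, shapes, and placeholders here are hypothetical stand-ins, not the CIFAR-10 code itself):

```python
import tensorflow as tf  # TF 1.x API (tf.compat.v1 in TF 2.x)

def tower_loss(images, labels):
    # Hypothetical per-tower model; any graph built with tf.get_variable works.
    w = tf.get_variable('w', [784, 10], initializer=tf.glorot_uniform_initializer())
    b = tf.get_variable('b', [10], initializer=tf.zeros_initializer())
    logits = tf.matmul(images, w) + b
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

num_gpus = 2
opt = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []

for device_num in range(num_gpus):
    with tf.device('/gpu:%d' % device_num):
        # Tower 0 creates the variables; the later towers reuse the same ones.
        with tf.variable_scope(tf.get_variable_scope(), reuse=device_num != 0):
            images = tf.placeholder(tf.float32, [None, 784])
            labels = tf.placeholder(tf.int32, [None])
            tower_grads.append(opt.compute_gradients(tower_loss(images, labels)))

# tower_grads now holds one list of (grad, var) pairs per GPU, all computed
# against the single shared set of variables.
```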

But I came across the official CNN benchmarks, where in the local replicated setting a new variable scope is used for each GPU (via tf.variable_scope('v%s' % device_num)). Since all variables are initialized randomly, a post-init op is used to copy the values from GPU:0 to the others.
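Something along these lines, assuming the benchmarks' 'v0', 'v1', … naming; this is a simplified sketch in the spirit of their post-init copy, not the actual VariableMgr code:

```python
import tensorflow as tf  # TF 1.x API

num_gpus = 2

for device_num in range(num_gpus):
    with tf.device('/gpu:%d' % device_num):
        # Each GPU gets its own private copy of every variable.
        with tf.variable_scope('v%s' % device_num):
            tf.get_variable('w', [784, 10], initializer=tf.glorot_uniform_initializer())

# After random init, copy GPU:0's values into the other replicas.
master = {v.name.split('/', 1)[1]: v
          for v in tf.global_variables() if v.name.startswith('v0/')}
post_init_ops = []
for v in tf.global_variables():
    scope, _, suffix = v.name.partition('/')
    if scope != 'v0' and suffix in master:
        post_init_ops.append(v.assign(master[suffix]))
post_init_op = tf.group(*post_init_ops)

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(post_init_op)  # all replicas now start from identical weights
```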

Both implementations then average the gradients on the CPU and back-propagate the result (at least that is what I think, since the benchmarks code is cryptic :)), probably producing the same outcome.
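The averaging step in cifar10_multi_gpu_train looks roughly as follows (a condensed version of its average_gradients helper):

```python
import tensorflow as tf  # TF 1.x API

def average_gradients(tower_grads):
    """Average per-GPU gradients; tower_grads is a list (one entry per GPU)
    of lists of (gradient, variable) pairs from opt.compute_gradients."""
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        # The variable is shared (or mirrored), so take it from the first tower.
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads

# Typically wrapped in `with tf.device('/cpu:0'):` and followed by a single
# opt.apply_gradients(average_gradients(tower_grads)) call.
```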

What, then, is the difference between these two approaches, and more importantly, which is faster?

Thank you.

The difference is that if you're reusing variables, every iteration starts with a broadcast of the variables from their original location to all GPUs, while if you're copying variables this broadcast is unnecessary, so not sharing should be faster.

The one downside of not sharing is that it's easier for a bug or numerical instability somewhere to lead to different GPUs ending up with different values for each variable.
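If you go the replicated route, a cheap sanity check along these lines can catch that kind of drift (a hypothetical helper, reusing the 'v0'/'v1' scope naming from the question's sketch):

```python
import tensorflow as tf  # TF 1.x API

def max_replica_drift():
    # Largest absolute difference between GPU:0's variables and any other
    # replica; it should stay at (or very near) zero if the GPUs remain in sync.
    master = {v.name.split('/', 1)[1]: v
              for v in tf.global_variables() if v.name.startswith('v0/')}
    drifts = [tf.constant(0.0)]
    for v in tf.global_variables():
        scope, _, suffix = v.name.partition('/')
        if scope != 'v0' and suffix in master:
            drifts.append(tf.reduce_max(tf.abs(v - master[suffix])))
    return tf.reduce_max(tf.stack(drifts))

# e.g. sess.run(max_replica_drift()) every few hundred steps.
```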
