
Tensorflow Multi-GPU reusing vs. duplicating?

To train a model on multiple GPUs, one can create one set of variables on the first GPU and reuse them on the other GPUs (via tf.variable_scope(tf.get_variable_scope(), reuse=device_num != 0)), as in cifar10_multi_gpu_train.
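
Roughly, that reuse pattern looks like the sketch below (TF 1.x graph-mode code; the one-variable model and the random batch are stand-ins to keep the example short, not the tutorial's actual network):

```python
import tensorflow as tf

NUM_GPUS = 2  # stand-in value

def build_tower(x):
    # Every tower asks for the same variable name; whether a fresh variable
    # is created or the existing one is returned depends on the enclosing
    # scope's `reuse` flag.
    w = tf.get_variable('w', shape=[10, 1],
                        initializer=tf.glorot_uniform_initializer())
    return tf.reduce_mean(tf.square(tf.matmul(x, w)))

optimizer = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []

for device_num in range(NUM_GPUS):
    with tf.device('/gpu:%d' % device_num):
        # Tower 0 creates the variables; every later tower reuses them.
        with tf.variable_scope(tf.get_variable_scope(),
                               reuse=device_num != 0):
            x = tf.random_normal([32, 10])  # stand-in input batch
            tower_grads.append(optimizer.compute_gradients(build_tower(x)))
```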

But I came across the official CNN benchmarks, where in the local replicated setting they use a new variable scope for each GPU (via tf.variable_scope('v%s' % device_num)). Since all variables are initialized randomly, a post-init op is used to copy the values from GPU:0 to the others.
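
The replicated variant boils down to something like this sketch (again TF 1.x with the model reduced to one variable; the post-init copy op illustrates the idea, not the benchmarks' exact code):

```python
import tensorflow as tf

NUM_GPUS = 2  # stand-in value
losses = []

for device_num in range(NUM_GPUS):
    with tf.device('/gpu:%d' % device_num):
        # Each tower creates its own copy of every variable under 'v<i>/'.
        with tf.variable_scope('v%s' % device_num):
            w = tf.get_variable('w', shape=[10, 1],
                                initializer=tf.glorot_uniform_initializer())
            x = tf.random_normal([32, 10])  # stand-in input batch
            losses.append(tf.reduce_mean(tf.square(tf.matmul(x, w))))

# The replicas are initialized independently, so after the regular init op a
# "post init" op copies tower v0's values onto every other tower.
v0_values = {v.name.split('/', 1)[1]: v
             for v in tf.global_variables() if v.name.startswith('v0/')}
post_init_op = tf.group(*[
    v.assign(v0_values[v.name.split('/', 1)[1]])
    for v in tf.global_variables() if not v.name.startswith('v0/')
])
```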

Both implementations then average the gradients on the CPU and back-propagate the result (at least that is what I think, since the benchmarks code is cryptic :)), so they probably produce the same outcome.
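
For reference, the averaging step in the shared-variable case is roughly the average_gradients helper from the CIFAR-10 tutorial (a simplified sketch; it assumes tower_grads is a list, one entry per GPU, of the (gradient, variable) pairs returned by compute_gradients):

```python
import tensorflow as tf

def average_gradients(tower_grads):
    """Average per-tower (gradient, variable) pairs into a single list."""
    averaged = []
    for grads_and_vars in zip(*tower_grads):  # same variable across towers
        grads = [tf.expand_dims(g, 0) for g, _ in grads_and_vars]
        mean_grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        # With shared variables every tower points at the same variable
        # object, so taking it from the first tower is enough.
        averaged.append((mean_grad, grads_and_vars[0][1]))
    return averaged

# Typically the averaging and the weight update are pinned to the CPU:
#   with tf.device('/cpu:0'):
#       train_op = optimizer.apply_gradients(average_gradients(tower_grads))
```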

What, then, is the difference between these two approaches, and more importantly, which is faster?

Thank you.

The difference is that if you're reusing variables, every iteration starts with a broadcast of the variables from their original location to all GPUs, whereas if you're copying variables this broadcast is unnecessary, so not sharing should be faster.

The one downside of not sharing is that it's easier for a bug or numerical instability somewhere to leave different GPUs with different values for each variable.
