
Tensorflow Multi-GPU reusing vs. duplicating?

To train a model on multiple GPUs, one can create one set of variables on the first GPU and reuse them on the other GPUs (via tf.variable_scope(tf.get_variable_scope(), reuse=device_num != 0)), as in cifar10_multi_gpu_train.
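
Roughly, that reuse pattern looks like the sketch below (TF 1.x graph-mode code; the one-variable model and the random batch are stand-ins to keep the example short, not the tutorial's actual network):

```python
import tensorflow as tf

NUM_GPUS = 2  # stand-in value

def build_tower(x):
    # Every tower asks for the same variable name; whether a fresh variable
    # is created or the existing one is returned depends on the enclosing
    # scope's `reuse` flag.
    w = tf.get_variable('w', shape=[10, 1],
                        initializer=tf.glorot_uniform_initializer())
    return tf.reduce_mean(tf.square(tf.matmul(x, w)))

optimizer = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []

for device_num in range(NUM_GPUS):
    with tf.device('/gpu:%d' % device_num):
        # Tower 0 creates the variables; every later tower reuses them.
        with tf.variable_scope(tf.get_variable_scope(),
                               reuse=device_num != 0):
            x = tf.random_normal([32, 10])  # stand-in input batch
            tower_grads.append(optimizer.compute_gradients(build_tower(x)))
```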

But I came across the official CNN benchmarks, where in the local replicated setting they use a new variable scope for each GPU (via tf.variable_scope('v%s' % device_num)). Since all variables are initialized randomly, a post-init op is used to copy the values from GPU:0 to the others.
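
The replicated variant boils down to something like this sketch (again TF 1.x with the model reduced to one variable; the post-init copy op illustrates the idea, not the benchmarks' exact code):

```python
import tensorflow as tf

NUM_GPUS = 2  # stand-in value
losses = []

for device_num in range(NUM_GPUS):
    with tf.device('/gpu:%d' % device_num):
        # Each tower creates its own copy of every variable under 'v<i>/'.
        with tf.variable_scope('v%s' % device_num):
            w = tf.get_variable('w', shape=[10, 1],
                                initializer=tf.glorot_uniform_initializer())
            x = tf.random_normal([32, 10])  # stand-in input batch
            losses.append(tf.reduce_mean(tf.square(tf.matmul(x, w))))

# The replicas are initialized independently, so after the regular init op a
# "post init" op copies tower v0's values onto every other tower.
v0_values = {v.name.split('/', 1)[1]: v
             for v in tf.global_variables() if v.name.startswith('v0/')}
post_init_op = tf.group(*[
    v.assign(v0_values[v.name.split('/', 1)[1]])
    for v in tf.global_variables() if not v.name.startswith('v0/')
])
```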

Both implementations then average the gradients on the CPU and back-propagate the result (at least that is what I think, since the benchmarks code is cryptic :)), so they probably produce the same outcome.
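
For reference, the averaging step in the shared-variable case is roughly the average_gradients helper from the CIFAR-10 tutorial (a simplified sketch; it assumes tower_grads is a list, one entry per GPU, of the (gradient, variable) pairs returned by compute_gradients):

```python
import tensorflow as tf

def average_gradients(tower_grads):
    """Average per-tower (gradient, variable) pairs into a single list."""
    averaged = []
    for grads_and_vars in zip(*tower_grads):  # same variable across towers
        grads = [tf.expand_dims(g, 0) for g, _ in grads_and_vars]
        mean_grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        # With shared variables every tower points at the same variable
        # object, so taking it from the first tower is enough.
        averaged.append((mean_grad, grads_and_vars[0][1]))
    return averaged

# Typically the averaging and the weight update are pinned to the CPU:
#   with tf.device('/cpu:0'):
#       train_op = optimizer.apply_gradients(average_gradients(tower_grads))
```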

What, then, is the difference between these two approaches, and more importantly, which is faster?

Thank you.

The difference is that if you're reusing variables, every iteration starts with a broadcast of the variables from their original location to all GPUs, whereas if you're copying variables this broadcast is unnecessary, so not sharing should be faster.

The one downside of not sharing is that it's easier for a bug or numerical instability somewhere to leave different GPUs with different values for each variable.
