
Training Multi-GPU on Tensorflow: a simpler way?

I have been using the training method proposed in the cifar10_multi_gpu_train example for (local) multi-GPU training, i.e., creating several towers and then averaging the gradients. However, I was wondering: what happens if I just take the losses coming from the different GPUs, sum them up, and then apply gradient descent to that summed loss?
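Roughly, what I have in mind is the sketch below (TensorFlow 1.x graph mode; `tower_loss` and `batches` are hypothetical stand-ins for a per-GPU loss-building function that reuses variables across towers, and for the per-GPU input shards):

```python
import tensorflow as tf

NUM_GPUS = 2
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Variant A: what cifar10_multi_gpu_train does -- build one tower per GPU,
# compute gradients per tower, then average them variable by variable.
tower_grads = []
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i), tf.name_scope('tower_%d' % i) as scope:
        loss_i = tower_loss(scope, batches[i])        # hypothetical per-GPU loss
        tower_grads.append(opt.compute_gradients(loss_i))

averaged = []
for grads_and_vars in zip(*tower_grads):              # same variable, all towers
    grads = [g for g, _ in grads_and_vars]
    averaged.append((tf.reduce_mean(tf.stack(grads), axis=0),
                     grads_and_vars[0][1]))
train_op = opt.apply_gradients(averaged)

# Variant B: what I am asking about -- sum the per-tower losses into one
# scalar and let the optimizer minimize that directly.
# total_loss = tf.add_n(per_tower_losses)
# train_op = opt.minimize(total_loss)
```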

Would that work? Probably this is a silly question, and there must be a limitation somewhere, so I would be happy if you could comment on this.

Thanks and best regards, G.

It would not work with the sum. You would get a bigger loss and consequently bigger, and probably erroneous, gradients. When you average the gradients, you get an average of the directions the weights have to take in order to minimize the loss, where each individual direction is the one computed for that tower's exact loss value.
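To see why the gradients get bigger: by linearity of the gradient, the gradient of the summed loss is the sum of the per-tower gradients, i.e. N times the averaged one, so the update behaves as if the learning rate were multiplied by the number of towers N:

$$
\nabla_\theta \sum_{i=1}^{N} L_i
  = \sum_{i=1}^{N} \nabla_\theta L_i
  = N \cdot \frac{1}{N}\sum_{i=1}^{N} \nabla_\theta L_i .
$$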

One thing that you can try is to run the towers independently and then average the weights from time to time; you get a slower convergence rate but faster processing on each node.
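A rough sketch of that synchronization step (`tower_vars` here is a hypothetical list holding each tower's variable list, in matching order across towers):

```python
import tensorflow as tf

def make_average_op(tower_vars):
    """Overwrite every tower's weights with the element-wise mean across towers."""
    sync_ops = []
    for vars_for_one_weight in zip(*tower_vars):      # same weight, all towers
        mean = tf.reduce_mean(tf.stack(list(vars_for_one_weight)), axis=0)
        sync_ops.extend(v.assign(mean) for v in vars_for_one_weight)
    return tf.group(*sync_ops)

avg_op = make_average_op(tower_vars)   # tower_vars: hypothetical per-tower variable lists
# Inside the training loop, run the towers' train ops independently and
# every K steps synchronize them:
# sess.run(avg_op)
```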
