Does NVLink accelerate training with DistributedDataParallel?
Nvidia's NVLink accelerates data transfer between several GPUs on the same machine. I train large models on such a machine using PyTorch.
I see why NVLink would make model-parallel training faster, since one pass through a model will involve several GPUs.
But would it accelerate a data-parallel training process using DistributedDataParallel?
How does data-parallel training on k GPUs work?
You split your mini-batch into k parts, each part is forwarded on a different GPU, and gradients are estimated on each GPU. However (and this is super crucial), updating the weights must be synchronized between all GPUs: in DistributedDataParallel this synchronization is an all-reduce over all of the model's gradients, so the inter-GPU traffic grows with model size. This is where NVLink becomes important for data-parallel training as well.