
Does NVLink accelerate training with DistributedDataParallel?

Nvidia's NVLink accelerates data transfer between several GPUs on the same machine. I train large models on such a machine using PyTorch.

I see why NVLink would make model-parallel training faster, since one pass through a model will involve several GPUs.

But would it accelerate a data-parallel training process using DistributedDataParallel?

How does data-parallel training on k GPUs work?
You split your mini-batch into k parts; each part is forwarded through the model on a different GPU, and gradients are estimated on each GPU. However (and this is crucial), the weight updates must be synchronized across all GPUs. This is where NVLink becomes important for data-parallel training as well.
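
To make that synchronization step concrete, here is a minimal single-machine DistributedDataParallel sketch. The names ToyModel and run_worker, the layer sizes, and the master address/port values are illustrative assumptions, not from the question. Each spawned process drives one GPU; DDP all-reduces the gradients across GPUs during backward(), and that collective traffic is what NVLink can speed up when the NCCL backend uses it.

# Minimal single-machine DDP sketch: one process per GPU,
# gradients all-reduced across GPUs after each backward pass.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn


class ToyModel(nn.Module):  # illustrative model
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(1000, 10)

    def forward(self, x):
        return self.net(x)


def run_worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"   # assumed local rendezvous
    os.environ["MASTER_PORT"] = "29500"
    # NCCL picks the fastest available interconnect (NVLink if present).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = ToyModel().cuda(rank)
    ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # Each rank processes its own shard of the mini-batch.
    inputs = torch.randn(32, 1000, device=rank)
    targets = torch.randint(0, 10, (32,), device=rank)

    loss = nn.functional.cross_entropy(ddp_model(inputs), targets)
    loss.backward()    # gradients are all-reduced across all GPUs here
    optimizer.step()   # every rank applies the same averaged gradients

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(run_worker, args=(world_size,), nprocs=world_size)

Whether NVLink is actually used depends on the machine's GPU topology; nvidia-smi topo -m shows which GPU pairs are connected by NVLink, and NCCL routes the all-reduce over the fastest links it finds.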
