How do deep learning frameworks such as PyTorch handle memory when using multiple GPUs?
I have recently run into a situation where I am running out of memory on a single Nvidia V100. I have limited experience using multiple GPUs to train networks, so I'm a little unsure how the data parallelization process works. Let's say I'm using a model and batch size that together require something like 20-25GB of memory. Is there any way to take advantage of the full 32GB of memory I have between two 16GB V100s? Would PyTorch's DataParallel functionality achieve this? I suppose there is also the possibility of breaking the model up and using model parallelism. Please excuse my lack of knowledge on this subject. Thanks in advance for any help or clarification!
You should keep model parallelism as your last resort, and use it only if your model doesn't fit in the memory of a single GPU (with 16GB per GPU you have plenty of room for a gigantic model).
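For reference, model parallelism amounts to placing different layers on different devices and moving the activations between them in `forward`. A minimal sketch, assuming a toy two-stage network (the class name, layer sizes, and device strings are illustrative, not from the question):

```python
import torch
import torch.nn as nn

class TwoStageNet(nn.Module):
    """Hypothetical model split across two devices (model parallelism)."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage1 = nn.Linear(512, 256).to(dev0)  # first half on device 0
        self.stage2 = nn.Linear(256, 10).to(dev1)   # second half on device 1

    def forward(self, x):
        x = torch.relu(self.stage1(x.to(self.dev0)))
        # Move the intermediate activations to the second device.
        return self.stage2(x.to(self.dev1))

# Use two GPUs when available; otherwise fall back to CPU so the
# sketch still runs for illustration.
if torch.cuda.device_count() >= 2:
    net = TwoStageNet("cuda:0", "cuda:1")
else:
    net = TwoStageNet("cpu", "cpu")

out = net(torch.randn(8, 512))
print(out.shape)  # torch.Size([8, 10])
```

Note that in this naive form only one GPU computes at a time (the other waits for activations), so it saves memory but does not speed anything up.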
If you have two GPUs, I would use data parallelism. In data parallelism you have a copy of your model on each GPU, and each copy is fed a portion of the batch. The gradients are then gathered and used to update all the copies.
PyTorch makes it really easy to achieve data parallelism, as you just need to wrap your model instance in nn.DataParallel:
model = torch.nn.DataParallel(model, device_ids=[0, 1])
output = model(input_var)
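Put together as a runnable sketch (the stand-in model, tensor sizes, and batch size are hypothetical; the wrapper is applied only when two GPUs are present, so the snippet also runs on CPU for illustration):

```python
import torch
import torch.nn as nn

# Stand-in model; substitute your own network here.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Wrap in DataParallel only when at least two GPUs are available.
if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

# A batch of 64 samples: DataParallel splits it along dim 0, sending
# 32 samples to each GPU, runs the copies in parallel, and then
# concatenates the per-GPU outputs back into one tensor.
input_var = torch.randn(64, 512)
if torch.cuda.is_available():
    input_var = input_var.cuda()

output = model(input_var)
print(output.shape)  # torch.Size([64, 10])
```

Because each GPU only holds its share of the batch's activations, the per-GPU memory use drops roughly with the number of GPUs, which is what lets a 20-25GB workload fit across two 16GB cards. Note that the model's parameters are still replicated on every GPU, so the model itself must fit on one card.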