Even after looking through the PyTorch forums, I'm still not certain about this one. Let's say I'm using PyTorch DDP to train a model over 4 GPUs on the same machine, and suppose I choose a batch size of 8. Is each GPU theoretically backpropagating over 2 examples every step, so that the final result is a model trained with a batch size of 2? Or does the model gather the gradients from each GPU at every step and backpropagate with a batch size of 8?
In DDP, the batch size is per worker: it is the size of the input you feed to each process, in your case 8. In other words, each GPU backpropagates over its own 8 examples every step. DDP then averages the gradients across workers before the optimizer step, so with 4 GPUs the effective global batch size per step is 4 × 8 = 32, not 2.
For a concrete code example, see https://gist.github.com/sgraaf/5b0caa3a320f28c27c12b5efeb35aa4c#file-ddp_example-py-L63; line 63 is where the per-worker batch size is set.
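To make the flow explicit, here is a minimal single-node sketch of my own (not the gist's code; the toy dataset, model, port, and hyperparameters are placeholders). The point is that `batch_size=8` in the `DataLoader` applies per process, and `loss.backward()` triggers DDP's gradient averaging across the 4 workers:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler


def worker(rank: int, world_size: int):
    # Placeholder rendezvous settings for a single machine.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Toy dataset; DistributedSampler gives each rank a disjoint shard.
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # batch_size here is PER WORKER: each rank sees 8 examples per step,
    # so the effective global batch is world_size * 8 = 32.
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = DDP(torch.nn.Linear(10, 1).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(rank), y.cuda(rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            # backward() computes local gradients over this rank's 8 examples,
            # then DDP all-reduces (averages) them across the 4 workers.
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 4  # one process per GPU on the same machine
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

So the per-GPU batch size is whatever you pass to each worker's `DataLoader`; there is no splitting of 8 into chunks of 2.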