
When training a model over multiple GPUs on the same machine using PyTorch, how is the batch size divided?

Even after looking through the PyTorch forums, I'm still not certain about this. Let's say I'm using PyTorch DDP to train a model over 4 GPUs on the same machine.

Suppose I choose a batch size of 8. Is each GPU effectively backpropagating over 2 examples every step, so the final result is that of a model trained with a batch size of 2, or does the model gather the gradients from each GPU at every step and backpropagate with a batch size of 8?

The batch size you specify is the size of the input fed to each worker (GPU), which in your case is 8. In other words, each process runs backpropagation on 8 examples every step; DDP then averages the resulting gradients across the workers during `backward()`, so every optimizer step is driven by 8 examples per GPU.

A concrete code example: https://gist.github.com/sgraaf/5b0caa3a320f28c27c12b5efeb35aa4c#file-ddp_example-py-L63. The value set on the linked line is this per-GPU batch size.
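For reference, here is a minimal, self-contained sketch of the same pattern (not the linked gist). It assumes launching with `torchrun` and the `nccl` backend; the model, dataset, and hyperparameters are hypothetical placeholders, chosen only to show where the per-GPU batch size enters.

```python
# Minimal DDP sketch: batch_size in the DataLoader is the *per-GPU* batch size.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler


def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy dataset: 1024 samples of 10 features each (placeholder data).
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))

    # DistributedSampler gives each process a disjoint shard of the data.
    sampler = DistributedSampler(dataset)

    # batch_size=8 means 8 examples per GPU per step. With 4 GPUs, each
    # optimizer step is therefore driven by 4 * 8 = 32 examples in total,
    # because DDP averages gradients across processes during backward().
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = DDP(torch.nn.Linear(10, 1).cuda(local_rank),
                device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()       # gradients are all-reduced (averaged) here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=4 ddp_batch_size.py`, this spawns 4 processes, each loading batches of 8, so the effective global batch size per step is 32.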
