Even after looking through the PyTorch forums, I'm still not certain about this one. Let's say I'm using PyTorch DDP to train a model over 4 GPUs on the same machine, and suppose I choose a batch size of 8. Is each GPU theoretically backpropagating over 2 examples every step, so that the final result is a model trained with a batch size of 2? Or does the model gather the gradients from each GPU at every step and backpropagate with a batch size of 8?
In DDP, the batch size is per worker: it is the size of the input you feed to each process, in your case 8. In other words, each GPU backpropagates over its own 8 examples every step. DDP then averages the gradients across workers before the optimizer step, so with 4 GPUs the effective global batch size per step is 4 × 8 = 32, not 2.
For a concrete code example, see https://gist.github.com/sgraaf/5b0caa3a320f28c27c12b5efeb35aa4c#file-ddp_example-py-L63; line 63 is where the per-worker batch size is set.
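To make the flow explicit, here is a minimal single-node sketch of my own (not the gist's code; the toy dataset, model, port, and hyperparameters are placeholders). The point is that `batch_size=8` in the `DataLoader` applies per process, and `loss.backward()` triggers DDP's gradient averaging across the 4 workers:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler


def worker(rank: int, world_size: int):
    # Placeholder rendezvous settings for a single machine.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Toy dataset; DistributedSampler gives each rank a disjoint shard.
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # batch_size here is PER WORKER: each rank sees 8 examples per step,
    # so the effective global batch is world_size * 8 = 32.
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = DDP(torch.nn.Linear(10, 1).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(rank), y.cuda(rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            # backward() computes local gradients over this rank's 8 examples,
            # then DDP all-reduces (averages) them across the 4 workers.
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 4  # one process per GPU on the same machine
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

So the per-GPU batch size is whatever you pass to each worker's `DataLoader`; there is no splitting of 8 into chunks of 2.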