
What is the default batch size of PyTorch SGD?

What does PyTorch's SGD do if I feed it the whole dataset and do not specify a batch size? I don't see any "stochastic" behavior or randomness in this case. For example, in the following simple code, I feed the whole data (x, y) into the model:

# model, criterion, x_data and y_data are defined elsewhere
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(5):
    y_pred = model(x_data)            # forward pass over the full dataset
    loss = criterion(y_pred, y_data)  # loss computed over all data pairs
    optimizer.zero_grad()             # clear gradients from the previous step
    loss.backward()                   # backprop through the whole batch
    optimizer.step()                  # one parameter update per epoch

Suppose there are 100 data pairs (x, y), i.e., x_data and y_data each contain 100 elements.

Question: It seems to me that all 100 gradients are calculated before a single update of the parameters. The size of the "mini-batch" is 100, not 1, so there is no randomness. Am I right? I originally thought SGD meant randomly choosing 1 data point and calculating its gradient, which is then used as an approximation of the true gradient over all the data.

The SGD optimizer in PyTorch is just gradient descent: it has no notion of a batch size and simply applies whatever gradient is currently stored on the parameters. The stochastic part comes from how you usually pass a random subset of your data through the network at a time (i.e., a mini-batch). The code you posted passes the entire dataset through on each epoch before doing backprop and stepping the optimizer, so you're really just doing regular full-batch gradient descent.
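For illustration, here is a minimal sketch of how the same loop could be made stochastic by drawing random mini-batches with torch.utils.data.DataLoader. The linear model, the synthetic 100-pair dataset, and the batch size of 10 are assumptions added for the example, not part of the original post.

import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical setup: 100 (x, y) pairs and a simple linear model (assumed).
x_data = torch.randn(100, 1)
y_data = 3 * x_data + 0.1 * torch.randn(100, 1)
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# shuffle=True is what introduces the randomness: each epoch the 100 pairs
# are re-shuffled and split into mini-batches of 10.
loader = DataLoader(TensorDataset(x_data, y_data), batch_size=10, shuffle=True)

for epoch in range(5):
    for x_batch, y_batch in loader:       # 10 updates per epoch instead of 1
        y_pred = model(x_batch)
        loss = criterion(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()                   # gradient from this mini-batch only
        optimizer.step()                  # parameter update per mini-batch

Setting batch_size=1 here would recover the textbook one-sample-at-a-time SGD described in the question; the optimizer itself is unchanged in all cases.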
