
Does torch.no_grad() affect model accuracy?

I am getting a "CUDA out of memory" error, so I added torch.no_grad() to my code. Does it affect my accuracy?

for iters in range(args.iterations):
    # Run each stage without tracking gradients to save GPU memory.
    with torch.no_grad():
        encoded, encoder_h_1, encoder_h_2, encoder_h_3 = encoder(
            res, encoder_h_1, encoder_h_2, encoder_h_3)

    with torch.no_grad():
        code = binarizer(encoded)

    with torch.no_grad():
        output, decoder_h_1, decoder_h_2, decoder_h_3, decoder_h_4 = decoder(
            code, decoder_h_1, decoder_h_2, decoder_h_3, decoder_h_4)

    res = res - output.detach()
    codes.append(code.data.cpu().numpy())
    torch.cuda.empty_cache()
    # Reports the mean absolute residual as the loss.
    print('Iter: {:02d}; Loss: {:.06f}'.format(iters, res.data.abs().mean()))

torch.no_grad() just disables the tracking of any calculations required to later calculate a gradient.

It won't have any effect on accuracy in pure inference mode, since gradients are not needed there. Of course, you can't use it during training, since the gradients are needed to train and optimize.

In general, if you are doing inference you always want to set the network to eval mode and disable gradients. This saves runtime and memory and won't affect accuracy.
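For example, a minimal inference sketch (the model and inputs here are placeholders, not from the question's code):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)     # stands in for any trained network
inputs = torch.randn(4, 10)  # dummy input batch

model.eval()                 # puts layers like dropout/batchnorm into eval behavior
with torch.no_grad():        # skips building the autograd graph, saving memory
    predictions = model(inputs)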

An answer to a similar question, explaining eval() and no_grad(): https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615/2

torch.no_grad() basically skips the gradient calculation over the weights, which means you are not changing any weights in the specified layers. If you are training a pre-trained model, it's OK to use torch.no_grad() on all the layers except the fully connected layer or classifier layer.
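A sketch of that idea (the tiny backbone and classifier below are made-up stand-ins for a real pre-trained model): run the frozen layers under torch.no_grad() and let gradients flow only through the classifier.

import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # stand-in for pre-trained layers
classifier = nn.Linear(256, 10)                           # the only part being trained
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)         # dummy batch
y = torch.randint(0, 10, (32,))  # dummy labels

with torch.no_grad():            # no graph is built for the frozen layers
    features = backbone(x)

logits = classifier(features)    # gradients flow only through the classifier
loss = criterion(logits, y)
loss.backward()
optimizer.step()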

If you are training your network from scratch, this isn't a good thing to do. You should consider reducing the number of layers or applying torch.no_grad() to only part of the training. An example is given below.

for iters in range(args.iterations):
    if iters % 2 == 0:
        # Even iterations: run the encoder without tracking gradients,
        # so its weights are not updated on this step.
        with torch.no_grad():
            encoded, encoder_h_1, encoder_h_2, encoder_h_3 = encoder(
                res, encoder_h_1, encoder_h_2, encoder_h_3)
    else:
        # Odd iterations: run the encoder normally so it receives gradients.
        encoded, encoder_h_1, encoder_h_2, encoder_h_3 = encoder(
            res, encoder_h_1, encoder_h_2, encoder_h_3)

This is a short example. It might make your training a bit longer, but you will be able to train your network without reducing layers. The important thing is that you shouldn't update every layer at each iteration or epoch; some parts of the network should be updated at a specified frequency. Note: this is an experimental method.

According to the PyTorch docs:

Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True.
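That behavior is easy to verify: tensors computed inside the context report requires_grad=False, so no graph is kept around for them.

import torch

x = torch.randn(3, requires_grad=True)

y = x * 2
print(y.requires_grad)   # True: y is tracked by autograd

with torch.no_grad():
    z = x * 2
print(z.requires_grad)   # False: no graph was built for z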

So it depends on what you are planning to do. If you are training your model, then yes, it would affect your accuracy.
