
Evaluating pytorch models: `with torch.no_grad` vs `model.eval()`

When I want to evaluate my model's performance on the validation set, should I prefer `with torch.no_grad:` or `model.eval()`?

TL;DR:

Use both. They do different things, and have different scopes.

  • with torch.no_grad - disables tracking of gradients in autograd.
  • model.eval() changes the forward() behaviour of the module it is called upon
    • e.g., it disables dropout and has batch norm use the entire population statistics

with torch.no_grad

The torch.autograd.no_grad documentation says:

Context-manager that disabled [sic] gradient calculation.

Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True. In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
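A minimal sketch of that behaviour (the tensor names here are purely illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)

with torch.no_grad():
    y = x * 2  # no computation graph is recorded for this operation

print(x.requires_grad)  # True
print(y.requires_grad)  # False: the result is detached from autograd
# y.sum().backward() here would raise a RuntimeError, since no graph exists
```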

model.eval()

The nn.Module.eval documentation says:

Sets the module in evaluation mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, eg Dropout, BatchNorm, etc.
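As a toy illustration of the kind of behaviour change .eval() triggers (this snippet is not from the question), consider a standalone Dropout layer:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()    # training mode: roughly half the elements are zeroed, the rest scaled by 2
print(drop(x))

drop.eval()     # evaluation mode: dropout is a no-op, the input passes through unchanged
print(drop(x))  # tensor of ones
```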


The creator of pytorch said the documentation should be updated to suggest the usage of both.

with torch.no_grad: disables computation of gradients for the backward pass. Since these calculations are unnecessary during inference, and add non-trivial computational overhead, it is essential to use this context if evaluating the model's speed. It will not, however, affect results.

model.eval() ensures certain modules which behave differently in training vs inference (e.g. Dropout and BatchNorm) are defined appropriately during the forward pass in inference. As such, if your model contains such modules it is essential to enable this.

For the reasons above it is good practice to use both during inference.
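Putting the two together, a typical validation pass looks roughly like this (the model and data below are placeholders, not part of the original question):

```python
import torch
import torch.nn as nn

# a toy model containing a module whose behaviour differs between train and eval
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 2))
val_batches = [torch.randn(16, 4) for _ in range(3)]  # stand-in validation data

model.eval()               # Dropout off; BatchNorm (if present) uses running statistics
with torch.no_grad():      # autograd records nothing inside this block
    for batch in val_batches:
        outputs = model(batch)
        # ... accumulate metrics here ...
model.train()              # switch back before resuming training
```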

If you're reading this post because you've been encountering RuntimeError: CUDA out of memory, then with torch.no_grad(): will likely help save the memory. Using only model.eval() is unlikely to help with the OOM error.

The reason for this is that torch.no_grad() disables autograd completely (you can no longer backpropagate), reducing memory consumption and speeding up computations.

However, you will still be able to compute gradients when using model.eval(). Personally, I find this design decision intriguing. So, what is the purpose of .eval()? It seems its main functionality is to deactivate modules such as Dropout during evaluation.
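A contrived sketch of that difference, showing that backward() still works after model.eval() but that there is nothing to backpropagate inside torch.no_grad():

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
x = torch.randn(2, 3)

model.eval()                            # does not touch autograd
loss = model(x).sum()
loss.backward()                         # still works in eval mode
print(model.weight.grad is not None)    # True

model.zero_grad()
with torch.no_grad():
    loss = model(x).sum()
print(loss.requires_grad)               # False: nothing to backpropagate through
```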

To summarize, if you use torch.no_grad(), no intermediate tensors are saved, and you can possibly increase the batch size in your inference.
