
How to clear GPU memory after PyTorch model training without restarting kernel

I am training PyTorch deep learning models in a JupyterLab notebook, using CUDA on a Tesla K80 GPU. During the training iterations, all 12 GB of GPU memory are used. I finish training by saving the model checkpoint, but I want to keep using the notebook for further analysis (inspecting intermediate results, etc.).

However, these 12 GB remain occupied after training finishes (as seen in nvtop). I would like to free up this memory so that I can use it for other notebooks.

My workaround so far is to restart the notebook's kernel, but that does not solve my problem, because I then lose the notebook's state and all the output computed so far.

The answers so far are correct for the CUDA side of things, but there is also an issue on the IPython side.

When an error occurs in a notebook environment, the IPython shell stores the traceback of the exception so that you can access the error state with %debug. The problem is that the traceback keeps references to all the variables involved in the error, so they are held in memory and cannot be reclaimed by methods like gc.collect(). Basically, all your variables get stuck and the memory is leaked.

Usually, raising a new exception frees the state of the old one, so triggering something like 1/0 may help. However, things can get weird with CUDA variables, and sometimes there is no way to clear your GPU memory without restarting the kernel.
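A minimal sketch of this workaround as two notebook cells (1/0 is just a convenient way to raise a trivial exception; any uncaught exception will do):

# Cell 1: deliberately raise a lightweight uncaught exception so that
# IPython's stored error state (what %debug uses) points at this new
# traceback instead of the old one holding the CUDA tensors.
1 / 0

# Cell 2: the old objects are now unreferenced and can be reclaimed.
import gc
import torch
gc.collect()              # reclaim the Python-side objects
torch.cuda.empty_cache()  # release PyTorch's cached GPU memory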

For more detail see these references:

https://github.com/ipython/ipython/pull/11572

How to save traceback / sys.exc_info() values in a variable?

If you set an object that uses a lot of GPU memory to None, like this:

obj = None

and then call

gc.collect() # Python thing

you may be able to avoid restarting the notebook.


If you would also like to see the memory freed in nvidia-smi or nvtop, you may run:

torch.cuda.empty_cache() # PyTorch thing

to empty the PyTorch cache.

with torch.no_grad():
    torch.cuda.empty_cache()
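To confirm that memory was actually released, a small sketch using PyTorch's built-in counters (note that nvidia-smi and nvtop report the reserved memory, not just what live tensors hold):

import torch

print(torch.cuda.memory_allocated())  # bytes currently held by live tensors
print(torch.cuda.memory_reserved())   # bytes held by PyTorch's caching allocator

torch.cuda.empty_cache()              # return unused cached blocks to the driver

print(torch.cuda.memory_reserved())   # should drop once cached blocks are freed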

I have never worked with PyTorch myself, but Google has several results which all basically say the same thing: torch.cuda.empty_cache()

https://forums.fast.ai/t/clearing-gpu-memory-pytorch/14637

https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530

How to clear CUDA memory in PyTorch

import gc
import torch

gc.collect()              # drop unreferenced Python objects first
torch.cuda.empty_cache()  # then release PyTorch's cached GPU memory

If you have a variable named model, you can try to free the GPU memory it occupies (assuming it is on the GPU): first release the reference to that memory with del model, then call torch.cuda.empty_cache()
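A minimal sketch of that sequence (nn.Linear here is just a toy stand-in for whatever model is occupying the GPU):

import gc
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096).cuda()  # toy model occupying GPU memory

del model                 # drop the Python reference to its parameters
gc.collect()              # make sure the objects are actually collected
torch.cuda.empty_cache()  # hand the cached memory back to the CUDA driver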
