How do I fix "RuntimeError: CUDA out of memory."? Is there a way to free up more memory?

Question

I am using a Jupyter notebook on a VM to train some CNN models. The VM has 16 vCPUs and 60 GB of memory. I just attached an NVIDIA Tesla P4 to get better performance, but it always gives an error like "RuntimeError: CUDA out of memory. Tried to allocate 196.00 MiB (GPU 0; 7.43 GiB total capacity; 2.20 GiB already allocated; 180.44 MiB free; 226.01 MiB cached)".

Why does this happen? The whole system is clean. Why do I have so little memory available?

I don't think there is anything wrong with the GPU setup:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0    22W /  75W |      0MiB /  7611MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Answer 1

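When a process allocates memory on the GPU, that memory is released only when the process frees it or when the process terminates. If you see a CUDA out-of-memory error but nothing else appears to be running, I suggest using a tool like nvtop to find out what is holding your CUDA memory. It looks like this: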

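At the bottom you will see the GPU memory and the command line of each process. In the example above, the process highlighted in green is using 84% of the GPU RAM. You can select a process with the up/down arrow keys and press F9 to kill it. Sometimes, when I run training scripts, they do not terminate properly and show up here still holding CUDA memory.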
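To get a rough idea of how much memory is held outside your own process, you can split up the numbers in the error message from the question. The sketch below is only an estimate: it assumes that "cached" in that message means memory PyTorch has reserved but not currently allocated (this wording has changed between PyTorch versions), and the helper names `parse_oom_message` and `unaccounted_mib` are made up for this example.

```python
import re

# Error message copied from the question.
ERROR = (
    "RuntimeError: CUDA out of memory. Tried to allocate 196.00 MiB "
    "(GPU 0; 7.43 GiB total capacity; 2.20 GiB already allocated; "
    "180.44 MiB free; 226.01 MiB cached)"
)

UNIT_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes named in the message, converted to MiB, keyed by label."""
    pattern = r"([\d.]+) (KiB|MiB|GiB) (total capacity|already allocated|free|cached)"
    return {
        label: float(value) * UNIT_TO_MIB[unit]
        for value, unit, label in re.findall(pattern, message)
    }


def unaccounted_mib(sizes):
    """Memory that is neither free nor attributed to this process's allocator.

    Assumption: 'cached' is reserved-but-unallocated memory, so it is
    subtracted in addition to 'already allocated'.
    """
    return (
        sizes["total capacity"]
        - sizes["already allocated"]
        - sizes["free"]
        - sizes["cached"]
    )


sizes = parse_oom_message(ERROR)
print(f"{unaccounted_mib(sizes):.0f} MiB is not accounted for by this process")
```

If that remainder is large, as it is here, something other than the tensors in the current process is probably holding the memory, for example another process or a notebook kernel that was never shut down.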

Note: installing nvtop on Ubuntu 18 is a bit involved, but another tool you can use is gpustat, which only shows the PIDs.
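If neither tool is installed, `nvidia-smi` itself can list the processes that hold GPU memory. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and supports the `--query-compute-apps` option; the function names and the sample line are invented for illustration and do not come from a real run.

```python
import csv
import io
import subprocess

# Ask nvidia-smi for one CSV line per compute process: PID, name, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse the CSV text printed by QUERY into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (field.strip() for field in fields)
        rows.append({
            "pid": int(pid),
            "name": name,
            # The driver may report a non-numeric value when usage is unavailable.
            "used_mib": int(used) if used.isdigit() else None,
        })
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the processes currently holding GPU memory."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


# Made-up sample line, used only to show the parsing step without a GPU.
SAMPLE = "4312, python, 6391\n"
print(parse_compute_apps(SAMPLE))
```

On a machine with a GPU, calling `list_gpu_processes()` returns the same structure for the live system. A stale training script found this way can then be stopped by its PID, for example with `kill <pid>` from a shell.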

Solution 1
1 2019-12-14 03:46:20
