torch.manual_seed(seed) raises RuntimeError: CUDA error: device-side assert triggered

Question

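I got this error while using Google Colab. Here is my code. I can't find anything wrong with it: the same code ran correctly a few hours ago, but it suddenly started failing and I don't know why.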

import numpy as np
import torch
if torch.cuda.is_available():
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
seed=1
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True

The error is:

There are 1 GPU(s) available.
We will use the GPU: Tesla P100-PCIE-16GB
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-121-436d9d8bb120> in <module>()
      9 seed=1
     10 np.random.seed(seed)
---> 11 torch.manual_seed(seed)
     12 torch.cuda.manual_seed_all(seed)
     13 torch.backends.cudnn.deterministic = True

3 frames
/usr/local/lib/python3.7/dist-packages/torch/cuda/random.py in cb()
    109         for i in range(device_count()):
    110             default_generator = torch.cuda.default_generators[i]
--> 111             default_generator.manual_seed(seed)
    112 
    113     _lazy_call(cb, seed_all=True)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Can anyone help me?

Answer 1

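In my experience, this error can be caused by a mismatch between the number of labels in your targets and the number of classes in the model.

To fix it, you can try the following:

Make sure the labels in your target data start at 0. If the data has n classes, the target labels should be [0, 1, 2, ..., n-1].
Make sure the model you are using is configured for n classes.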
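
The two checks above can be sketched in plain Python, so they can be run on the CPU before any data reaches the GPU. This is only an illustration: the helper names `check_labels` and `remap_labels` and the sample labels are hypothetical and do not come from the question.

```python
def check_labels(labels, num_classes):
    """Raise ValueError if any label falls outside [0, num_classes - 1]."""
    values = [int(x) for x in labels]
    bad = sorted({x for x in values if not 0 <= x < num_classes})
    if bad:
        raise ValueError(f"labels outside [0, {num_classes - 1}]: {bad}")


def remap_labels(labels):
    """Map arbitrary integer labels onto 0..n-1, keeping their sorted order."""
    values = [int(x) for x in labels]
    classes = sorted(set(values))
    mapping = {old: new for new, old in enumerate(classes)}
    return [mapping[x] for x in values], mapping


# Hypothetical labels that start at 1 instead of 0.
raw_labels = [1, 2, 3, 3, 1]
new_labels, mapping = remap_labels(raw_labels)

# The model's output layer should then be built with len(mapping) classes.
check_labels(new_labels, num_classes=len(mapping))
print(new_labels)  # [0, 1, 2, 2, 0]
```

If the labels live in a tensor, something like `check_labels(labels.tolist(), n)` should work with the same helper. Running the check before training is useful here because, as the error message itself says, CUDA kernel errors may be reported at a later, unrelated call.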
