
Tensorflow crashes when running with CUDA support

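I installed TensorFlow 2.4.1 and everything worked, but it crashed when I tried to use TensorBoard. I checked the GitHub issues page, where one of the maintainers said this was a known issue that would be fixed in 2.5.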

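So I installed 2.5-rc, and then everything started crashing. I then tried downgrading back to 2.4.1, but the problem persisted. Nothing else I tried fixed the crash.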

I went as far as removing the Anaconda installation, all Python source folders, and the CUDA and cuDNN installations, and then reinstalled everything.

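Following the TF help page, I installed TF 2.4.1 with CUDA 11.0 and cuDNN 8.0. It did work before I installed CUDA. Now it crashes every time, even when I manually hide the CUDA-enabled devices. This is the output I get: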

2021-05-06 21:52:41.777148: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-06 21:52:41.777782: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-05-06 21:52:41.808730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.402GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-05-06 21:52:41.809006: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-05-06 21:52:41.812511: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 21:52:41.812659: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 21:52:41.814721: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-05-06 21:52:41.815428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-05-06 21:52:41.819782: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-05-06 21:52:41.821370: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-05-06 21:52:41.822198: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-05-06 21:52:41.822393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-06 21:52:41.822962: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-06 21:52:41.882108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.402GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-05-06 21:52:41.882425: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-05-06 21:52:41.882571: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 21:52:41.882712: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 21:52:41.882855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-05-06 21:52:41.883245: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-05-06 21:52:41.883438: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-05-06 21:52:41.883648: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-05-06 21:52:41.883842: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-05-06 21:52:41.884084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-06 21:52:42.347207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-06 21:52:42.347408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-05-06 21:52:42.347499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-05-06 21:52:42.347777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4733 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-05-06 21:52:42.359614: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-06 21:52:42.777774: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-05-06 21:52:42.777956: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-05-06 21:52:42.778158: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2021-05-06 21:52:42.780165: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cupti64_110.dll
2021-05-06 21:52:42.846459: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2021-05-06 21:52:42.846657: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
Epoch 1/20
2021-05-06 21:52:43.007415: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-06 21:52:43.367381: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 21:52:43.921448: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 21:52:43.927001: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll

Process finished with exit code -1073740791 (0xC0000409)

Does anyone know what the problem might be?

Almost every version of TensorFlow requires its own specific versions of CUDA and cuDNN. If you have TensorFlow 2.5, you need CUDA 11.2 and cuDNN 8.1: https://spltech.co.uk/how-to-install-tensorflow-2-5-with-cuda-11-2-and-cudnn-8-1-for-windows-10/
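As a rough illustration of the version pairing described above, here is a minimal sketch that looks up the CUDA/cuDNN pair for a given TensorFlow version. The table only contains the two pairings mentioned in this thread (2.4 and 2.5); the names `TESTED_BUILDS`, `required_versions`, and `matches` are made up for this example and are not part of TensorFlow.

```python
# Hypothetical lookup table: only the pairings mentioned in this thread.
# Extend it from the official "tested build configurations" table as needed.
TESTED_BUILDS = {
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def required_versions(tf_version):
    """Return the CUDA/cuDNN pair for a TensorFlow version string, or None if unknown."""
    # Reduce e.g. "2.4.1" to "2.4" before the lookup.
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_BUILDS.get(major_minor)


def matches(tf_version, cuda, cudnn):
    """Return True if the installed CUDA/cuDNN versions equal the listed pair."""
    required = required_versions(tf_version)
    return required is not None and required == {"cuda": cuda, "cudnn": cudnn}
```

For example, `matches("2.5.0", "11.0", "8.0")` returns `False`, which corresponds to running TensorFlow 2.5 against the CUDA 11.0 / cuDNN 8.0 setup described in the question.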

I have had this problem before, and many different issues can lead to this error code.
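To make the error code easier to search for, the signed exit code from the question can be converted to its unsigned hexadecimal form. The sketch below is only a helper for reading the number; `to_ntstatus` and `KNOWN_STATUS` are names invented for this example, and the table lists just two well-known Windows status values.

```python
def to_ntstatus(exit_code):
    """Convert a signed 32-bit process exit code to its unsigned hex form."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# A small, incomplete table of Windows status codes (illustrative only).
KNOWN_STATUS = {
    "0xC0000409": "STATUS_STACK_BUFFER_OVERRUN",
    "0xC0000005": "STATUS_ACCESS_VIOLATION",
}

code = to_ntstatus(-1073740791)
print(code, KNOWN_STATUS.get(code, "unknown"))
```

As far as I know, Windows also reports 0xC0000409 for fail-fast aborts inside native libraries, so the name alone does not pinpoint the cause; it only confirms that the crash happened in native code and not in Python.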

I suggest installing TensorFlow with the versions listed in the GPU section here [link].

You can also watch these videos [link].

Note: before reinstalling, try this:

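import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup.
# This must run before the GPU has been initialized.
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:  # avoid an IndexError when no GPU is visible
    tf.config.experimental.set_memory_growth(physical_devices[0], True)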
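A related way to run the same experiment is through environment variables, which take effect only if they are set before TensorFlow is imported. The sketch below assumes the `TF_FORCE_GPU_ALLOW_GROWTH` and `CUDA_VISIBLE_DEVICES` variables; the helper name `configure_gpu_env` is made up for this example.

```python
import os


def configure_gpu_env(hide_gpu=False, allow_growth=True):
    """Set GPU-related environment variables; call this before importing TensorFlow."""
    if hide_gpu:
        # "-1" hides all CUDA devices, so TensorFlow falls back to the CPU.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    if allow_growth:
        # Environment-variable counterpart of set_memory_growth.
        os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


configure_gpu_env(hide_gpu=False, allow_growth=True)
# import tensorflow as tf  # import only after the environment is configured
```

Running once with `hide_gpu=True` and once with `hide_gpu=False` can help tell apart a crash caused by the GPU libraries from one caused by something else in the environment.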
