
TensorFlow crashes when running with CUDA support

I had TensorFlow 2.4.1 installed and everything worked, except that it crashed whenever I tried to use TensorBoard. I looked at the GitHub issues page, where one of the maintainers said that it is a known issue and will be fixed in 2.5.

So I installed 2.5-rc, and then everything broke. I tried downgrading back to 2.4.1, but the problem persisted. Nothing else I tried fixed the crashes.

I went as far as removing the Anaconda install, all Python source folders, and the CUDA and cuDNN installs, then reinstalled everything.

As per the TF help page, I installed TF 2.4.1 with CUDA 11.0 and cuDNN 8.0. It actually worked until I installed CUDA. Now it crashes every time, even when I manually hide the CUDA-enabled devices. This is the output I get:

2021-05-06 21:52:41.777148: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-06 21:52:41.777782: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-05-06 21:52:41.808730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.402GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-05-06 21:52:41.809006: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-05-06 21:52:41.812511: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 21:52:41.812659: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 21:52:41.814721: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-05-06 21:52:41.815428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-05-06 21:52:41.819782: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-05-06 21:52:41.821370: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-05-06 21:52:41.822198: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-05-06 21:52:41.822393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-06 21:52:41.822962: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-06 21:52:41.882108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.402GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-05-06 21:52:41.882425: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-05-06 21:52:41.882571: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 21:52:41.882712: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 21:52:41.882855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-05-06 21:52:41.883245: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-05-06 21:52:41.883438: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-05-06 21:52:41.883648: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-05-06 21:52:41.883842: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-05-06 21:52:41.884084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-06 21:52:42.347207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-06 21:52:42.347408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-05-06 21:52:42.347499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-05-06 21:52:42.347777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4733 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-05-06 21:52:42.359614: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-06 21:52:42.777774: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-05-06 21:52:42.777956: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-05-06 21:52:42.778158: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2021-05-06 21:52:42.780165: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cupti64_110.dll
2021-05-06 21:52:42.846459: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2021-05-06 21:52:42.846657: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
Epoch 1/20
2021-05-06 21:52:43.007415: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-06 21:52:43.367381: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-05-06 21:52:43.921448: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-05-06 21:52:43.927001: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll

Process finished with exit code -1073740791 (0xC0000409)

Does anyone know what the issue could be?

Almost every version of TensorFlow requires a specific version of CUDA and cuDNN. If you have TensorFlow 2.5, you need CUDA 11.2 and cuDNN 8.1: https://spltech.co.uk/how-to-install-tensorflow-2-5-with-cuda-11-2-and-cudnn-8-1-for-windows-10/
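The pairing described above can be written down as a small lookup. The sketch below is only an illustration: the table holds just the two combinations named in this thread, the helper names `required_cuda_cudnn` and `installed_build_info` are made up for the example, and it assumes `tf.sysconfig.get_build_info()` is available in the installed TensorFlow (it should be in the 2.4 and 2.5 releases discussed here).

```python
# TensorFlow release -> (CUDA, cuDNN), limited to the pairs named in this thread.
REQUIRED = {
    "2.4": ("11.0", "8.0"),
    "2.5": ("11.2", "8.1"),
}


def required_cuda_cudnn(tf_version):
    """Return the (CUDA, cuDNN) pair for a TensorFlow version, or None if not listed."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return REQUIRED.get(major_minor)


def installed_build_info():
    """Report what the installed TensorFlow wheel was built against, if any."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    build = tf.sysconfig.get_build_info()
    # CPU-only builds may not carry these keys, hence .get().
    return tf.__version__, build.get("cuda_version"), build.get("cudnn_version")


if __name__ == "__main__":
    print(required_cuda_cudnn("2.4.1"))
    print(installed_build_info())
```

If the versions reported for the installed wheel disagree with the CUDA and cuDNN actually installed on the machine, that mismatch is the first thing to fix. On Windows the build info may come back in a DLL-style form rather than as a plain version number, so compare by eye rather than by string equality.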

I have had this problem before, and many different issues can cause this error code.

When installing TensorFlow, though, I recommend following the versions listed in the GPU section of the linked page.

You can also watch the linked video.
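As a side note on the error code itself: the exit code in the question's output, -1073740791, is the signed 32-bit form of the hexadecimal value 0xC0000409 shown next to it. A minimal sketch of that conversion (the helper name `to_ntstatus` is just for this example):

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex status string."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073740791))  # 0xC0000409
```

Windows documents 0xC0000409 as STATUS_STACK_BUFFER_OVERRUN, a status that is also used for fail-fast aborts, so the code alone does not tell you which component crashed. That fits the point above that many different problems end in the same code.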

Note: before reinstalling, try the following:

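import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup.
# This must run before any GPU has been initialized.
physical_devices = tf.config.list_physical_devices('GPU')
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)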
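Another quick check before reinstalling is to confirm whether the crash depends on the GPU at all, since the question mentions hiding the CUDA devices. The sketch below assumes the common approach of setting the `CUDA_VISIBLE_DEVICES` environment variable to `-1`; it has to be set before TensorFlow is imported, otherwise it has no effect.

```python
import os

# Hide all CUDA devices from this process. Must happen before importing TensorFlow.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

try:
    import tensorflow as tf
    # With the GPU hidden successfully, this list should be empty.
    print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed in this environment")
```

If the log still shows a line such as "Adding visible gpu devices: 0" after this, the GPU was not actually hidden for that run, and the test says nothing about whether the CPU path works.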
