
Tensorflow crashes when asked to fit a model

Tensorflow on GPU is new to me. First, a naive question: am I correct in assuming that I can use a GPU (NVIDIA GTX 1660 Ti) to run Tensorflow ML operations while it simultaneously drives my monitor? I only have one GPU card in my PC. Can it do both at the same time, or do I need a dedicated GPU for Tensorflow only, one that is not connected to any monitor?

Everything is on Ubuntu 21.10. I have set up nvidia-toolkit, cudnn, tensorflow, and tensorflow-gpu in a conda env, and all appears to work fine: 1 GPU visible, built with cudnn 11.6.r11.6, tf version 2.8.0, Python version 3.7.10, all in a conda env running in a Jupyter notebook. All seems to run fine until I attempt to train a model, and then I get this error message:

2022-03-19 04:42:48.005029: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8302

and then the kernel just locks up and crashes. BTW, the code worked prior to installing the GPU support, when it simply used the CPU. Is this simply a version mismatch somewhere between the Python, tensorflow, tensorflow-gpu, and cudnn versions, or something more sinister? Thx. J.

am I correct in assuming that I can use a GPU (nv gtx 1660ti) to run tensorflow ml operations, while it simultaneously runs my monitor?

Yes. On Ubuntu you can run nvidia-smi to see how much free memory you have and which processes are using the GPU.
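The same check can be scripted from the notebook. This is only a minimal sketch, assuming nvidia-smi is on the PATH; the helper names parse_gpu_memory and query_gpu_memory are made up for illustration, while --query-gpu and --format are standard nvidia-smi options.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs (one line per GPU) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({"used_mib": used, "total_mib": total, "free_mib": total - used})
    return gpus


def query_gpu_memory():
    """Return per-GPU memory usage, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    print(query_gpu_memory())
```

Running it once before training and once while the model is fitting gives a rough idea of how much memory the desktop itself is holding.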

Only have one GPU card in my pc, assume it can do both at the same time?

Yes, it can. Most people do the same. A training process on the GPU is similar to running a game, only more memory hungry.
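Since the display and the training job share the card's memory, and Tensorflow by default reserves most of the GPU memory when it starts, one thing that may be worth trying is on-demand allocation. This is a sketch only, not a confirmed fix for the crash in the question; enable_memory_growth is a hypothetical helper name, and it has to run before any other Tensorflow GPU work in the process.

```python
def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand; return the GPU count."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed in this environment
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be set before the GPU is initialized, otherwise TF raises RuntimeError
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


print("GPUs configured for memory growth:", enable_memory_growth())
```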

About the problem:

Install based on this version table.

Check your driver version with nvidia-smi. For the CUDA version that is actually installed, run nvcc -V (the CUDA version shown by nvidia-smi is the maximum CUDA version the driver supports).
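To read the toolkit version from a notebook instead of a terminal, something like the following could work. It is a rough sketch assuming nvcc is on the PATH and prints its usual "release X.Y" field; parse_nvcc_release and installed_cuda_toolkit are illustrative names, not part of any library.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' field of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_toolkit():
    """Return the installed CUDA toolkit version, or None if nvcc is not found."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_toolkit())
```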

Then just run pip install tensorflow-gpu; this will also install keras for you.

Check whether tensorflow has access to the GPU as follows:

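import tensorflow as tf
# Deprecated in TF 2.x but still present in 2.8; should print True
print(tf.test.is_gpu_available())
# Preferred check: list the GPUs TensorFlow can see (expect at least 1)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))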
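Seeing the GPU is not the same as matching versions, so it may also help to ask the installed Tensorflow binary which CUDA and cuDNN versions it was built against. A small sketch, assuming TF 2.3 or newer where tf.sysconfig.get_build_info() exists; built_against is a hypothetical helper name.

```python
def built_against():
    """Return the CUDA/cuDNN versions TensorFlow was compiled for, or None."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    info = tf.sysconfig.get_build_info()
    # CPU-only builds may not carry these keys, hence .get()
    return {"cuda": info.get("cuda_version"), "cudnn": info.get("cudnn_version")}


print(built_against())
```

These are the versions to compare against what nvcc -V and the cuDNN log line report on the machine.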

install based on this version table.

That was the key for me. I had the same issue: CPU worked fine, but GPU would dump out during model fit with an exit code and no error. The matrix shows that tensorflow 2.5 - 2.8 work with CUDA 11.2 and cudnn 8.1, while the 'latest' versions are 11.5 and 8.4 as of 05/2022. I rolled back both versions and everything is working fine.
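The comparison against the matrix can be sketched in a few lines. The table below holds only the row quoted in this answer, the names TESTED, decode_cudnn and mismatches are made up for illustration, and the decoding assumes the cuDNN 8.x integer format (major*1000 + minor*100 + patch) as seen in the "Loaded cuDNN version 8302" log line. It also assumes the 11.6.r11.6 string in the question is the CUDA toolkit release. A reported difference is a hint to look closer, not proof of incompatibility.

```python
# Only the row quoted in this thread: TensorFlow 2.5 - 2.8 -> CUDA 11.2, cuDNN 8.1.
# Other rows are deliberately left out; consult the version table for them.
TESTED = {(2, minor): {"cuda": (11, 2), "cudnn": (8, 1)} for minor in range(5, 9)}


def decode_cudnn(version_code):
    """Split a cuDNN 8.x integer such as 8302 into (major, minor, patch)."""
    return version_code // 1000, (version_code // 100) % 10, version_code % 100


def mismatches(tf_version, cuda, cudnn_code):
    """List components whose major.minor differs from the tested row."""
    expected = TESTED.get(tf_version)
    if expected is None:
        return ["unknown TensorFlow version"]
    found = {"cuda": cuda, "cudnn": decode_cudnn(cudnn_code)[:2]}
    return [name for name in ("cuda", "cudnn") if found[name] != expected[name]]


# The question's setup: TF 2.8, CUDA 11.6 (assumed), cuDNN code 8302 from the log.
print(decode_cudnn(8302))
print(mismatches((2, 8), (11, 6), 8302))
```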

The matrix will show you that tensorflow 2.5 - 2.8 work with CUDA 11.2 and cudnn 8.1

I believe the problem is that CUDA 11.2 is not available for Windows 11.
