TensorFlow source build configuration fails: "Could not find any cuda.h matching version" and dictionary update ValueError
I am trying to build TensorFlow from source because I have a GPU with compute capability 6.1, but my CPU does not support AVX instructions. My first attempt with Docker containers failed, as TensorFlow could not be imported in tensorflow:latest-jupyter-gpu either. I have already installed the CUDA driver manually from the NVIDIA website and verified the installation. According to the nvidia-smi output, the current installation is NVIDIA-SMI 450.51.05, Driver Version: 450.51.05, CUDA Version: 11.0. Additionally, I have installed cudnn-8.0.1 for CUDA 11 by following this guide and verified the installation. My system runs Linux Mint 19.1. I have downloaded the TF source and checked out branch r2.2 to build the corresponding version. Although this guide proposes installing Bazel through Bazelisk, the only method that worked was running the command
cd "/home/user/.bazel/bin" && curl -LO https://releases.bazel.build/2.0.0/release/bazel-2.0.0-linux-x86_64 && chmod +x bazel-2.0.0-linux-x86_64
However, when running the configuration script I have to deal with two issues:
File "./configure.py", line 1440, in main
if validate_cuda_config(environ_cp):
File "./configure.py", line 1323, in validate_cuda_config
tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout)
ValueError: dictionary update sequence element #9 has length 1; 2 is required
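To see where this message comes from, here is a minimal sketch that feeds the expression from configure.py line 1323 a few "key: value" lines, assuming (as one of the answers suggests) that one line carries a key with an empty value. The sample lines and the helper name parse_like_configure are hypothetical, made up for illustration; they are not real output of the script.

```python
# Hypothetical "key: value" lines; the cuDNN line has an empty value.
sample_output = [
    b"cuda_version: 11.0\n",
    b"cudnn_version: \n",
]


def parse_like_configure(lines):
    # Same expression as configure.py line 1323 (TF r2.2).
    return dict(
        tuple(line.decode('ascii').rstrip().split(': ')) for line in lines)


# rstrip() also removes the space after the colon, so ': ' no longer occurs
# in the line and split() returns a single element.
print(b"cudnn_version: \n".decode('ascii').rstrip().split(': '))
# ['cudnn_version:']

try:
    parse_like_configure(sample_output)
except ValueError as err:
    print(err)
    # dictionary update sequence element #1 has length 1; 2 is required
```

The element number in the message is the position of the offending line in the script's output, so "#9" in the traceback above points at the tenth line.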
Consequently, I have to select at least two libraries (CUDA and TensorRT).
With both libraries selected, the script continues, but the following message appears:
Could not find any NvInferVersion.h matching version '' in any subdirectory
After providing the CUDA and cuDNN versions at the corresponding script prompts, I managed to get further by locating cudnn.h and cuda.h on my system and adding their paths at the additional script prompt:
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: /usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h,/usr/include/hwloc/cuda.h,/usr/local/cuda-11.0/targets/x86_64-linux/include/cudnn.h,/usr/include/cudnn.h,/usr/include/linux,/usr/local/cuda/include
However, I cannot proceed any further, as the script keeps failing with the message:
Could not find any cuda.h matching version '11' in any subdirectory:
''
'include'
'include/cuda'
'include/*-linux-gnu'
'extras/CUPTI/include'
'include/cuda/CUPTI'
of:
'/usr/include/hwloc/cuda.h'
'/usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h'
'/usr/local/cuda-11.0/targets/x86_64-linux/include/cudnn.h'
Asking for detailed CUDA configuration...
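The message lists the subdirectories that are searched under each base path. Assuming the script simply looks for <base>/<subdir>/cuda.h (an inference from the message, not from reading find_cuda_config.py), the sketch below enumerates those candidate locations; the helper name candidate_paths is hypothetical. It shows that a base path pointing at a header file yields locations such as .../include/cuda.h/cuda.h, whereas a directory such as /usr/local/cuda-11.0 yields /usr/local/cuda-11.0/include/cuda.h.

```python
import posixpath

# Subdirectories listed in the error message, searched under each base path.
SUBDIRS = ['', 'include', 'include/cuda', 'include/*-linux-gnu',
           'extras/CUPTI/include', 'include/cuda/CUPTI']


def candidate_paths(base_paths, header='cuda.h'):
    """Hypothetical helper: every <base>/<subdir>/<header> location implied
    by the error message. Glob patterns such as '*-linux-gnu' are left
    unexpanded."""
    return [posixpath.join(base, sub, header)
            for base in base_paths for sub in SUBDIRS]


# A base path that is itself a header file, as entered at the prompt above:
for path in candidate_paths(
        ['/usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h']):
    print(path)
# The first candidate is
# /usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h/cuda.h
```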
Any hints on how to continue? Which paths should I provide?
Thank you!
I recently ran into the same problem:
File "./configure.py", line 1440, in main
if validate_cuda_config(environ_cp):
File "./configure.py", line 1323, in validate_cuda_config
tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout)
ValueError: dictionary update sequence element #9 has length 1; 2 is required
I think this means that my versions of CUDA and cuDNN do not match the requirements for building tensorflow-gpu. Please install the correct versions, which you can find in 'configure.py'. TensorRT is not required.
Copy this file: /tensorflow/tensorflow/core/platform/cuda.h into one of the directories listed.
So, as a previous answer stated, with CUDA 11 the cuDNN version apparently is not picked up from some header, which prevents the tuple from being created: the second element (the actual version number) is missing, and the configuration crashes.
So you somehow have to pass the cuDNN version number to TensorFlow's configure script. What I did was hardcode the cuDNN version; just replace '8.1' with your version.
I replaced line 1323 of configure.py, tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout), together with the config = dict( call that wraps it, with the following code. I am sure there is a more efficient way to write this.
config = {}
for line in proc.stdout:
  parameter_split = line.decode('ascii').rstrip().split(': ')
  if len(parameter_split) == 1:
    # The line has no value (e.g. 'cudnn_version:'): drop the trailing ':'
    # from the key and hardcode the cuDNN version manually
    config[parameter_split[0][:-1]] = '8.1'
  else:
    config[parameter_split[0]] = parameter_split[1]
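As the answer notes, the loop can be written more compactly. One possible variant, a sketch that has not been tested against the TensorFlow r2.2 build: str.partition(':') never fails on a line without a value, and a hypothetical fallbacks mapping takes the place of the hardcoded '8.1'. The function name parse_cuda_config and the sample lines are also hypothetical.

```python
def parse_cuda_config(lines, fallbacks):
    """Parse b'key: value' lines into a dict.

    Keys whose value is empty (e.g. b'cudnn_version: ') take their value
    from the `fallbacks` mapping instead, or stay '' if no fallback exists.
    """
    config = {}
    for raw in lines:
        # partition() always returns three parts, so a missing value gives ''
        key, _, value = raw.decode('ascii').partition(':')
        key, value = key.strip(), value.strip()
        config[key] = value or fallbacks.get(key, '')
    return config


# Hypothetical sample output; replace '8.1' with your installed cuDNN version.
sample = [b"cuda_version: 11.0\n", b"cudnn_version: \n"]
print(parse_cuda_config(sample, {'cudnn_version': '8.1'}))
# {'cuda_version': '11.0', 'cudnn_version': '8.1'}
```

Inside validate_cuda_config this would be used as config = parse_cuda_config(proc.stdout, {'cudnn_version': '8.1'}). Unlike the loop above, an empty value for any other key stays empty instead of silently receiving the cuDNN version.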