
Tensorflow source build configuration fails: Could not find any cuda.h matching version and dictionary value error

I am trying to build Tensorflow from source because I have a GPU with compute capability 6.1, but my CPU does not support AVX instructions. My first attempt with docker containers failed, as Tensorflow could not be imported on tensorflow:latest-jupyter-gpu either. I have already installed and verified the CUDA driver by installing it manually from the nvidia website. The current installation is NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.0 according to the nvidia-smi output. Additionally, I have installed and verified cudnn-8.0.1 for CUDA 11 by following this guide. My system runs Linux Mint 19.1. I have downloaded the TF source and checked out branch r2.2 to build the corresponding version. Although the recommended way of installing Bazel is through Bazelisk (from this guide), the only method that worked for me was running the command:

cd "/home/user/.bazel/bin" && curl -LO https://releases.bazel.build/2.0.0/release/bazel-2.0.0-linux-x86_64 && chmod +x bazel-2.0.0-linux-x86_64

However, when configuring the build I have to deal with two issues:

  1. When asked about the supported libraries, if I select only CUDA out of the 4 libraries offered, I get the following error:
File "./configure.py", line 1440, in main
    if validate_cuda_config(environ_cp):
  File "./configure.py", line 1323, in validate_cuda_config
    tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout)
ValueError: dictionary update sequence element #9 has length 1; 2 is required

Consequently, I have to select at least two libraries (CUDA and TensorRT).

  2. When I select both libraries, the script proceeds, but the following message appears: Could not find any NvInferVersion.h matching version '' in any subdirectory. After providing the CUDA and cudnn versions at the corresponding script prompts, I managed to get further by locating cudnn.h and cuda.h on my system and entering their paths at the additional prompt:
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: /usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h,/usr/include/hwloc/cuda.h,/usr/local/cuda-11.0/targets/x86_64-linux/include/cudnn.h,/usr/include/cudnn.h,/usr/include/linux,/usr/local/cuda/include

However, I cannot proceed any further, as the script keeps failing with the message:

Could not find any cuda.h matching version '11' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr/include/hwloc/cuda.h'
        '/usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h'
        '/usr/local/cuda-11.0/targets/x86_64-linux/include/cudnn.h'
Asking for detailed CUDA configuration...

Any hints on how to continue? Which paths do I have to provide?

Thank you!

I recently ran into the same problem:

File "./configure.py", line 1440, in main
    if validate_cuda_config(environ_cp):
  File "./configure.py", line 1323, in validate_cuda_config
    tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout)
ValueError: dictionary update sequence element #9 has length 1; 2 is required

I think this means that my versions of CUDA and cuDNN do not match what the tensorflow-gpu build requires. Please install the correct versions, which you can find in 'configure.py'. TensorRT is not indispensable.

Copy this file: /tensorflow/tensorflow/core/platform/cuda.h into one of the directories listed.
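For context, the error message in the question lists a set of subdirectories that are searched under each base path, which suggests that the base paths are expected to be directories rather than header files. The following is only a rough sketch of that lookup, not the actual TensorFlow configure code: the function name candidate_header_paths is hypothetical, the subdirectory list is copied from the error message (the 'include/*-linux-gnu' glob entry is left out for simplicity), and whether cuda.h really sits under /usr/local/cuda-11.0/include depends on the local installation.

```python
import posixpath

# Subdirectories taken from the error message in the question
# (the glob entry 'include/*-linux-gnu' is omitted in this sketch).
SUBDIRS = ["", "include", "include/cuda", "extras/CUPTI/include", "include/cuda/CUPTI"]


def candidate_header_paths(base_paths, header="cuda.h", subdirs=SUBDIRS):
    """Hypothetical helper: list every location that would be checked for `header`."""
    return [
        posixpath.normpath(posixpath.join(base, sub, header))
        for base in base_paths
        for sub in subdirs
    ]


# A header file given as a base path produces candidates that cannot exist.
bad = candidate_header_paths(["/usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h"])
# A directory given as a base path produces a plausible candidate.
good = candidate_header_paths(["/usr/local/cuda-11.0"])

print(bad[0])
print(good[1])
```

Under these assumptions, entering directories such as /usr/local/cuda-11.0 at the prompt, instead of the full paths to cuda.h and cudnn.h, would be the first thing to try.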

So CUDA 11 apparently fails to report the cudnn version in some header, as stated in a previous answer, which prevents the tuple from being built correctly. The second element (the actual version number) is missing, and the configuration then crashes.
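To illustrate why a missing value produces exactly the ValueError in the traceback, here is a small reproduction. The two byte strings are made-up sample lines (the key names cuda_version and cudnn_version are assumptions, not the real output of the configure helper); only the expression applied to each line is taken from the traceback.

```python
# Made-up sample output: the second line has a key but no value.
sample_lines = [b"cuda_version: 11.0\n", b"cudnn_version: \n"]

# Same expression as in the traceback, applied to each line.
pairs = [tuple(line.decode('ascii').rstrip().split(': ')) for line in sample_lines]

# rstrip() removes the trailing space, so ': ' no longer matches
# and the second tuple has a single element.
print(pairs)

try:
    dict(pairs)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

The element index in the message simply counts lines, so "#9" in the traceback would correspond to the tenth line of the helper's output.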

So somehow you have to pass that cudnn version number to the TensorFlow configure script. What I did was hardcode the cudnn version; just replace '8.1' with your version.

I replaced line 1323 of configure.py, tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout), with the following code. I am sure there is a more efficient way to write this.

  config = {}
  for line in proc.stdout:
    # Each line is expected to look like "key: value".
    parameter_split = line.decode('ascii').rstrip().split(': ')
    if len(parameter_split) == 1:
      # The value is missing, so only "key:" is left after rstrip().
      # Drop the trailing colon and add the cudnn version manually.
      config[parameter_split[0][:-1]] = '8.1'
    else:
      config[parameter_split[0]] = parameter_split[1]
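As a possible variation on the workaround above, the parsing could be pulled into a small function that tolerates missing values and takes the fallback from a dictionary instead of assigning '8.1' to every key that lacks a value. This is only a sketch: parse_config_lines and its defaults argument are hypothetical names, not part of configure.py, and the sample lines are made up.

```python
def parse_config_lines(lines, defaults=None):
    """Parse b"key: value" lines into a dict, filling empty values from `defaults`."""
    defaults = defaults or {}
    config = {}
    for raw in lines:
        # partition() always returns three parts, even if the value is missing.
        key, _, value = raw.decode('ascii').rstrip().partition(':')
        key, value = key.strip(), value.strip()
        if not key:
            continue  # skip blank lines
        if not value:
            # Fall back to a user-supplied value, e.g. the installed cudnn version.
            value = defaults.get(key, '')
        config[key] = value
    return config


# Made-up sample lines; replace '8.1' with your cudnn version.
sample = [b"cuda_version: 11.0\n", b"cudnn_version: \n"]
config = parse_config_lines(sample, defaults={"cudnn_version": "8.1"})
print(config)
```

Keys without a value and without a default end up as empty strings, so a later version check can still report them instead of the script crashing while building the dictionary.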
