Tensorflow source build configuration fails: Could not find any cuda.h matching version and dictionary value error

I am trying to build TensorFlow from source because I have a GPU with compute capability 6.1, but my CPU does not support AVX instructions. My first attempt, with Docker containers, failed: TensorFlow could not be imported in tensorflow:latest-jupyter-gpu either. I have installed and verified the CUDA driver manually from the NVIDIA website. According to nvidia-smi, the current installation is NVIDIA-SMI 450.51.05, Driver Version: 450.51.05, CUDA Version: 11.0. I have also installed and verified cudnn-8.0.1 for CUDA 11 by following this guide. My system runs Linux Mint 19.1. I downloaded the TF source and checked out branch r2.2 to build the corresponding version. Although the recommended way of installing Bazel is through Bazelisk (from this guide), the only method that worked for me was the following command:

cd "/home/user/.bazel/bin" && curl -LO https://releases.bazel.build/2.0.0/release/bazel-2.0.0-linux-x86_64 && chmod +x bazel-2.0.0-linux-x86_64

However, during the configuration step I run into two issues:

  1. When asked which libraries to support, if I select only CUDA out of the four offered, I get the following error:
File "./configure.py", line 1440, in main
    if validate_cuda_config(environ_cp):
  File "./configure.py", line 1323, in validate_cuda_config
    tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout)
ValueError: dictionary update sequence element #9 has length 1; 2 is required

Consequently, I have to select at least two libraries (CUDA and TensorRT).

  2. If I select both libraries, the script proceeds, but the following message appears: Could not find any NvInferVersion.h matching version '' in any subdirectory. After providing the CUDA and cuDNN versions at the corresponding prompts, I managed to get further by locating cudnn.h and cuda.h on my system and entering their paths at the additional prompt:
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: /usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h,/usr/include/hwloc/cuda.h,/usr/local/cuda-11.0/targets/x86_64-linux/include/cudnn.h,/usr/include/cudnn.h,/usr/include/linux,/usr/local/cuda/include

However, I cannot proceed any further, as the script keeps failing with the message:

Could not find any cuda.h matching version '11' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr/include/hwloc/cuda.h'
        '/usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h'
        '/usr/local/cuda-11.0/targets/x86_64-linux/include/cudnn.h'
Asking for detailed CUDA configuration...
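
A note on reading this message: it lists subdirectories that are searched below each supplied base path, which suggests the prompt expects directories rather than paths to header files. The sketch below is a simplified, hypothetical model of such a search, not the code the configure script actually runs. The names find_header and SUBDIRS are made up for illustration, and the glob entry 'include/*-linux-gnu' is left out for brevity.

```python
import os
import tempfile

# Subdirectory names copied from the error message (glob entry omitted).
SUBDIRS = ['', 'include', 'include/cuda', 'extras/CUPTI/include']


def find_header(base_paths, header='cuda.h'):
    """Return the first existing base/subdir/header path, or None."""
    for base in base_paths:
        for sub in SUBDIRS:
            candidate = os.path.join(base, sub, header)
            if os.path.isfile(candidate):
                return candidate
    return None


# Build a throwaway tree that mimics <root>/include/cuda.h.
with tempfile.TemporaryDirectory() as root:
    include_dir = os.path.join(root, 'include')
    os.makedirs(include_dir)
    header_path = os.path.join(include_dir, 'cuda.h')
    open(header_path, 'w').close()

    # A directory as base path: the header is found under 'include'.
    found_from_dir = find_header([root])
    # The header file itself as base path: nothing can exist below a file.
    found_from_file = find_header([header_path])
    print(found_from_dir is not None, found_from_file is None)
```

If the real search behaves like this model, a base path such as a CUDA installation directory can match, while a path ending in cuda.h cannot.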

Any hints on how to continue? Which paths should I provide?

Thank you!

I recently ran into the same problem:

File "./configure.py", line 1440, in main
    if validate_cuda_config(environ_cp):
  File "./configure.py", line 1323, in validate_cuda_config
    tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout)
ValueError: dictionary update sequence element #9 has length 1; 2 is required

I think this means that my versions of CUDA and cuDNN do not match what the tensorflow-gpu build requires. Install the versions that the build expects; you can find them in configure.py. TensorRT is not required.

Copy this file: /tensorflow/tensorflow/core/platform/cuda.h into one of the directories listed.

With CUDA 11, the cuDNN version apparently cannot be read from some header, as a previous answer noted, and this breaks the creation of the tuple: the second element (the actual version number) is missing, so the configuration crashes.
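
The failure can be reproduced in isolation. This is a minimal sketch with made-up input: sample_stdout stands in for proc.stdout, and its two lines are assumptions for illustration, not the real helper output. parse_strict is a hypothetical wrapper around the same expression that appears in the traceback.

```python
# Simulated helper output (assumed for illustration only).
sample_stdout = [
    b"cuda_version: 11.0\n",
    b"cudnn_version:\n",  # key reported without a value
]


def parse_strict(lines):
    # Same expression shape as the line shown in the traceback.
    return dict(
        tuple(line.decode('ascii').rstrip().split(': ')) for line in lines)


try:
    parse_strict(sample_stdout)
except ValueError as err:
    # The line without a value splits into a 1-tuple, which dict() rejects.
    error_message = str(err)
    print(error_message)
```

A line with no value splits into a single element, and dict() needs key/value pairs, which gives the same kind of ValueError as in the question.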

So you somehow have to pass the cuDNN version number to TensorFlow's configure script. What I did was hardcode the cuDNN version; just replace '8.1' with your version.

I replaced line 1323 of configure.py, tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout), with the following code. I am sure there is a more efficient way to write this.

  config = {}
  for line in proc.stdout:
    # Each line is expected to look like "key: value".
    parameter_split = line.decode('ascii').rstrip().split(': ')
    if len(parameter_split) == 1:
      # The value is missing, so only "key:" is left. Drop the trailing
      # colon and add the cuDNN version manually.
      config[parameter_split[0][:-1]] = '8.1'
    else:
      config[parameter_split[0]] = parameter_split[1]
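
A variation on the same idea, as a sketch only: instead of hardcoding the version inside the loop, the missing values can be supplied through a dictionary. parse_config_lines and its fallbacks parameter are hypothetical names that do not exist in configure.py, and the sample input is invented.

```python
def parse_config_lines(lines, fallbacks=None):
    """Parse b'key: value' lines; take missing values from fallbacks."""
    fallbacks = fallbacks or {}
    config = {}
    for raw in lines:
        text = raw.decode('ascii').rstrip()
        if not text:
            continue  # skip blank lines
        # Split on the first colon only, so values may contain colons.
        key, _, value = text.partition(':')
        key = key.strip()
        value = value.strip()
        if not value:
            if key not in fallbacks:
                raise ValueError('no value reported for %r' % key)
            value = fallbacks[key]
        config[key] = value
    return config


# Invented sample input; the second line has no value.
parsed = parse_config_lines(
    [b"cuda_version: 11.0\n", b"cudnn_version:\n"],
    fallbacks={'cudnn_version': '8'},
)
print(parsed)
```

A key that has neither a reported value nor a fallback raises an error with the key name, which is easier to act on than the original tuple-length message.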
