
How does one fix it when torch can't find CUDA, with the error "version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference"?

I get this error when importing PyTorch with python -c "import torch":

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/__init__.py", line 13, in <module>
    import torch
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

How does one fix it?

I don't know why this works but this worked for me:

source cuda11.1
# To see Cuda version in use
nvcc -V
pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

If you look through the git issue, these might also work:

conda install -y -c pytorch -c conda-forge cudatoolkit=11.1 pytorch torchvision torchaudio

pip3 install torch+cu111 torchvision torchaudio -f https://download.pytorch.org/whl/torch_stable.html

I think the conda command looks the most robust because you can specify exactly the cudatoolkit version you need, so I'd recommend that one.
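Whichever install command you try, it helps to confirm which build actually ended up in the environment. Below is a minimal sketch for that, not an official check: `cuda_tag` is a hypothetical helper that reads the `+cu111`-style suffix pip wheels carry in their version string (conda builds usually have no such suffix, so it returns None for them), and the rest just prints what torch itself reports.

```python
import re


def cuda_tag(version: str):
    """Return the CUDA tag of a wheel version such as '1.9.1+cu111', else None."""
    match = re.search(r"\+(cu\d+)$", version)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import torch
    except (ImportError, OSError) as exc:
        # The libcublas error from the question surfaces here as an OSError.
        print(f"torch failed to import: {exc}")
    else:
        print("torch version:", torch.__version__)
        print("CUDA tag in version:", cuda_tag(torch.__version__))
        print("built against CUDA:", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
```

If the import still fails, the printed OSError shows which shared library the loader tripped over.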

The error comes from dlopen loading libcublas.so.11 from .../python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/ (that is, .../python3.9/site-packages/nvidia/cublas/lib/), which is where the pip package nvidia-cublas-cu11 installs its libraries.

libcublas.so.11 is dynamically linked against libcublasLt.so.11. The problem is that when you have a different CUDA runtime installation (usually in /usr/local/cuda), dlopen probably picks up the wrong libcublasLt.so.11. You can run ldd .../python3.9/site-packages/nvidia/cublas/lib/libcublas.so.11 to check the actual path of libcublasLt.so.11, which is supposed to be the one under .../python3.9/site-packages/nvidia/cublas/lib/.
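The ldd check can also be scripted. This is only a sketch: `resolved_path` and `check_cublas` are hypothetical helper names, and the parsing assumes the usual `name => path (address)` layout of ldd output on Linux.

```python
import subprocess


def resolved_path(ldd_output: str, soname: str):
    """Return the path ldd reports for soname, or None if it is absent or not found."""
    for line in ldd_output.splitlines():
        name, sep, rest = line.strip().partition(" => ")
        if sep and name == soname:
            # Drop the trailing load address, e.g. " (0x00007f...)".
            path = rest.split(" (")[0].strip()
            return None if path == "not found" else path
    return None


def check_cublas(libcublas_path: str):
    """Run ldd on libcublas.so.11 and report where libcublasLt.so.11 resolves."""
    result = subprocess.run(
        ["ldd", libcublas_path], capture_output=True, text=True, check=True
    )
    return resolved_path(result.stdout, "libcublasLt.so.11")
```

Call `check_cublas` with the full path to the libcublas.so.11 inside your environment. If the result points at /usr/local/cuda (or anywhere outside the environment's nvidia/cublas/lib directory), the loader is mixing two CUDA installs.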

Workarounds:

  1. Set LD_LIBRARY_PATH=.../python3.9/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH when launching python, so that dlopen looks for .so files in that directory first.

  2. Use an older torch. The torch pip install started using the pip nvidia-* packages in 1.13.0. Before that, the CUDA libs were statically linked. That's why an older torch pip install has no problem even if you have an existing CUDA install.
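For the first workaround, note that the dynamic loader reads LD_LIBRARY_PATH when the process starts, so changing os.environ inside an already running interpreter is too late. A rough sketch of launching a fresh interpreter with the directory prepended (`env_with_lib_dir` and `import_torch_with` are hypothetical names, and the directory you pass is whatever your own environment uses):

```python
import os
import subprocess
import sys


def env_with_lib_dir(lib_dir: str, base_env=None):
    """Copy the environment and put lib_dir at the front of LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{current}" if current else lib_dir
    return env


def import_torch_with(lib_dir: str):
    """Try `import torch` in a child interpreter that sees the adjusted path."""
    return subprocess.run(
        [sys.executable, "-c", "import torch"], env=env_with_lib_dir(lib_dir)
    )
```

A return code of 0 from `import_torch_with` suggests the search order was the problem; the same variable can then be exported in the shell before starting python.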

I wanted to work on an image detection problem using yolov7, and I installed the default dependencies provided by yolov7 (https://github.com/WongKinYiu/yolov7/blob/main/requirements.txt), but when I tried even to check the help manual I got this error:

OSError: .../yolov7_env/lib/python3.8/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

Then I installed different versions of the dependencies with the following command: pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113. This is how I solved the problem.

Like eval said, it is because pytorch 1.13 automatically installs nvidia_cublas_cu11, nvidia_cuda_nvrtc_cu11, nvidia_cuda_runtime_cu11 and nvidia_cudnn_cu11. Since I already had my own CUDA toolkit installed, I had the same problem. In my case, I ran pip uninstall nvidia_cublas_cu11 and that solved the problem. I think the pytorch team should solve this, since users often have their own CUDA toolkit installed.
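Before uninstalling anything, it may help to see which of these pip packages are actually present. A small sketch using the standard library's importlib.metadata; `nvidia_cu11_packages` is a hypothetical helper, and matching on the `nvidia-` prefix and `-cu11` suffix is an assumption based on the package names listed above.

```python
from importlib import metadata


def nvidia_cu11_packages(versions):
    """Filter a {name: version} mapping down to the nvidia-*-cu11 packages."""
    found = {}
    for name, version in versions.items():
        # pip treats '_' and '-' in project names as equivalent.
        normalized = name.lower().replace("_", "-")
        if normalized.startswith("nvidia-") and normalized.endswith("-cu11"):
            found[normalized] = version
    return found


if __name__ == "__main__":
    installed = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            installed[name] = dist.version
    for name, version in sorted(nvidia_cu11_packages(installed).items()):
        print(name, version)
```

If nvidia-cublas-cu11 shows up next to a system CUDA toolkit, that is the combination described in this answer.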
