"unknown error" on first cudaMalloc if CUBLAS is present in kernel

I have the following minimal .cu file:

#include <cuda_runtime_api.h>
#include <cublas_v2.h>
#include <cstdio>

__global__ void test()
{
    cublasHandle_t handle = nullptr;
    cublasCreate(&handle);
}

int main(int, char**)
{
    void * data = nullptr;
    auto err = cudaMalloc(&data, 256);
    printf("%s\n", cudaGetErrorString(err));
    return 0;
}

As you can see, the test kernel is never even launched, yet cudaMalloc returns error 30 (unknown error). The file is compiled with separable compilation (required for dynamic parallelism) and compute capability 5.2 (I also tried 3.5 and 5.0, which changed nothing). Removing the call to cublasCreate makes cudaMalloc return 0 (no error).

What could be the cause, and how can I fix it? I need to call CUBLAS from a kernel using dynamic parallelism, which is theoretically supported, so "just remove the call" is not an option.

Here is the corresponding CMakeLists.txt:

cmake_minimum_required(VERSION 3.3 FATAL_ERROR)
project(CublasError)

find_package(CUDA REQUIRED)

set(CUDA_SEPARABLE_COMPILATION ON)
set(CUDA_NVCC_FLAGS --gpu-architecture=compute_52 -Xptxas=-v)
list(APPEND CUDA_NVCC_FLAGS_DEBUG -G -keep -O0)

cuda_add_executable(${PROJECT_NAME} main.cu)
cuda_add_cublas_to_target(${PROJECT_NAME})

# FindCUDA.cmake does not automatically add (or find) cudadevrt which is required when separable compilation is on
if(CUDA_SEPARABLE_COMPILATION)
    get_filename_component(CUDA_LIB_PATH ${CUDA_CUDART_LIBRARY} DIRECTORY)
    find_library(CUDA_cudadevrt_LIBRARY cudadevrt PATHS ${CUDA_LIB_PATH})
    target_link_libraries(${PROJECT_NAME} ${CUDA_cudadevrt_LIBRARY})
endif()

Here is a roughly equivalent set of manual build commands, which produce the same result:

nvcc -dc --gpu-architecture=compute_52 -m64 main.cu -o main.dc.obj
nvcc -dlink --gpu-architecture=compute_52 -m64 main.dc.obj -o main.obj
link /SUBSYSTEM:CONSOLE /LIBPATH:"%CUDA_PATH%\lib\x64" main.obj main.dc.obj cudart_static.lib cudadevrt.lib cublas.lib cublas_device.lib

It turns out that nvcc -dlink does not report missing dependencies; it simply continues without emitting any error. The solution is to link cublas_device.lib during both device linking and host linking, i.e. the build commands should look as follows:

nvcc -dc --gpu-architecture=compute_52 -m64 main.cu -o main.dc.obj
nvcc -dlink --gpu-architecture=compute_52 -m64 -lcublas_device main.dc.obj -o main.obj
link /SUBSYSTEM:CONSOLE /LIBPATH:"%CUDA_PATH%\lib\x64" main.obj main.dc.obj cudart_static.lib cudadevrt.lib cublas.lib cublas_device.lib

Also, nvcc -dlink is order dependent, but in the opposite sense to what one is used to from ld: -lcublas_device must appear before the object files that require it.

On the CMake side of things, cuda_add_cublas_to_target fails to add cublas_device.lib to the device link command and only adds it to the host link command. As a workaround, add the dependency explicitly to the list of nvcc flags:

list(APPEND CUDA_NVCC_FLAGS -lcublas_device)
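
For context, here is a sketch of where that line could sit in the CMakeLists.txt from the question. It assumes that FindCUDA reads CUDA_NVCC_FLAGS at the point where cuda_add_executable is called, so the flag is appended before that call; that placement is an assumption, not something verified here.

```cmake
find_package(CUDA REQUIRED)

set(CUDA_SEPARABLE_COMPILATION ON)
set(CUDA_NVCC_FLAGS --gpu-architecture=compute_52 -Xptxas=-v)

# Workaround: pass cublas_device to nvcc so that it also reaches the
# device link (-dlink) step. Assumed to be needed before
# cuda_add_executable, which is where FindCUDA picks up CUDA_NVCC_FLAGS.
list(APPEND CUDA_NVCC_FLAGS -lcublas_device)

cuda_add_executable(${PROJECT_NAME} main.cu)

# Still needed: this covers the host link step.
cuda_add_cublas_to_target(${PROJECT_NAME})
```

Since the flag is part of CUDA_NVCC_FLAGS, nvcc receives it ahead of the object files, which matches the ordering requirement described above.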
