未调用 CUDA global 函数

Question

我正在尝试编译从这里复制的简单 helloworld 示例。 我正在使用 CentOS 6.4 环境。

// This is the REAL "hello world" for CUDA!
// It takes the string "Hello ", prints it, then passes it to CUDA with an array
// of offsets. Then the offsets are added in parallel to produce the string "World!"
// By Ingemar Ragnemalm 2010

#include <stdio.h>

const int N = 16; 
const int blocksize = 16; 

__global__ 
void hello(char *a, int *b) 
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize ); 
    cudaMalloc( (void**)&bd, isize ); 
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ); 
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ); 

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ); 
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}

尝试编译它工作正常：

$ nvcc hello_world.cu -o hello_world.bin

但是当我运行它时：

$ ./hello_world.bin
Hello Hello

它不会打印预期的“Hello World”，而是“Hello Hello”。 如果我从__global__函数中注释掉一些代码，则根本没有影响，甚至将 printf 添加到 hello() 函数中也不会产生任何结果。 似乎没有调用该函数。 我错过了什么？ 我可以检查什么？

我还尝试了一些其他示例源代码，它们在另一个盒子上工作。 问题似乎是一样的，所以这台计算机上出现了问题。

编辑：

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Wed_Jul_17_18:36:13_PDT_2013
Cuda compilation tools, release 5.5, V5.5.0
$ nvidia-smi -a
-bash: nvidia-smi: command not found
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  319.60  Wed Sep 25 14:28:26 PDT 2013
GCC version:  gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
$ dmesg | grep NVRM
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  319.60  Wed Sep 25 14:28:26 PDT 2013
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  319.60  Wed Sep 25 14:28:26 PDT 2013

Answer 1

感谢@RobertCrovella 的建议，我在我的代码中添加了返回值检查：

#include <stdio.h>

const int N = 16;
const int blocksize = 16;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}
__global__
void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
        char a[N] = "Hello \0\0\0\0\0\0";
        int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

        char *ad;
        int *bd;
        const int csize = N*sizeof(char);
        const int isize = N*sizeof(int);

        printf("%s", a);

        gpuErrchk(cudaMalloc( (void**)&ad, csize ));
        gpuErrchk(cudaMalloc( (void**)&bd, isize ));
        gpuErrchk(cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ));
        gpuErrchk(cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ));

        dim3 dimBlock( blocksize, 1 );
        dim3 dimGrid( 1, 1 );
        hello<<<dimGrid, dimBlock>>>(ad, bd);
        gpuErrchk( cudaPeekAtLastError() );
        gpuErrchk( cudaDeviceSynchronize() );
        gpuErrchk(cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ));
        gpuErrchk(cudaFree( ad ));
        gpuErrchk(cudaFree( bd ));

        printf("%s\n", a);
        return EXIT_SUCCESS;
}

这导致在运行代码时发现此错误：

$ nvcc hello_world.cu -o hello_world.bin
$ ./hello_world.bin
GPUassert: CUDA driver version is insufficient for CUDA runtime version hello_world.cu 39

我在一个设置 CUDA 环境的云提供商上运行它，所以我怀疑我在那之后所做的 env 有问题。 在我的环境中，cuda env 是通过使用设置的

module load cuda55/toolkit/5.5.22

这应该完全设置环境。 这是我一开始不知道的东西，所以在使用它之前，我试图自己设置一些路径。 因此，这是在我的 .bash_profile 中：

export CUDA_INSTALL_PATH=/cm/shared/apps/cuda55/toolkit/current
export PATH=$PATH:$CUDA_INSTALL_PATH/bin
export LD_LIBRARY_PATH=$CUDA_INSTALL_PATH/lib64
export PATH=$PATH:$CUDA_INSTALL_PATH/lib

一旦我删除了我添加到我的 .bash_profile 的东西并进行了注销/登录，一切都开始正常工作了。

未调用 CUDA global 函数

问题描述

1 个解决方案

解决方案1
2 已采纳 2014-02-24 15:42:10

未调用 CUDA __global__ 函数

问题描述

1 个解决方案

解决方案1 2 已采纳 2014-02-24 15:42:10

未调用 CUDA global 函数

解决方案1
2 已采纳 2014-02-24 15:42:10