CUDA __global__ function not called
I'm trying to compile the simple hello-world example copied from here. I'm using a CentOS 6.4 environment.
// This is the REAL "hello world" for CUDA!
// It takes the string "Hello ", prints it, then passes it to CUDA with an array
// of offsets. Then the offsets are added in parallel to produce the string "World!"
// By Ingemar Ragnemalm 2010
#include <stdio.h>

const int N = 16;
const int blocksize = 16;

__global__
void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize );
    cudaMalloc( (void**)&bd, isize );
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost );
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}
Compiling it works fine:
$ nvcc hello_world.cu -o hello_world.bin
But when I run it:
$ ./hello_world.bin
Hello Hello
It doesn't print the expected 'Hello World', but 'Hello Hello' instead. Commenting code out of the __global__ function has no impact at all, and even adding a printf inside hello() produces nothing. It seems the kernel is never called. What am I missing? What can I check?
I have also tried some other example sources that work fine on another box. The problem is the same, so something isn't right on this machine.
Edit:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Wed_Jul_17_18:36:13_PDT_2013
Cuda compilation tools, release 5.5, V5.5.0
$ nvidia-smi -a
-bash: nvidia-smi: command not found
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 319.60 Wed Sep 25 14:28:26 PDT 2013
GCC version: gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
$ dmesg | grep NVRM
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 319.60 Wed Sep 25 14:28:26 PDT 2013
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 319.60 Wed Sep 25 14:28:26 PDT 2013
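For reference, a minimal device-count check along these lines (a rough sketch, compiled with nvcc just like the example above) would show whether the runtime can reach the driver and see a GPU at all, since any driver/runtime problem already surfaces in the very first API call:

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    // The first runtime call fails if the driver is missing or mismatched.
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices visible to the runtime: %d\n", count);
    return 0;
}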
Thanks to advice from @RobertCrovella, I added return-value checks throughout my code:
#include <stdio.h>

const int N = 16;
const int blocksize = 16;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

__global__
void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    gpuErrchk(cudaMalloc( (void**)&ad, csize ));
    gpuErrchk(cudaMalloc( (void**)&bd, isize ));
    gpuErrchk(cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ));
    gpuErrchk(cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ));

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    gpuErrchk( cudaPeekAtLastError() );
    gpuErrchk( cudaDeviceSynchronize() );
    gpuErrchk(cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ));
    gpuErrchk(cudaFree( ad ));
    gpuErrchk(cudaFree( bd ));

    printf("%s\n", a);
    return EXIT_SUCCESS;
}
This led to the discovery of this error when running the code:
$ nvcc hello_world.cu -o hello_world.bin
$ ./hello_world.bin
GPUassert: CUDA driver version is insufficient for CUDA runtime version hello_world.cu 39
I was running this on a cloud provider that had already set up the CUDA environment, so I suspected something was wrong with changes I had made afterwards. In this environment the CUDA environment is set up with

module load cuda55/toolkit/5.5.22

which configures everything fully. I did not know this at first, so before using it I had tried to set up some paths myself. Because of that, the following was in my .bash_profile:
export CUDA_INSTALL_PATH=/cm/shared/apps/cuda55/toolkit/current
export PATH=$PATH:$CUDA_INSTALL_PATH/bin
export LD_LIBRARY_PATH=$CUDA_INSTALL_PATH/lib64
export PATH=$PATH:$CUDA_INSTALL_PATH/lib
Once I removed what I had added to .bash_profile and logged out and back in, everything started working without issues.
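For anyone who runs into the same "driver version is insufficient" message: a small sketch like the one below (it only uses the standard cudaDriverGetVersion and cudaRuntimeGetVersion calls) prints both versions, so a runtime that is newer than what the installed driver supports becomes obvious immediately:

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0, runtimeVersion = 0;
    // CUDA version supported by the installed kernel driver (0 if none is loaded).
    cudaDriverGetVersion(&driverVersion);
    // CUDA version of the libcudart this binary is actually using.
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("Driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    if (runtimeVersion > driverVersion)
        printf("Runtime is newer than the driver supports -> the 'insufficient driver' error.\n");
    return 0;
}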