Cuda Hello World printf not working even with -arch=sm_20

Question

I didn't think I was a complete newbie with Cuda, but apparently I am.

I recently upgraded my CUDA device from one with compute capability 1.3 to one with compute capability 2.1 (GeForce GT 630). I decided to do a full upgrade to CUDA Toolkit 5.0 as well.

I can compile general CUDA kernels, but printf is not working even with -arch=sm_20 set.

Code:

#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void test(){

    printf("Hi Cuda World");
}

int main( int argc, char** argv )
{

    test<<<1,1>>>();
    return 0;
}

Compiler:

Error   2   error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_20,compute_10\" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include"  -G   --keep-dir "Debug" -maxrregcount=0  --machine 32 --compile -arch=sm_20  -g   -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o "Debug\main.cu.obj" "d:\userstore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu"" exited with code 2.  C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.0.targets  592 10  testCuda
Error   1   error : calling a __host__ function("printf") from a __global__ function("test") is not allowed d:\userstore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu    9   1   testCuda

I'm about done with life because of this problem...done done done. Please talk me down from the rooftops with an answer.

Answer 1

If you're using printf in a kernel, you should call cudaDeviceSynchronize() after the launch:

#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void test(){
    printf("Hi Cuda World");
}

int main( int argc, char** argv )
{
    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
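As a minimal sketch of the same idea with error checking added: the helper name ok is hypothetical (it is not part of the CUDA runtime), while cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are standard runtime calls. Checking both return values can make a failed launch visible instead of silently printing nothing.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void test() {
    printf("Hi Cuda World\n");
}

// Hypothetical helper (not a CUDA API): report a failed call and
// return whether it succeeded.
static bool ok(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    test<<<1, 1>>>();
    // Errors from the launch itself are reported here.
    if (!ok(cudaGetLastError(), "kernel launch")) return 1;
    // Blocks until the kernel has finished; the device printf buffer
    // is flushed as part of this synchronization.
    if (!ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;
    return 0;
}
```

If the program prints nothing and exits with status 1, the message on stderr should indicate which of the two steps failed.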

Answer 2

In-kernel printf is only supported on hardware with compute capability 2.0 or higher. Because your project is set to build for both compute capability 1.0 and compute capability 2.1, nvcc compiles the code multiple times and builds a multi-architecture fatbinary object. The error is generated during the compute capability 1.0 compilation pass, because the printf call is unsupported for that architecture.

If you remove the compute capability 1.0 build target from your project, the error will disappear.

Alternatively, you could write the kernel like this:

__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    printf("Hi Cuda World");
#endif
}

The __CUDA_ARCH__ symbol will only be >= 200 when building for targets of compute capability 2.0 or higher, which allows you to compile this code for compute capability 1.x devices without encountering a compilation error.
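To illustrate how the guard behaves across compilation passes, here is a small sketch. The function name compiled_arch is made up for this example; the only thing it relies on is that __CUDA_ARCH__ is defined during device compilation passes and undefined during the host pass.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns the architecture value this copy of the
// function was compiled for, or 0 in the host pass, where __CUDA_ARCH__
// is not defined.
__host__ __device__ int compiled_arch() {
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;
#else
    return 0;
#endif
}

__global__ void test() {
#if __CUDA_ARCH__ >= 200
    // Only compiled into device code for compute capability 2.0 or higher.
    printf("Hi Cuda World from arch %d\n", compiled_arch());
#endif
}

int main() {
    printf("host pass sees arch %d\n", compiled_arch());
    test<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

On the host the helper returns 0; inside the kernel it returns whatever value the device pass was built with, so the kernel body is simply empty for 1.x targets.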

If you are compiling for the correct architecture and still get no output, you also need to ensure that the kernel finishes and the driver flushes the output buffer. To do this, add a synchronizing call after the kernel launch in the host code, for example:

int main( int argc, char** argv )
{

    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}

[disclaimer: all code written in browser, never compiled, use at own risk]

If you do both things, you should be able to compile, run and see output.

Answer 3

Just use cudaDeviceSynchronize(). As a supplement to @Tomasz's answer:

Devices with compute capability 2.x or higher support calls to printf from within a CUDA kernel.
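One way to check this on a given machine is to query the device properties at run time. The sketch below assumes device 0 is the GPU in use, and supports_device_printf is a hypothetical helper that only encodes the "2.x or higher" rule stated above; cudaGetDeviceProperties and the major, minor and name fields are part of the CUDA runtime.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: in-kernel printf needs compute capability 2.x or higher.
static bool supports_device_printf(int major) {
    return major >= 2;
}

int main() {
    int device = 0;  // assumption: the first device is the one being used
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("%s: compute capability %d.%d, in-kernel printf %s\n",
           prop.name, prop.major, prop.minor,
           supports_device_printf(prop.major) ? "supported" : "not supported");
    return 0;
}
```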

printf output is stored in a circular buffer of a fixed size, and this buffer is flushed only on the following events:

the start of a kernel launch
synchronization (e.g. cudaDeviceSynchronize())
blocking memory copies (e.g. cudaMemcpy(...))
module load/unload
context destruction

So the simplest "Hello world" example is:

#include <stdio.h>

__global__ void hello() {
    printf("Hello from GPU");
}

int main() {
    hello<<<1, 1>>>();
    cudaDeviceSynchronize();
}
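Because the printf buffer has a fixed size, output from kernels that print a lot can be overwritten before it is flushed. A hedged sketch of inspecting and enlarging that buffer with cudaDeviceGetLimit and cudaDeviceSetLimit follows; doubling the size and launching 32 threads are arbitrary choices for illustration, not recommendations. The limit has to be set before launching a kernel that uses printf.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void hello() {
    printf("Hello from thread %d\n", threadIdx.x);
}

int main() {
    size_t bytes = 0;
    // Query the current size of the device-side printf buffer.
    if (cudaDeviceGetLimit(&bytes, cudaLimitPrintfFifoSize) != cudaSuccess) {
        return 1;
    }
    printf("printf buffer: %lu bytes\n", (unsigned long)bytes);

    // Illustrative choice: double the buffer. This call must come
    // before the launch of any kernel that calls printf.
    if (cudaDeviceSetLimit(cudaLimitPrintfFifoSize, bytes * 2) != cudaSuccess) {
        return 1;
    }

    hello<<<1, 32>>>();
    // Synchronization is one of the events that flushes the buffer.
    cudaDeviceSynchronize();
    return 0;
}
```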

Reference:

cmu15418
Nvidia CUDA Toolkit Document

Answer 1
28 2013-03-28 10:40:32

Answer 2
11 accepted 2013-03-28 06:45:10

Answer 3
2 2020-04-10 19:18:25
