Printing from CUDA kernels

Question

I am writing a CUDA program and trying to print something inside the CUDA kernels using the printf function. But when I compile the program, I get the following error:

error : calling a host function("printf") from a __device__/__global__ function("agent_movement_top") is not allowed


 error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include"  -G  --keep-dir "Debug" -maxrregcount=0  --machine 32 --compile  -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MDd  " -o "Debug\test.cu.obj" "C:\Users\umdutta\Desktop\SANKHA_ALL_MATERIALS\PROGRAMMING_FOLDER\ABM_MODELLING_2D_3D\TRY_NUM_2\test_proj_test\test_proj\test_proj\test.cu"" exited with code 2.

I am using a GTX 560 Ti card, which has a compute capability greater than 2.0. When I searched a bit about printing from CUDA kernels, I also saw that I need to change the compiler setting from sm_10 to sm_20 to take full advantage of the card. Others suggested cuPrintf for this purpose. I am a bit confused about what I should do and what the simplest and quickest way is to get the printouts on my console screen. If I need to change the nvcc setting from 1.0 to 2.0, what should I do? I should also mention that I am using Windows 7 and programming in Visual Studio 2010. Thanks for all your help.

Answer 1

To use plain printf() on devices of compute capability (CC) 2.0 or higher, you must compile for at least CC 2.0 and disable the default setting, which includes a build for CC 1.0.

Right-click the .cu file in your project, select Properties, then select Configuration Properties | CUDA C/C++ | Device. Click on the Code Generation line, click the triangle, and select Edit. In the Code Generation dialog box, uncheck Inherit from parent or project defaults, type compute_20,sm_20 in the top window, and click OK.
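A minimal sketch of how this can be checked at run time, assuming a toolkit that still accepts the chosen target (newer toolkits no longer support compute_20, so use the lowest architecture yours accepts). The helper printIfSupported and the kernel helloKernel are hypothetical names: the helper reads the device's compute capability with cudaGetDeviceProperties and only launches the printf kernel when the major version is at least 2. On the command line, the Code Generation setting corresponds roughly to nvcc -gencode arch=compute_20,code=sm_20.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread reports its block and thread index with device-side printf.
__global__ void helloKernel()
{
    printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

// Hypothetical helper: returns 1 if the kernel ran and printed,
// 0 if the device is older than compute capability 2.0, -1 on a CUDA error.
int printIfSupported(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;

    std::printf("device %d: %s, compute capability %d.%d\n",
                device, prop.name, prop.major, prop.minor);

    if (prop.major < 2)
        return 0;  // device-side printf needs CC 2.0 or newer

    if (cudaSetDevice(device) != cudaSuccess)
        return -1;

    helloKernel<<<2, 2>>>();
    if (cudaGetLastError() != cudaSuccess)
        return -1;

    // Device printf output is buffered and only flushed to the console
    // at synchronization points such as this call.
    if (cudaDeviceSynchronize() != cudaSuccess)
        return -1;
    return 1;
}
```

Calling printIfSupported(0) from main() should print the device line followed by four kernel lines, whose order between blocks is not guaranteed.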

Answer 2

You can write this code to print whatever you want from inside a CUDA kernel:

#if __CUDA_ARCH__ >= 200
    printf("%d \n", tid);  // tid: your thread index
#endif

and add #include <stdio.h> at the top of the file.
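A fuller sketch of this guard, assuming tid is the usual global thread index (the answer does not define it); printTid and runPrintTid are illustrative names. In the host compilation pass __CUDA_ARCH__ is undefined, and in a device pass for CC 1.x it is below 200, so in both cases the printf call is compiled out instead of triggering the error from the question. The line is only printed when the build also targets CC 2.0 or newer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void printTid(int n)
{
    // Assumed definition of tid: one global index per thread.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n)
        return;
#if __CUDA_ARCH__ >= 200
    // Only compiled into device code that targets CC 2.0 or newer.
    printf("%d \n", tid);
#endif
}

// Hypothetical host wrapper: launches enough threads to cover n (n > 0)
// and waits so that the buffered device output is flushed.
cudaError_t runPrintTid(int n)
{
    const int threadsPerBlock = 4;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    printTid<<<blocks, threadsPerBlock>>>(n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();
}
```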

Answer 3

One way to solve this problem is the cuPrintf function, which can print from kernels. Copy the files cuPrintf.cu and cuPrintf.cuh from the folder

C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\src\simplePrintf

to the project folder. Then add the header file cuPrintf.cuh to your project and add

#include "cuPrintf.cu"

to your code. The code should then follow this pattern:

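#include <stdio.h>
#include "cuPrintf.cu"

__global__ void testKernel(int val)
{
  // cuPrintf writes into a device-side buffer, not directly to the console
  cuPrintf("Value is: %d\n", val);
}

int main()
{
  cudaPrintfInit();                  // set up the device-side print buffer
  testKernel<<< 2, 3 >>>(10);
  cudaPrintfDisplay(stdout, true);   // copy the buffer back and print it
  cudaPrintfEnd();                   // release the buffer
  return 0;
}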

By following the above procedure, you can print to the console window from a device function. Although I solved my issue this way, I still don't know how to use printf from the device function. If it is true that I must change the nvcc target from sm_10 to sm_21 to enable printf, I would appreciate it if someone could explain how. Thanks for all your cooperation.
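For comparison, a sketch of the same testKernel using native printf, assuming the file is built for CC 2.0 or newer. Since the GTX 560 Ti is a CC 2.1 device, sm_20 code should already run on it. No init, display, or end calls are needed: device printf writes into a FIFO buffer that is flushed at synchronization points. cudaDeviceSetLimit with cudaLimitPrintfFifoSize can enlarge that buffer before the first printf kernel is launched; the 4 MB size and the wrapper name runTestKernel are arbitrary choices for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same kernel as the cuPrintf example, using device-side printf instead.
__global__ void testKernel(int val)
{
    printf("[%u, %u]: Value is: %d\n", blockIdx.x, threadIdx.x, val);
}

// Hypothetical wrapper that replaces cudaPrintfInit/Display/End.
cudaError_t runTestKernel()
{
    // Optional: enlarge the printf FIFO (example size: 4 MB). This must
    // happen before any kernel that uses printf is launched.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 4 * 1024 * 1024);
    if (err != cudaSuccess)
        return err;

    testKernel<<<2, 3>>>(10);
    err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;

    // Flushes the buffered device output to stdout.
    return cudaDeviceSynchronize();
}
```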

Answer 4

I have CUDA v10.0.130 on Visual Studio 2015 with a GeForce GTX 1060, and all I had to do was add the following include statement:

#include <helper_cuda.h>

Then I was able to use the printf statement without any issues.
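A sketch that does not depend on helper_cuda.h, which is part of the CUDA samples rather than the toolkit's own include directory. On a GTX 1060 the default CUDA 10 build should already target an architecture above CC 2.0, so including <cstdio> directly ought to be enough. The CUDA_CHECK macro, squareKernel, and printSquares are hypothetical stand-ins, loosely similar to the samples' checkCudaErrors.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error-check macro: reports the failing call and returns
// the error from the enclosing function.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            return err_;                                              \
        }                                                             \
    } while (0)

__global__ void squareKernel(int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        printf("i = %d, i*i = %d\n", i, i * i);
}

// Hypothetical helper: one block of n threads (n <= 1024), then a sync
// so that the device output is flushed before returning.
cudaError_t printSquares(int n)
{
    squareKernel<<<1, n>>>(n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    return cudaSuccess;
}
```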

Answer 5

I am using a GTX 1650 and a GTX 1050, with C++11. For recent users, this is my suggestion:

In host code:

#include <iostream>
using namespace std;

cout << "anything you want" << endl;

In kernel:

if(threadIdx.x==0){
    printf("ss=%4.2f \n", ss);
}

Note that this "if" is quite important, and I noticed nobody mentioned it: you may launch a lot of threads, and you definitely do not want every thread to print. Also, in %4.2f the 4 is the minimum field width and the 2 is the number of digits after the decimal point, which keeps the output from filling up with zeros. And do not forget \n to end the line.
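A sketch extending this idea: threadIdx.x == 0 is true for thread 0 of every block (and, in 2D or 3D blocks, for every thread whose x index is 0), so a launch with many blocks still prints many lines. Testing the block index and all thread index components restricts output to one thread of the whole grid. The kernel reportKernel, the wrapper runReport, and the way ss is computed are illustrative only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread computes a value; only one thread in the whole grid prints.
__global__ void reportKernel(float scale)
{
    float ss = scale * (blockIdx.x * blockDim.x + threadIdx.x);

    bool firstThreadOfGrid =
        blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0 &&
        threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0;

    if (firstThreadOfGrid)
        printf("ss=%4.2f \n", ss);  // width 4, 2 digits after the decimal point
}

// Hypothetical wrapper: 8 blocks of 64 threads, but only one printed line.
cudaError_t runReport()
{
    reportKernel<<<8, 64>>>(0.5f);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();
}
```

With these values, runReport() prints the single line ss=0.00 instead of one line per block.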

You can also use this to print shared memory values:

if(threadIdx.x==0){
    for(int i=0;i<64;i++){
        for(int j=0;j<8; j++){
            printf("%4.2f  ", ashare[i*8+j]);
        }
        printf("\n");
    }
    printf("\n");
}

This prints shared memory neatly. Note that this also needs to be restricted to threadIdx.x==0.
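A sketch of the shared-memory dump, assuming a 64 x 8 tile of floats named ashare that the block fills cooperatively; the fill values, dumpShared, and runDump are made up for illustration. The __syncthreads() before printing matters: without it, thread 0 may read elements that other threads have not written yet.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ROWS 64
#define COLS 8

__global__ void dumpShared()
{
    __shared__ float ashare[ROWS * COLS];

    // Each thread fills a strided subset of the tile (illustrative values).
    for (int k = threadIdx.x; k < ROWS * COLS; k += blockDim.x)
        ashare[k] = k * 0.25f;

    // Make sure every element is written before thread 0 reads the tile.
    __syncthreads();

    if (threadIdx.x == 0) {
        for (int i = 0; i < ROWS; i++) {
            for (int j = 0; j < COLS; j++)
                printf("%4.2f  ", ashare[i * COLS + j]);
            printf("\n");
        }
        printf("\n");
    }
}

// Hypothetical wrapper: a single block, so only one tile is printed.
cudaError_t runDump()
{
    dumpShared<<<1, 128>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();
}
```

With more than one block, thread 0 of each block would print its own tile, so combining this with a block-index check keeps the output to one tile.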

Answer scores and dates

Answer 1: score 9, accepted, 2013-01-01 01:16:12
Answer 2: score 9, 2013-05-15 23:24:49
Answer 3: score 7, 2012-12-31 23:40:45
Answer 4: score 0, 2019-01-08 14:24:26
Answer 5: score 0, 2022-06-10 11:34:14
