Printing from CUDA kernels

I am writing a CUDA program and trying to print from inside the kernels using the printf function. When I compile the program, I get this error:

error : calling a host function("printf") from a __device__/__global__ function("agent_movement_top") is not allowed


 error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include"  -G  --keep-dir "Debug" -maxrregcount=0  --machine 32 --compile  -g    -Xcompiler "/EHsc /nologo /Od /Zi  /MDd  " -o "Debug\test.cu.obj" "C:\Users\umdutta\Desktop\SANKHA_ALL_MATERIALS\PROGRAMMING_FOLDER\ABM_MODELLING_2D_3D\TRY_NUM_2\test_proj_test\test_proj\test_proj\test.cu"" exited with code 2.

I am using a GTX 560 Ti, which has a compute capability greater than 2.0. When I searched for information about printing from CUDA kernels, I read that I need to change the compiler target from sm_10 to sm_20 to take full advantage of the card. Some people also suggested cuPrintf for this purpose. I am a bit confused about what to do: what is the simplest and quickest way to get printouts on my console? If I need to change the nvcc target from 1.0 to 2.0, how do I do that? I should also mention that I am using Windows 7 and programming in Visual Studio 2010. Thanks for all your help.

Plain printf() in device code works only on devices of compute capability (CC) 2.0 or higher. To use it, compile for at least CC 2.0 and disable the default setting, which includes a build for CC 1.0.

Right-click the .cu file in your project, select Properties, then select Configuration Properties | CUDA C/C++ | Device. Click on the Code Generation line, click the triangle, and select Edit. In the Code Generation dialog box, uncheck "Inherit from parent or project defaults", type compute_20,sm_20 in the top window, and click OK.
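For a command-line build, the same target can be selected with nvcc's -arch option. This is only a sketch: it assumes the source file is named test.cu as in the error log above, the output name test is hypothetical, and sm_20 is accepted only by older toolkits such as the 4.2 release used in the question (later toolkits dropped it, so a newer architecture would be needed there).

```shell
# Build only for compute capability 2.0 so that device-side printf is available
nvcc -arch=sm_20 -o test test.cu
```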

You can use this code to print from inside a CUDA kernel:

#if __CUDA_ARCH__ >= 200
    printf("%d \n", tid);
#endif

Also include <stdio.h>.
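To show the pieces together, here is a minimal sketch of a complete program. It assumes the file is compiled for compute capability 2.0 or higher, so the __CUDA_ARCH__ guard is left out; the names printThreadIds and runPrintKernel are made up for illustration. Device-side printf output is buffered and shown when the host synchronizes with the device, which is why the host calls cudaDeviceSynchronize() before exiting.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread prints its global index.
__global__ void printThreadIds()
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    printf("thread %d\n", tid);
}

// Launches the kernel and waits for it; returns the first CUDA error seen.
cudaError_t runPrintKernel()
{
    printThreadIds<<<2, 3>>>();             // 2 blocks of 3 threads
    cudaError_t err = cudaGetLastError();   // catches launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // waits for the kernel and flushes printf output
}

int main()
{
    cudaError_t err = runPrintKernel();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

With this launch configuration the kernel prints six lines, one per thread. The order of the lines is not guaranteed.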

One way to solve this problem is to use the cuPrintf function, which can print from kernels. Copy the files cuPrintf.cu and cuPrintf.cuh from the folder

C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\src\simplePrintf

to the project folder. Then add the header file cuPrintf.cuh to your project and add

#include "cuPrintf.cu"

to your code. Your code should then follow the format below:

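#include <stdio.h>
#include "cuPrintf.cu"

__global__ void testKernel(int val)
{
  // Each thread writes its message into the cuPrintf device buffer
  cuPrintf("Value is: %d\n", val);
}

int main()
{
  cudaPrintfInit();                 // allocate the device-side print buffer
  testKernel<<< 2, 3 >>>(10);       // 2 blocks of 3 threads
  cudaPrintfDisplay(stdout, true);  // copy the buffer to the host and print it
  cudaPrintfEnd();                  // free the print buffer
  return 0;
}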

Following the procedure above, you can print to the console window from a device function. Although this solved my problem, I still don't know how to use plain printf from a device function. If it really is necessary to change my nvcc target from sm_10 to sm_21 to enable printf, it would be very helpful if someone could show me how. Thanks for all your cooperation.

I have CUDA v10.0.130 on Visual Studio 2015 with a GeForce GTX 1060, and all I had to do was add the following include statement:

#include <helper_cuda.h>

Then I was able to use the printf statement without any issues.

I am using a GTX 1650 and a GTX 1050 with C++11. For recent users, this is my suggestion:

In host function:

#include<iostream>
using namespace std;

cout<< .....(anything you want) << endl;

In kernel:

if(threadIdx.x==0){
    printf("ss=%4.2f \n", ss);
}

Note that this "if" is quite important, and I noticed that nobody had mentioned it. You may be running a lot of threads, and you definitely do not want every one of them to print. In the format, %4.2f means a minimum field width of 4 characters with 2 digits after the decimal point, which avoids printing long runs of trailing zeros. Also do not forget \n to end the line.

You can also use this to print shared memory values:

if(threadIdx.x==0){
    for(int i=0;i<64;i++){
        for(int j=0;j<8; j++){
            printf("%4.2f  ", ashare[i*8+j]);
        }
        printf("\n");
    }
    printf("\n");
}

This prints the shared memory contents as a neat table. Note that here, too, the printing must be restricted to threadIdx.x==0.
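One caveat: threadIdx.x==0 is true for one thread in every block, so a launch with several blocks still prints once per block. The sketch below adds a blockIdx.x==0 check so that a single thread in the whole grid prints. It is an illustration under assumptions, not the poster's code: the 4x8 array size, the fill values, and the names fillAndPrint and runSharedPrint are made up, and the file must be compiled for compute capability 2.0 or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ROWS 4   // assumed size for illustration (the snippet above uses 64)
#define COLS 8

// Hypothetical kernel: fills a shared array, then one thread prints it.
__global__ void fillAndPrint()
{
    __shared__ float ashare[ROWS * COLS];

    // One thread per element: fill this block's shared array.
    if (threadIdx.x < ROWS * COLS)
        ashare[threadIdx.x] = 0.25f * threadIdx.x;
    __syncthreads();  // all writes must finish before thread 0 reads them

    // threadIdx.x == 0 alone would print once per block; the blockIdx.x
    // check limits the output to a single thread in the whole grid.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (int i = 0; i < ROWS; i++) {
            for (int j = 0; j < COLS; j++)
                printf("%4.2f  ", ashare[i * COLS + j]);
            printf("\n");
        }
    }
}

// Launches the kernel and waits for it; returns the first CUDA error seen.
cudaError_t runSharedPrint()
{
    fillAndPrint<<<2, ROWS * COLS>>>();     // 2 blocks, one thread per element
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // flushes the device printf buffer
}

int main()
{
    cudaError_t err = runSharedPrint();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Shared memory is private to each block, so the table shown is block 0's copy of ashare. Changing the condition to a different blockIdx.x value prints another block's copy.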
