printf inside CUDA __global__ function
I am currently writing a matrix multiplication on the GPU and would like to debug my code, but since I cannot use printf inside a device function, is there something else I can do to see what is going on inside that function? This is my current function:
__global__ void MatrixMulKernel(Matrix Ad, Matrix Bd, Matrix Xd) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int bx = blockIdx.x;
    int by = blockIdx.y;

    float sum = 0;
    for (int k = 0; k < Ad.width; ++k) {
        float Melement = Ad.elements[ty * Ad.width + k];
        float Nelement = Bd.elements[k * Bd.width + tx];
        sum += Melement * Nelement;
    }
    Xd.elements[ty * Xd.width + tx] = sum;
}
I would love to know if Ad and Bd are what I think they are, and to see if that function is actually being called.
CUDA now supports printf directly in the kernel. For a formal description see Appendix B.16 of the CUDA C Programming Guide.
EDIT
To avoid misleading people, as M. Tibbits points out, printf is available in any GPU of compute capability 2.0 and higher.
END OF EDIT
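As a minimal sketch of what that looks like in practice (assuming a compute capability 2.0+ device and compilation with `nvcc -arch=sm_20` or later; the kernel name and data here are illustrative, not from the question):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device-side printf: confirms the kernel actually runs and shows
// what a given thread reads from device memory.
__global__ void DebugKernel(const float *Ad, int width) {
    // Print from one thread only, to avoid flooding the output.
    if (threadIdx.x == 0 && threadIdx.y == 0 &&
        blockIdx.x == 0 && blockIdx.y == 0) {
        printf("kernel launched: width = %d, Ad[0] = %f\n", width, Ad[0]);
    }
}

int main() {
    float h_A[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *d_A;
    cudaMalloc(&d_A, sizeof(h_A));
    cudaMemcpy(d_A, h_A, sizeof(h_A), cudaMemcpyHostToDevice);

    DebugKernel<<<1, dim3(2, 2)>>>(d_A, 2);

    // Device printf output is buffered; synchronizing flushes it
    // to the host's stdout.
    cudaDeviceSynchronize();

    cudaFree(d_A);
    return 0;
}
```

Note the `cudaDeviceSynchronize()` after the launch: kernel launches are asynchronous, and without a synchronization point the program may exit before the printf buffer is flushed.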
You have other choices as well, for example stepping through the kernel with a debugger such as cuda-gdb, or copying intermediate results back to the host and inspecting them there.
Regarding your code snippet: consider passing the Matrix structs in via pointer (i.e. cudaMemcpy them to the device, then pass in the device pointer). Right now you will have no problem, but if the function signature gets very large you may hit the 256-byte limit on kernel arguments.
See the "Formatted output" section (currently B.17) of the CUDA C Programming Guide: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html