Using CUDA printf outside the kernel to print device variables

What is the best way to print device variables in CUDA outside of the kernel? Do I have to do a cudaMemcpy to the host and then print the resulting values? When I try to use printf on pointers created using cudaMalloc, the program crashes. Most of the attention seems to focus on printing inside the kernel, not in regular host code.

Thanks, Eric

"When I try to use printf on pointers created using cudaMalloc, the program crashes"

If you have this:

int *d_data, *h_data;
cudaMalloc(&d_data, DSIZE);

You cannot do this:

printf(" %d ", *d_data);

as this requires dereferencing a device pointer (d_data) in host code, which is normally illegal in CUDA.

Instead you can do this:

h_data = (int *)malloc(DSIZE);
cudaMemcpy(h_data, d_data, DSIZE, cudaMemcpyDeviceToHost);
printf(" %d ", *h_data);

You can also investigate Unified Memory, which is new in CUDA 6, and see if it will serve your purposes.
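As a rough sketch of the Unified Memory route, the following assumes CUDA 6 or later and a device that supports managed memory. The kernel name fill and the launch configuration are hypothetical, chosen only for illustration. The point is that a pointer from cudaMallocManaged can be dereferenced in host code once the device has finished writing to it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only so the device has something to write.
__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main() {
    const int N = 10;
    int *data = nullptr;

    // Managed memory is addressable from both host and device code.
    if (cudaMallocManaged(&data, N * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }

    fill<<<1, 32>>>(data, N, 3);

    // Wait for the kernel to finish before reading the data on the host.
    cudaDeviceSynchronize();

    // Host-side dereference is legal here because the allocation is managed.
    printf("Element number %d is equal to %d\n", 4, data[4]);

    cudaFree(data);
    return 0;
}
```

The cudaDeviceSynchronize call matters: reading managed data on the host while a kernel may still be writing it is not safe.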

And, as mentioned in the comments, devices of compute capability 2.0 or greater support printf from within a kernel, which operates on device data only.
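For comparison, here is a minimal sketch of printing from inside a kernel. The kernel name print_element and the single-thread launch are assumptions made for this example. The device pointer is dereferenced only in device code, so no copy back to the host is needed just to inspect a value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: prints one element of a device array.
__global__ void print_element(const int *data, int i) {
    printf("Element number %d is equal to %d\n", i, data[i]);
}

int main() {
    const int N = 10;
    int h_data[N];
    for (int i = 0; i < N; i++) h_data[i] = 3;

    int *d_data = nullptr;
    cudaMalloc(&d_data, N * sizeof(int));
    cudaMemcpy(d_data, h_data, N * sizeof(int), cudaMemcpyHostToDevice);

    // A single thread is enough to print a single element.
    print_element<<<1, 1>>>(d_data, 4);

    // Device-side printf output is buffered; synchronizing flushes it.
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

This is convenient for quick debugging, though launching a kernel per printed value is heavier than a plain cudaMemcpy when many values are needed.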

An alternative to the approach suggested by Robert Crovella is to wrap the device pointer in a thrust::device_ptr using thrust::device_pointer_cast. This is slightly more direct when you need to access only a few elements of the device array. See the example below:

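#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>

int main() {

    const int N = 10;

    // --- Fill a host array and copy it to the device
    int *h_data = (int*)malloc(N*sizeof(int));
    for (int i=0; i<N; i++) h_data[i] = 3;

    int *d_data; cudaMalloc(&d_data, N*sizeof(int));

    cudaMemcpy(d_data,h_data,N*sizeof(int),cudaMemcpyHostToDevice);

    // --- Alternative approach: wrap the raw device pointer
    thrust::device_ptr<int> dev_ptr_key     = thrust::device_pointer_cast(d_data);
    // --- Dereferencing a device_ptr on the host copies that one element back
    int i = 4; printf("Element number %d is equal to %d\n",i,(int)*(dev_ptr_key+i));

    cudaFree(d_data);
    free(h_data);

    return 0;
}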
