
Kernel parameter passing in CUDA?

I have a newbie question about how CUDA kernels work.

I have the following code (which uses the cuPrintf function taken from here):

#include "cuPrintf.cu"

__global__ void testKernel(int param){
    cuPrintf("Param value: %d\n", param);
}

int main(void){

    // initialize cuPrintf
    cudaPrintfInit();

    int a = 456;    

    testKernel<<<4,1>>>(a);

    // display the device's greeting
    cudaPrintfDisplay();

    // clean up after cuPrintf
    cudaPrintfEnd();
}

The output of the execution is:

Param value: 456
Param value: 456
Param value: 456
Param value: 456

I cannot see how the kernel can read the correct value of the parameter I pass; isn't it allocated in host memory? Can the GPU read from host memory?

Thanks,

Andrea

According to section E.2.5.2, Function Parameters, in the CUDA C Programming Guide:

__global__ function parameters are passed to the device:

  • via shared memory and are limited to 256 bytes on devices of compute capability 1.x,
  • via constant memory and are limited to 4 KB on devices of compute capability 2.x and higher.

The declaration void testKernel(int param) says that param is passed by value, not by reference. In other words, the stack contains a copy of a's value, not a pointer to a. CUDA copies the stack to the kernel running on the GPU.
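A minimal sketch of what pass-by-value implies (error checking omitted, assuming a device is present): even if the kernel overwrites its parameter, the host variable is untouched, because the kernel only ever sees a copy.

```cuda
__global__ void modifyParam(int param){
    param = 0;  // changes only the kernel's private copy of the argument
}

int main(void){
    int a = 456;
    modifyParam<<<1,1>>>(a);   // a's value is copied into the launch
    cudaDeviceSynchronize();
    // a is still 456 here: the kernel received a copy of the value,
    // not a reference to the host variable
    return 0;
}
```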

According to the CUDA Programming Guide (Appendix B.16), the arguments are passed via shared memory to the device:

The arguments to the execution configuration are evaluated before the actual function arguments and, like the function arguments, are currently passed via shared memory to the device.

In the runtime API, parameters for global functions are implicitly marshalled and copied from the host to the device.

The NVCC compiler generates code that hides the marshalling from you. You can find the parameter sizes and limitations in the CUDA Programming Guide.
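For comparison, the CUDA driver API makes the marshalling visible: you hand the driver an array of pointers to your argument values via cuLaunchKernel, and it copies them to the device. A hedged sketch, assuming testKernel is a CUfunction already loaded with cuModuleGetFunction:

```cuda
#include <cuda.h>

// Launch the equivalent of testKernel<<<4,1>>>(a) through the driver API.
// The args array holds one pointer per kernel parameter; the driver reads
// the values through these pointers and copies them to the device.
void launchTestKernel(CUfunction testKernel){
    int a = 456;
    void *args[] = { &a };          // one entry per kernel parameter
    cuLaunchKernel(testKernel,
                   4, 1, 1,         // grid dimensions
                   1, 1, 1,         // block dimensions
                   0, NULL,         // dynamic shared memory, stream
                   args, NULL);     // kernel parameters, extra options
}
```

This is exactly the bookkeeping that NVCC-generated runtime-API code performs for you behind a `<<<...>>>` launch.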

The parameters are passed to the kernel when you invoke it; otherwise how would you communicate with the GPU at all? It is the same idea as setting a uniform in a shader.
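As a side note, on devices of compute capability 2.0 and later the same experiment no longer needs cuPrintf: the built-in device-side printf does the job. A minimal sketch:

```cuda
#include <cstdio>

__global__ void testKernel(int param){
    // param is delivered by value (via constant memory on cc 2.x+);
    // every thread reads its own copy
    printf("Param value: %d\n", param);
}

int main(void){
    int a = 456;
    testKernel<<<4,1>>>(a);
    cudaDeviceSynchronize();   // flush the device-side printf buffer
    return 0;
}
```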

