
How to pass multiple duplicated arguments to a CUDA kernel

I'm looking for an elegant way to pass multiple duplicated arguments to a CUDA kernel.

As we all know, each kernel argument is located on the stack of each CUDA thread. There may therefore be duplication among the arguments the kernel passes to each thread, since the same values are stored on every stack.

I'm looking for an elegant way to minimize the number of duplicated arguments being passed.

To explain my concern, let's say my code looks like this:

   // imageWidth, imageHeight, imageStride and numberOfElements are all of type UINT
   kernelFunction<<<gridSize, blockSize>>>(imageWidth, imageHeight, imageStride, numberOfElements, x, y /* etc. */);

The UINT arguments (imageWidth, imageStride, numberOfElements, and so on) are located on each thread's stack.

I'm looking for a trick to send fewer arguments and access the data from another source.

I was thinking about using constant memory, but since constant memory is located in global memory, I dropped the idea. Needless to say, the memory location should be fast.

Kernel arguments are passed in via constant memory (or shared memory on sm_1x), so there is no replication of the kind you suggest.
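
As a minimal sketch of what this means in practice, the repeated values from the question can be bundled into one struct and passed by value as a single kernel argument. The `ImageParams` struct, the `fillRowIndex` kernel, and the `lastRowIndex` host helper are hypothetical names made up for illustration, and the sketch assumes `imageStride` counts elements rather than bytes.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical parameter bundle; field names follow the question.
struct ImageParams {
    unsigned int imageWidth;
    unsigned int imageHeight;
    unsigned int imageStride;      // assumed: elements per row, >= imageWidth
    unsigned int numberOfElements; // assumed: imageStride * imageHeight
};

// The struct is passed by value as one kernel argument. Per the answer above,
// on compute capability 2.x and higher it is delivered via constant memory,
// so it is not replicated on each thread's stack.
__global__ void fillRowIndex(ImageParams p, unsigned int *out)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.numberOfElements) {
        out[i] = i / p.imageStride; // row that element i belongs to
    }
}

// Host helper: launches the kernel and returns the row index stored for the
// last element, or -1 on an empty image or any CUDA error.
int lastRowIndex(unsigned int width, unsigned int height, unsigned int stride)
{
    ImageParams p{width, height, stride, stride * height};
    if (p.numberOfElements == 0) return -1;

    unsigned int *dOut = nullptr;
    if (cudaMalloc(&dOut, p.numberOfElements * sizeof(unsigned int)) != cudaSuccess) return -1;

    const unsigned int blockSize = 256;
    const unsigned int gridSize = (p.numberOfElements + blockSize - 1) / blockSize;
    fillRowIndex<<<gridSize, blockSize>>>(p, dOut);

    std::vector<unsigned int> hOut(p.numberOfElements);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut,
                                 hOut.size() * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err == cudaSuccess ? static_cast<int>(hOut.back()) : -1;
}
```

On a machine with a working CUDA device, calling `lastRowIndex(60, 4, 64)` from `main` should return 3, since the last of the 256 elements lies in row 3.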

See the programming guide:

__global__ function parameters are passed to the device:

  • via shared memory and are limited to 256 bytes on devices of compute capability 1.x,
  • via constant memory and are limited to 4 KB on devices of compute capability 2.x and higher.
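
If the arguments ever approached these limits, or the same values were needed by several kernels, one alternative is a `__constant__` variable filled once with `cudaMemcpyToSymbol`. The sketch below only illustrates that option and is not something the answer prescribes; `ImageParams`, `cParams`, `fillColumnIndex`, and `lastColumnIndex` are hypothetical names, and `imageStride` is again assumed to count elements.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical parameter bundle; field names follow the question.
struct ImageParams {
    unsigned int imageWidth;
    unsigned int imageHeight;
    unsigned int imageStride;      // assumed: elements per row
    unsigned int numberOfElements; // assumed: imageStride * imageHeight
};

// Compile-time check against the 4 KB limit quoted above, in case the struct
// is later passed as an ordinary kernel argument instead.
static_assert(sizeof(ImageParams) <= 4096, "ImageParams exceeds 4 KB");

// One copy in constant memory, readable by every thread of every kernel
// defined in this translation unit.
__constant__ ImageParams cParams;

__global__ void fillColumnIndex(unsigned int *out)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < cParams.numberOfElements) {
        out[i] = i % cParams.imageStride; // column that element i belongs to
    }
}

// Host helper: uploads the parameters, launches the kernel, and returns the
// column index stored for the last element, or -1 on an empty image or error.
int lastColumnIndex(unsigned int width, unsigned int height, unsigned int stride)
{
    ImageParams p{width, height, stride, stride * height};
    if (p.numberOfElements == 0) return -1;
    if (cudaMemcpyToSymbol(cParams, &p, sizeof(p)) != cudaSuccess) return -1;

    unsigned int *dOut = nullptr;
    if (cudaMalloc(&dOut, p.numberOfElements * sizeof(unsigned int)) != cudaSuccess) return -1;

    const unsigned int blockSize = 256;
    const unsigned int gridSize = (p.numberOfElements + blockSize - 1) / blockSize;
    fillColumnIndex<<<gridSize, blockSize>>>(dOut);

    std::vector<unsigned int> hOut(p.numberOfElements);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut,
                                 hOut.size() * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err == cudaSuccess ? static_cast<int>(hOut.back()) : -1;
}
```

Reads from constant memory go through an on-chip cache and are broadcast when all threads of a warp read the same address, which is the access pattern here.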

Of course, if you subsequently modify one of these variables in your code, you are modifying a local copy (as per the C standard), so each thread will have its own copy, either in registers or, if needed, on the stack.
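
A small sketch of that last point, with hypothetical names (`modifyArgument`, `threadsHavePrivateCopies`): every thread writes to the same by-value parameter, yet each one only sees its own change.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread adds its own index to its private copy of `value`.
__global__ void modifyArgument(unsigned int value, unsigned int *out, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        value += i;     // modifies a thread-local copy of the argument
        out[i] = value; // thread i stores original + i, unaffected by other threads
    }
}

// Host helper: returns true if every thread observed base + its own index,
// false on a mismatch or any CUDA error.
bool threadsHavePrivateCopies(unsigned int base, unsigned int n)
{
    if (n == 0) return true; // nothing to check

    unsigned int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(unsigned int)) != cudaSuccess) return false;

    const unsigned int blockSize = 256;
    modifyArgument<<<(n + blockSize - 1) / blockSize, blockSize>>>(base, dOut, n);

    std::vector<unsigned int> hOut(n);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, n * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (unsigned int i = 0; i < n; ++i) {
        if (hOut[i] != base + i) return false;
    }
    return true; // `base` on the host is untouched as well: it was passed by value
}
```

With a working CUDA device, `threadsHavePrivateCopies(10, 1000)` is expected to return true.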


 