Dynamically allocated shared memory in CUDA. Execution Configuration
What does Nvidia mean by this?

Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0.
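A minimal sketch of what the quoted passage describes: an unsized `extern __shared__` array whose size is set at launch time by the third execution configuration argument. The kernel name `reverseInBlock`, the array name `tile`, and the grid and block sizes are illustrative assumptions, not part of the original question.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses each block's slice of `data` via shared memory.
__global__ void reverseInBlock(int *data)
{
    // Unsized external array; its byte size is the Ns launch argument.
    extern __shared__ int tile[];

    int local = threadIdx.x;
    int global = blockIdx.x * blockDim.x + threadIdx.x;

    tile[local] = data[global];
    __syncthreads();  // every thread must finish writing before any thread reads
    data[global] = tile[blockDim.x - 1 - local];
}

int main()
{
    const int blockSize = 256;  // assumed values for illustration
    const int gridSize = 4;
    const int n = blockSize * gridSize;

    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemcpy(d_data, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Ns: bytes of dynamic shared memory PER BLOCK, here one int per thread.
    size_t ns = blockSize * sizeof(int);
    reverseInBlock<<<gridSize, blockSize, ns>>>(d_data);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(host.data(), d_data, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Each block's slice should now be reversed in place.
    bool ok = true;
    for (int b = 0; b < gridSize && ok; ++b)
        for (int t = 0; t < blockSize; ++t)
            if (host[b * blockSize + t] != b * blockSize + (blockSize - 1 - t)) {
                ok = false;
                break;
            }
    printf("%s\n", ok ? "reversed correctly" : "mismatch");

    cudaFree(d_data);
    return ok ? 0 : 1;
}
```

If the third argument were omitted, Ns would default to 0 and `tile` would have no storage behind it, so the kernel would access shared memory out of bounds.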
The shared memory size on my GPU is 48 kB. For example, I want to run 4 kernels at the same time, each of which uses 12 kB of shared memory.
To do that, should I call the kernel this way:
kernel<<< gridSize, blockSize, 12 * 1024 >>>();
or should the third argument be 48 * 1024?
Ns is a size in bytes. If you want to reserve 12 kB of shared memory, you would pass 12*1024 (12*1024*1024 would be 12 MB).
I doubt you want to do this. The Ns value is PER BLOCK, so it is the amount of shared memory for each block executing on the device. I'm guessing you'd like to do something along the lines of
12*1024/number_of_blocks;
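Because Ns is charged to every block, it can help to ask the runtime how a given per-block request affects residency. The following is a sketch only: the kernel `usesDynamicShared`, the block size of 128, and the 12 kB request are assumptions chosen for illustration, and the numbers printed depend on the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that touches a dynamically sized shared array.
__global__ void usesDynamicShared(float *out)
{
    extern __shared__ float scratch[];
    scratch[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = scratch[threadIdx.x];
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device available\n");
        return 1;
    }

    // Device limits that bound any Ns request.
    printf("shared memory per block limit:    %zu bytes\n", prop.sharedMemPerBlock);
    printf("shared memory per multiprocessor: %zu bytes\n", prop.sharedMemPerMultiprocessor);

    const int blockSize = 128;    // assumed block size
    const size_t ns = 12 * 1024;  // 12 kB of dynamic shared memory per block

    // Ask how many blocks of this kernel can be resident on one multiprocessor
    // when each block requests `ns` bytes of dynamic shared memory.
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, usesDynamicShared, blockSize, ns);
    printf("resident blocks per multiprocessor at %zu bytes/block: %d\n",
           ns, blocksPerSM);

    return 0;
}
```

Comparing the reported block count for different values of `ns` shows directly how a larger per-block request reduces the number of blocks that can share one multiprocessor.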
Kernel launching with concurrency: if, as mentioned in a comment, you are using streams, there is a fourth argument in the kernel launch, which is the CUDA stream.
If you want to launch a kernel on another stream without any shared memory, it will look like:
kernel_name<<<128, 128, 0, mystream>>>(...);
but concurrency is a whole different issue.
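For the scenario in the question, a hedged sketch of four launches on four streams, each requesting 12 kB of dynamic shared memory per block, might look like this. The kernel `fillFromShared`, the single-block grid, and the buffer layout are assumptions for illustration. Separate streams only permit the kernels to overlap; whether they actually run concurrently depends on the device and on the resources each kernel uses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stages a value in dynamic shared memory, then writes it out.
__global__ void fillFromShared(float *out, float value)
{
    extern __shared__ float scratch[];
    scratch[threadIdx.x] = value;
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = scratch[threadIdx.x];
}

int main()
{
    const int numStreams = 4;
    const int gridSize = 1;       // one block per kernel, assumed for simplicity
    const int blockSize = 128;
    const size_t ns = 12 * 1024;  // 12 kB per block, as in the question

    cudaStream_t streams[numStreams];
    float *buffers[numStreams];

    // Create streams and allocate buffers before launching anything,
    // so allocation does not sit between the launches.
    for (int i = 0; i < numStreams; ++i) {
        cudaStreamCreate(&streams[i]);
        cudaMalloc(&buffers[i], gridSize * blockSize * sizeof(float));
    }

    // Third argument: dynamic shared memory per block. Fourth: the stream.
    for (int i = 0; i < numStreams; ++i) {
        fillFromShared<<<gridSize, blockSize, ns, streams[i]>>>(
            buffers[i], static_cast<float>(i));
    }

    bool ok = true;
    for (int i = 0; i < numStreams; ++i) {
        if (cudaStreamSynchronize(streams[i]) != cudaSuccess) ok = false;

        float first = -1.0f;
        cudaMemcpy(&first, buffers[i], sizeof(float), cudaMemcpyDeviceToHost);
        if (first != static_cast<float>(i)) ok = false;

        cudaFree(buffers[i]);
        cudaStreamDestroy(streams[i]);
    }

    printf("%s\n", ok ? "all streams completed" : "a launch failed");
    return ok ? 0 : 1;
}
```

Each launch passes `12 * 1024`, not `48 * 1024`, because the argument describes one block of one kernel rather than the total across all four.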