
Dynamically allocated shared memory in CUDA. Execution Configuration

What does Nvidia mean by this?

Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0.

The shared memory size on my GPU is 48 kB. For example, I want to run 4 kernels at the same time, each of which uses 12 kB of shared memory.

In order to do that, should I call the kernel this way

kernel<<< gridSize, blockSize, 12 * 1024 >>>();

or should the third argument be 48 * 1024?

Ns is a size in bytes. If you want to reserve 12 kB of shared memory you would pass 12*1024 (12288 bytes).

I doubt you want to do this, though. The Ns value is PER BLOCK, so it is the amount of shared memory given to each block executing on the device. I'm guessing you'd like to do something along the lines of 12*1024/number_of_blocks.
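To make the per-block meaning of Ns concrete, here is a minimal sketch of a kernel that uses an unsized extern __shared__ array. The kernel name scaleKernel, the helper dynamicSharedBytes, and the array sizes are made up for illustration; only the launch syntax and the extern __shared__ declaration come from the discussion above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stages one float per thread in dynamic shared memory,
// then writes back twice the staged value.
__global__ void scaleKernel(float *data, int n)
{
    // Unsized extern array: its size in bytes is the third launch argument (Ns).
    extern __shared__ float tile[];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = data[i];
    __syncthreads();  // reached by every thread of the block
    if (i < n) data[i] = 2.0f * tile[threadIdx.x];
}

// Hypothetical helper: bytes of dynamic shared memory ONE block needs
// when each thread stores a single float.
size_t dynamicSharedBytes(int threadsPerBlock)
{
    return static_cast<size_t>(threadsPerBlock) * sizeof(float);
}

int main()
{
    const int n = 1024;
    const int blockSize = 256;
    const int gridSize = (n + blockSize - 1) / blockSize;

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Ns is per block: every one of the gridSize blocks gets this many bytes.
    size_t ns = dynamicSharedBytes(blockSize);
    scaleKernel<<<gridSize, blockSize, ns>>>(d_data, n);

    cudaError_t err = cudaGetLastError();              // launch configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    printf("status: %s, Ns per block: %zu bytes\n", cudaGetErrorString(err), ns);

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Note that Ns does not grow with the grid: adding more blocks leaves the third argument unchanged, because each block receives its own allocation.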

Kernel launching with concurrency: if, as mentioned in a comment, you are using streams, the kernel launch takes a fourth argument, which is the CUDA stream.

If you want to launch a kernel on another stream without any dynamically allocated shared memory, it will look like:

kernel_name<<<128, 128, 0, mystream>>>(...);

but concurrency is a whole different issue.
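For the scenario in the question, a rough sketch might launch four kernels on four streams, each requesting 12*1024 bytes of dynamic shared memory per block. The kernel worker, the constants, and the single-block grid are assumptions chosen for illustration. Whether the kernels actually overlap depends on the device and on how many resources each one occupies, so this only shows the launch syntax and does not guarantee concurrency.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr size_t kSharedBytes = 12 * 1024;  // 12 kB per block, as in the question
constexpr int kNumStreams = 4;              // assumed: one kernel per stream

// Hypothetical kernel: writes the block's thread indices in reverse order,
// using the dynamically allocated shared array as scratch space.
__global__ void worker(float *out)
{
    extern __shared__ float scratch[];  // sized by the third launch argument
    scratch[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] =
        scratch[blockDim.x - 1 - threadIdx.x];
}

int main()
{
    const int blockSize = 128;
    const int gridSize = 1;  // one block per kernel keeps the total at 4 x 12 kB

    cudaStream_t streams[kNumStreams];
    float *out[kNumStreams];
    for (int s = 0; s < kNumStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&out[s], gridSize * blockSize * sizeof(float));
    }

    // Third argument: dynamic shared memory per block. Fourth: the stream.
    for (int s = 0; s < kNumStreams; ++s) {
        worker<<<gridSize, blockSize, kSharedBytes, streams[s]>>>(out[s]);
    }

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < kNumStreams; ++s) {
        cudaFree(out[s]);
        cudaStreamDestroy(streams[s]);
    }
    return err == cudaSuccess ? 0 : 1;
}
```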
