Proper kernel call when using dynamic shared memory allocation

Question

Developers,

Could someone give me a hint, please? I didn't find any information about how to allocate constant (compile-time sized) and dynamic shared memory in the same kernel. Or, to ask more precisely: how do I call a kernel when the amount of shared memory that needs to be allocated is only partly known at compile time? Referring to "allocating shared memory", for example, it is pretty obvious how to do it for purely dynamic allocation. But let's assume I have the following kernel:

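__global__ void MyKernel(int Float4ArrSize, int FloatArrSize)
{
  __shared__ float Arr1[256];          // statically sized
  __shared__ char  Arr2[256];          // statically sized
  extern __shared__ float DynamArr[];  // sized at launch time
  // Split the dynamic buffer: float4 part first, float part right after it.
  float4* DynamArr1 = (float4*) DynamArr;
  float*  DynamArr2 = (float*) &DynamArr1[Float4ArrSize];

  // do something
}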

Kernel call:

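// The third launch parameter is a byte count, so scale the element counts.
size_t SharedMemorySize = Float4ArrSize * sizeof(float4) + FloatArrSize * sizeof(float);

MyKernel<<<numBlocks, threadsPerBlock, SharedMemorySize, stream>>>(Float4ArrSize, FloatArrSize);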

I wasn't able to figure out how the compiler associates the shared memory size with only the part I want to allocate dynamically. Or does the parameter "SharedMemorySize" represent the total amount of shared memory per block, so that I need to include the size of the constant-size arrays myself (that is, add 256*sizeof(float) + 256*sizeof(char) to SharedMemorySize)?

Please enlighten me, or simply point me to some code snippets. Thanks a lot in advance.

Cheers, greg

Answer 1

Quoting the programming guide, SharedMemorySize specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array. SharedMemorySize is an optional argument which defaults to 0.

So if I understand what you want to do, it should probably look like this:

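extern __shared__ float DynamArr[];
// First DynamArr1_size floats of the dynamic buffer.
float*  DynamArr1 = DynamArr;
// float4 part follows; float4 needs 16-byte alignment, so DynamArr1_size
// should be a multiple of 4.
float4* DynamArr2 = (float4*) &DynamArr[DynamArr1_size];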

Be aware, I didn't test it.

Here is a very useful post.
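To make the partitioning idea concrete, here is a minimal sketch of one dynamic buffer split into a float4 part and a float part. The names PartitionKernel, PartitionBytes and RunPartition are made up for illustration and are not from the question. It assumes the float4 part is placed first, so that it starts at the base of the dynamic allocation and the alignment concern goes away; the buffer is declared as float4 for the same reason.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: fills both parts of the dynamic buffer and sums them.
__global__ void PartitionKernel(int float4Count, int floatCount, float* out)
{
    extern __shared__ float4 dynShared[];   // single dynamic buffer
    float4* vecPart    = dynShared;         // float4Count elements
    float*  scalarPart = reinterpret_cast<float*>(&vecPart[float4Count]);

    for (int i = threadIdx.x; i < float4Count; i += blockDim.x)
        vecPart[i] = make_float4(1.0f, 1.0f, 1.0f, 1.0f);
    for (int i = threadIdx.x; i < floatCount; i += blockDim.x)
        scalarPart[i] = 2.0f;
    __syncthreads();

    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < float4Count; ++i)
            sum += vecPart[i].x + vecPart[i].y + vecPart[i].z + vecPart[i].w;
        for (int i = 0; i < floatCount; ++i)
            sum += scalarPart[i];
        out[0] = sum;
    }
}

// Hypothetical host helper: bytes needed for the dynamic part only.
size_t PartitionBytes(int float4Count, int floatCount)
{
    return static_cast<size_t>(float4Count) * sizeof(float4)
         + static_cast<size_t>(floatCount) * sizeof(float);
}

// Launches one block and returns the sum computed on the device.
float RunPartition(int float4Count, int floatCount)
{
    float* dOut = nullptr;
    float  hOut = -1.0f;
    cudaMalloc(&dOut, sizeof(float));
    PartitionKernel<<<1, 64, PartitionBytes(float4Count, floatCount)>>>(
        float4Count, floatCount, dOut);
    cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hOut;
}

int main()
{
    printf("dynamic bytes = %zu\n", PartitionBytes(8, 16));
    printf("device sum    = %f\n", RunPartition(8, 16));
    return 0;
}
```

If the float part has to come first instead, as in the snippet above, its element count would need to be padded to a multiple of 4 so that the float4 part stays 16-byte aligned.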

Answer 2

From the CUDA programming guide:

The [kernel's] execution configuration is specified by inserting an expression of the form <<< Dg, Db, Ns, S >>> between the function name and the parenthesized argument list, where:

Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;

So basically, the shared memory size that you specify in the kernel call covers only the dynamically allocated shared memory. You don't have to manually add the size of your statically allocated shared memory arrays.
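As a rough illustration of this point, the sketch below mixes two statically sized arrays with one dynamic array and passes only the dynamic byte count at launch. MixedSharedKernel and DynamicBytes are hypothetical names, and the kernel assumes at most 256 threads per block. The static footprint is queried with cudaFuncGetAttributes, whose sharedSizeBytes field reports statically allocated shared memory and excludes the amount requested at launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel with both static and dynamic shared memory.
__global__ void MixedSharedKernel(int dynCount, float* out)
{
    __shared__ float staticFloats[256];    // sized at compile time
    __shared__ char  staticChars[256];     // sized at compile time
    extern __shared__ float dynFloats[];   // sized by the third launch argument

    int t = threadIdx.x;                   // assumes blockDim.x <= 256
    staticFloats[t] = 1.0f;
    staticChars[t]  = 1;
    for (int i = t; i < dynCount; i += blockDim.x)
        dynFloats[i] = 1.0f;
    __syncthreads();

    if (t == 0) {
        float sum = 0.0f;
        for (int i = 0; i < blockDim.x; ++i)
            sum += staticFloats[i] + staticChars[i];
        for (int i = 0; i < dynCount; ++i)
            sum += dynFloats[i];
        out[0] = sum;
    }
}

// Hypothetical host helper: only the dynamic array is counted here.
size_t DynamicBytes(int dynCount)
{
    return static_cast<size_t>(dynCount) * sizeof(float);
}

int main()
{
    const int threads = 256, dynCount = 100;

    // Static shared memory is accounted for by the compiler and runtime.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, MixedSharedKernel);
    printf("static shared bytes (from runtime): %zu\n", attr.sharedSizeBytes);
    printf("dynamic shared bytes (at launch):   %zu\n", DynamicBytes(dynCount));

    float* dOut = nullptr;
    float  hOut = 0.0f;
    cudaMalloc(&dOut, sizeof(float));
    MixedSharedKernel<<<1, threads, DynamicBytes(dynCount)>>>(dynCount, dOut);
    cudaError_t err = cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);

    printf("device sum: %f\n", hOut);
    return err == cudaSuccess ? 0 : 1;
}
```

The launch succeeds without adding 256*sizeof(float) + 256*sizeof(char) to the third argument, which is consistent with the quoted description of Ns.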
