CUDA dynamic shared memory allocation for a 2D array

Question

I want to allocate a 2D array in shared memory in CUDA. I know that to allocate a 1D shared memory array dynamically, you have to pass the size per block as a launch parameter of the kernel.

I also know that it is impossible to create an actual 2D array dynamically in shared memory.

However, I was wondering whether this could be done if one of the dimensions is known at compile time.

extern __shared__ int array[COMPILE_TIME_SIZE][];

Can this be done? If so, how do I pass the size of the second dimension?

Answer 1

Doing exactly that, with the known dimension first (the outermost index, in the first pair of square brackets) and the second one left open, is not possible: the compiler needs the size of the inner dimension to compute element addresses.

However, it is possible the other way around, with only the second dimension fixed at compile time. Here is an example:

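#include <cstdio>

// Only the first dimension may be left open; the row width (17) is fixed at compile time.
extern __shared__ int shared2Darray[][17] ;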

__global__ void kernel(int* output)
{
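    // Each thread writes one element of the shared 2D array.
    shared2Darray[threadIdx.y][threadIdx.x] = threadIdx.x + 2*threadIdx.y ;
    __syncthreads();
    // Copy the element back to global memory so the host can check it.
    output [threadIdx.y * blockDim.x + threadIdx.x] = shared2Darray[threadIdx.y][threadIdx.x] ;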
}

int main()
{
    int* h_output, *d_output ;

    cudaMalloc(&d_output, 16*16*sizeof(int));

    // Third launch parameter: dynamic shared memory in bytes (16 rows of 17 ints).
    kernel<<<1, dim3(16,16,1), 16*17*sizeof(int)>>> (d_output) ;

    h_output = new int[16*16] ;
    cudaMemcpy (h_output, d_output, 16*16*sizeof(int), cudaMemcpyDeviceToHost) ;

    cudaFree(d_output);
    cudaDeviceReset();

    for (int x = 0 ; x < 16 ; ++x)
    {
        for (int y = 0 ; y < 16 ; ++y)
        {
            if (h_output[y*16+x] != x+2*y)
                printf ("ERROR\n");
        }
    }

    printf ("DONE\n");

    delete[] h_output ;

    return 0 ;
}

The total size of the array is set by the shared memory parameter in the triple angle bracket launch syntax. Since the row width is fixed at compile time, the size of the first (unspecified) dimension follows from dividing the shared memory size in bytes by the size in bytes of one row. In the example, 16*17*sizeof(int) bytes divided by 17*sizeof(int) bytes per row gives 16 rows.
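
When neither dimension is known at compile time, a common workaround is to declare the dynamic shared memory as a flat one-dimensional array and compute the 2D index by hand, passing the row width as an ordinary kernel argument. The following is a minimal sketch of that idea and is not part of the original answer; the names fillFlat, countMismatches, tile and cols are hypothetical, and it assumes rows * cols does not exceed the per-block thread limit of the device.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: treats a flat dynamic shared buffer as a
// rows x cols array, where cols is only known at run time.
__global__ void fillFlat(int* output, int cols)
{
    // Unsized declaration; the byte count comes from the third launch parameter.
    extern __shared__ int tile[];

    int row = threadIdx.y;
    int col = threadIdx.x;
    int idx = row * cols + col;   // manual row-major 2D indexing

    tile[idx] = col + 2 * row;
    // Not strictly needed here, since each thread reads back its own element;
    // kept because real kernels usually read elements written by other threads.
    __syncthreads();
    output[idx] = tile[idx];
}

// Hypothetical host helper: launches one block of cols x rows threads and
// returns the number of wrong entries, or -1 if a CUDA call reported an error.
int countMismatches(int rows, int cols)
{
    size_t count = size_t(rows) * size_t(cols);
    size_t bytes = count * sizeof(int);

    int* d_output = nullptr;
    if (cudaMalloc(&d_output, bytes) != cudaSuccess) return -1;

    // Shared memory size in bytes is passed as the third launch parameter.
    fillFlat<<<1, dim3(cols, rows, 1), bytes>>>(d_output, cols);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_output); return -1; }

    std::vector<int> h_output(count);
    cudaError_t err = cudaMemcpy(h_output.data(), d_output, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_output);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            if (h_output[y * cols + x] != x + 2 * y) ++mismatches;
    return mismatches;
}
```

A caller would invoke, for example, countMismatches(16, 16) from main and treat any non-zero result as a failure. The trade-off compared with the [][17] declaration is that the index arithmetic is written out explicitly, but both dimensions can then be chosen at launch time.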
