How to allocate an array of pointers and preserve them for multiple kernel calls in CUDA

Question

I am trying to implement an algorithm in CUDA, and I need to allocate an array of pointers that point into an array of structs. My struct is, let's say:

    typedef struct {
       float x, y; 
    } point;

I know that if I want to preserve the arrays across multiple kernel calls, I have to control them from the host. Is that right? The initialization of the pointers must be done from within the kernel. To be more specific, the array of structs P will contain Cartesian points in random order, while dev_S_x will be a version of the points in P sorted by x coordinate.

I have tried with:

    __global__ void test( point *dev_P, point **dev_S_x) {
        unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;

        dev_P[tid].x = 3.141516;
        dev_P[tid].y = 3.141516;
        dev_S_x[tid] = &dev_P[tid];
       ...
    }

and:

    int main( void ) {
        point *P, *dev_P, **S_x, *dev_S_x;
        P   = (point*)  malloc (N * sizeof (point) );
        S_x = (point**) malloc (N * sizeof (point*));

        // allocate the memory on the GPU
        cudaMalloc( (void**)  &dev_P,   N * sizeof(point) );
        cudaMalloc( (void***)  &dev_S_x, N * sizeof(point*));

        // copy the array P to the GPU
        cudaMemcpy( dev_P, P,  N * sizeof(point),  cudaMemcpyHostToDevice);
        cudaMemcpy( dev_S_x,S_x,N * sizeof(point*), cudaMemcpyHostToDevice);

        test <<<1, 1 >>>( dev_P, &dev_S_x);
           ...
        return 0;
    }

which leads to many

First-chance exception at 0x000007fefcc89e5d (KernelBase.dll) in Test_project_cuda.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0020f920.. Critical error detected c0000374

Am I doing something wrong in the cudaMalloc of the array of pointers, or is it something else? Is the usage of (void***) correct? From within the kernels, I would like to use, for example, dev_S_x[tid]->x or dev_S_x[tid]->y, pointing to device memory addresses. Is that feasible? Thanks in advance.

Answer 1

dev_S_x should be declared as point ** and should be passed to the kernel by value (i.e. test <<<1, 1 >>>(dev_P, dev_S_x); ).
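As a minimal sketch of that fix, the program below declares dev_S_x as point **, allocates it with the usual (void**) cast (cudaMalloc takes a void ** whatever the element type is), and passes it to the kernels by value. The kernel names init_points and read_points, the helper run_demo, the output buffer dev_out, and the values written into the points are all assumptions made for illustration; they are not from the original post. The second launch only shows that the pointers stored by the first launch stay valid in device memory between kernel calls.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    typedef struct {
        float x, y;
    } point;

    // First launch: each thread fills one point and stores that point's device address.
    __global__ void init_points(point *dev_P, point **dev_S_x, int n) {
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
        if (tid < n) {
            dev_P[tid].x = (float)tid;       // illustrative values only
            dev_P[tid].y = 2.0f * tid;
            dev_S_x[tid] = &dev_P[tid];      // a device address, stored on the device
        }
    }

    // Second launch: dereference the pointers written by the first launch.
    __global__ void read_points(point **dev_S_x, float *dev_out, int n) {
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
        if (tid < n) {
            dev_out[tid] = dev_S_x[tid]->x + dev_S_x[tid]->y;
        }
    }

    // Hypothetical helper: returns true if both launches ran and the results match.
    bool run_demo(int n) {
        point  *dev_P   = NULL;
        point **dev_S_x = NULL;              // point **, not point *
        float  *dev_out = NULL;
        float  *out = (float *)malloc(n * sizeof(float));
        if (out == NULL) return false;

        bool ok = cudaMalloc((void **)&dev_P,   n * sizeof(point))   == cudaSuccess
               && cudaMalloc((void **)&dev_S_x, n * sizeof(point *)) == cudaSuccess
               && cudaMalloc((void **)&dev_out, n * sizeof(float))   == cudaSuccess;

        if (ok) {
            int threads = 128;
            int blocks  = (n + threads - 1) / threads;
            // Both device pointers are passed by value; no & on dev_S_x.
            init_points<<<blocks, threads>>>(dev_P, dev_S_x, n);
            read_points<<<blocks, threads>>>(dev_S_x, dev_out, n);
            // The blocking copy waits for both kernels and reports their errors.
            ok = cudaMemcpy(out, dev_out, n * sizeof(float),
                            cudaMemcpyDeviceToHost) == cudaSuccess;
        }

        for (int i = 0; ok && i < n; ++i) {
            ok = (out[i] == 3.0f * i);       // x + y = i + 2i
        }

        cudaFree(dev_out);
        cudaFree(dev_S_x);
        cudaFree(dev_P);
        free(out);
        return ok;
    }

    int main(void) {
        printf("%s\n", run_demo(256) ? "ok" : "failed");
        return 0;
    }

Because the pointer values are written on the device, this sketch has no host-side S_x buffer and no host-to-device copy into dev_S_x. The allocations live until cudaFree is called, so any number of launches in between can reuse them.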

Putting that to one side, what you describe sounds like a natural fit for Thrust, which will give you a simpler memory management strategy and access to fast sort routines.
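A rough sketch of what the Thrust route could look like is given here. It sorts a device copy of the points by x instead of building an array of pointers, which is a different technique from the one in the question. The comparator by_x, the helper sort_by_x, and the sample data are hypothetical names and values chosen for illustration.

    #include <cstdio>
    #include <vector>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>

    typedef struct {
        float x, y;
    } point;

    // Comparator usable on host and device: orders points by x coordinate.
    struct by_x {
        __host__ __device__
        bool operator()(const point &a, const point &b) const {
            return a.x < b.x;
        }
    };

    // Hypothetical helper: copies the points to the GPU, sorts them by x there,
    // and copies the sorted result back to the host.
    std::vector<point> sort_by_x(const std::vector<point> &host_points) {
        // device_vector allocates device memory and frees it when it goes out of scope.
        thrust::device_vector<point> dev_S_x(host_points.begin(), host_points.end());
        thrust::sort(dev_S_x.begin(), dev_S_x.end(), by_x());

        std::vector<point> sorted(dev_S_x.size());
        thrust::copy(dev_S_x.begin(), dev_S_x.end(), sorted.begin());
        return sorted;
    }

    int main(void) {
        point raw[] = {{3.0f, 0.0f}, {1.0f, 1.0f}, {2.0f, 2.0f}};
        std::vector<point> sorted = sort_by_x(std::vector<point>(raw, raw + 3));
        for (size_t i = 0; i < sorted.size(); ++i) {
            printf("%f %f\n", sorted[i].x, sorted[i].y);
        }
        return 0;
    }

If a hand-written kernel still needs the data, thrust::raw_pointer_cast(dev_S_x.data()) yields a plain point * that can be passed to it while the device_vector is alive.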

Answer 1: accepted, 2013-07-31 10:13:07
