CUDA: copying inherited class objects to the device

Question

I have a Parent class and an inherited Child class:

class Parent {};
class Child : public Parent {};

There are a couple of child classes that inherit from Parent, but for simplicity I have only included one. These inherited classes are necessary for the project I am working on. I also have an object of another class, which I wish to copy onto the device:

class CopyClass {
  public:
    Parent ** par;
};

Note that Parent ** par; is there because I need a list of Child objects, but which child class will be used (and the length of the list) is not known at compile time. Here is my attempt at copying a CopyClass object onto the device:

int length = 5;

//Instantiate object on the CPU
CopyClass cpuClass;
cpuClass.par = new Parent*[length];
for(int i = 0; i < length; ++i) cpuClass.par[i] = new Child;

//Copy object onto GPU
CopyClass * gpuClass;
cudaMalloc(&gpuClass,sizeof(CopyClass));
cudaMemcpy(gpuClass,&cpuClass,sizeof(CopyClass),cudaMemcpyHostToDevice);

//Copy dynamically allocated variables to GPU
Parent ** d_par;
d_par = new Parent*[length];
for(int i = 0; i < length; ++i) {
    cudaMalloc(&d_par[i],sizeof(Child));
    printf("\tCopying data\n");
    cudaMemcpy(d_par[i],cpuClass.par[i],sizeof(Child),cudaMemcpyHostToDevice);
}

//SIGSEGV returned during following operation
cudaMemcpy(gpuClass->par,d_par,length*sizeof(void*),cudaMemcpyHostToDevice);

I have seen multiple problems similar to this here, here, here, here, and here, but either I couldn't understand the problem they were having, or it didn't seem to fit this particular issue.

I know that the segmentation fault I am getting is because gpuClass->par is on the device, and cudaMemcpy does not allow device pointers. However, I see no other way to "insert" the pointer into the gpuClass object.

The possible solutions I can see are:

1) Flatten my data structure. However, I don't know how to do this with the inheritance functionality that I want.

2) Instantiate gpuClass on the GPU in the first place, which I don't know how to do, or

3) I have seen in one of the solutions that you can use cudaMemcpy to copy the address of your dynamically allocated list into an object, but again, I don't know how to do that (specifically, how to copy a device pointer to the location of another device pointer).
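
For option 2, one possibility is to let a kernel build the whole structure with device-side new, so the host only allocates storage for the top-level CopyClass. This matters mainly if the Parent hierarchy uses virtual functions: the CUDA C++ Programming Guide states that invoking a virtual function in device code on an object created in host code is undefined behavior, so byte-copying host-constructed Child objects would not be enough. This is a sketch under assumptions: the virtual id() and the virtual destructor are hypothetical additions (the original classes are empty), and the kernel names build, use and destroy are illustrative.

```cuda
#include <cassert>
#include <cstdio>

// Assumed polymorphic hierarchy: id() and the virtual destructor are
// hypothetical additions; the original Parent/Child classes are empty.
class Parent {
  public:
    __device__ virtual ~Parent() {}
    __device__ virtual int id() const = 0;
};

class Child : public Parent {
  public:
    __device__ explicit Child(int v) : v_(v) {}
    __device__ ~Child() override {}
    __device__ int id() const override { return v_; }
  private:
    int v_;
};

class CopyClass {
  public:
    Parent ** par;
};

// Build the pointer array and every Child in device code, so each object
// (including its vtable pointer) is created on the device.
// Error handling for a failed device-side new is omitted in this sketch.
__global__ void build(CopyClass *obj, int length) {
  obj->par = new Parent*[length];
  for (int i = 0; i < length; ++i) obj->par[i] = new Child(i + 1);
}

// Virtual dispatch now works because the objects were constructed on the device.
__global__ void use(const CopyClass *obj, int length) {
  for (int i = 0; i < length; ++i)
    printf("object: %d, id: %d\n", i, obj->par[i]->id());
}

// Memory from device-side new must be released by device-side delete.
__global__ void destroy(CopyClass *obj, int length) {
  for (int i = 0; i < length; ++i) delete obj->par[i];
  delete[] obj->par;
}

int main() {
  const int length = 5;
  CopyClass *gpuClass;
  cudaMalloc(&gpuClass, sizeof(CopyClass));  // storage for the top-level object only
  build<<<1, 1>>>(gpuClass, length);         // kernels in the same stream run in order
  use<<<1, 1>>>(gpuClass, length);
  destroy<<<1, 1>>>(gpuClass, length);
  cudaError_t err = cudaDeviceSynchronize();
  assert(err == cudaSuccess);
  cudaFree(gpuClass);
  return 0;
}
```

Objects allocated with device-side new live on the device heap and cannot be released with cudaFree, which is why destroy is a kernel rather than a host-side call.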

Any help would be greatly appreciated.

Answer 1

In your first related link I give 5 steps for the object-based deep-copy sequence, but this case is complicated by the fact that you are doing a double-pointer version of the example given in that link. The complexity associated with a double-pointer deep copy is such that the usual recommendation is to avoid it (i.e. flatten).
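
A minimal sketch of what flattening could look like, assuming the different child classes can share one data layout: each child is stored by value in a single contiguous array together with a type tag, and a switch on the tag replaces virtual dispatch. The names ChildKind, KIND_A, KIND_B, FlatChild and describe are hypothetical stand-ins for the real child classes. With only one level of pointers left, the deep copy reduces to one array copy plus one pointer fixup.

```cuda
#include <cassert>
#include <cstdio>

// Hypothetical tag for the different child classes; KIND_A and KIND_B
// stand in for the real Child types.
enum ChildKind { KIND_A, KIND_B };

// Flattened element: every "child" lives by value in one contiguous array,
// so there is only one level of pointers to deep-copy.
struct FlatChild {
  ChildKind kind;
  int my_id;
};

class CopyClass {
  public:
    FlatChild *par;   // single pointer instead of Parent **
};

// Dispatch on the tag instead of using virtual functions;
// callable from both host and device code.
__host__ __device__ int describe(const FlatChild &c) {
  switch (c.kind) {
    case KIND_A: return c.my_id;
    case KIND_B: return -c.my_id;
  }
  return 0;
}

const int length = 5;

__global__ void test_kernel(const CopyClass *my_class) {
  for (int i = 0; i < length; i++)
    printf("object: %d, value: %d\n", i, describe(my_class->par[i]));
}

int main() {
  FlatChild h_par[length];
  for (int i = 0; i < length; ++i) {
    h_par[i].kind = (i % 2) ? KIND_B : KIND_A;
    h_par[i].my_id = i + 1;
  }

  // One allocation and one copy for all children.
  FlatChild *d_par;
  cudaMalloc(&d_par, length * sizeof(FlatChild));
  cudaMemcpy(d_par, h_par, length * sizeof(FlatChild), cudaMemcpyHostToDevice);

  // CopyClass has no other members, so writing the par field is the whole copy.
  CopyClass *gpuClass;
  cudaMalloc(&gpuClass, sizeof(CopyClass));
  cudaMemcpy(&(gpuClass->par), &d_par, sizeof(FlatChild *), cudaMemcpyHostToDevice);

  test_kernel<<<1, 1>>>(gpuClass);
  cudaError_t err = cudaDeviceSynchronize();
  assert(err == cudaSuccess);

  cudaFree(gpuClass);
  cudaFree(d_par);
  return 0;
}
```

The cost of this design is that all child types must fit one layout (or a union); the benefit is that a single cudaMemcpy moves the entire list.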

The first fix we need to make to your code is to properly handle the d_par array. You need to make a corresponding allocation on the device to hold the array associated with d_par. The array associated with d_par has storage for 5 object pointers. You've allocated host-side storage for it (with new), but nowhere are you doing a device-side allocation for it. (I'm not talking about the d_par pointer itself; I'm talking about what it points to, which is an array of 5 pointers.)

The second fix we need to make is to adjust the fixup of the par pointer itself (as opposed to what it points to) in the top-level device-side object. You've attempted to combine both of these into a single step, but that won't work. (Also note that evaluating gpuClass->par in host code reads from device memory, which is what causes the segmentation fault; passing &(gpuClass->par) instead only computes the address of that field.)
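
The two fixes can also be performed bottom-up, so that no separate fixup copy is needed: copy the Child objects, then the array of their device addresses, and finally a host-side "shadow" copy of CopyClass whose par already holds the device array's address. This sketch reuses the Parent/Child/CopyClass definitions from the program below; DeviceCopy, deepCopyToDevice and freeDeviceCopy are hypothetical helper names.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>

class Parent {public: int my_id;};
class Child : public Parent {};

class CopyClass {
  public:
    Parent ** par;
};

// Hypothetical bookkeeping: device addresses the host must remember
// in order to free the deep copy later.
struct DeviceCopy {
  CopyClass *top;                 // device copy of the top-level object
  Parent **ptrs;                  // device array of 'length' pointers
  std::vector<Parent *> leaves;   // device addresses of the Child objects
};

// Bottom-up deep copy: leaves first, then the pointer array, then the
// top-level object, whose par field is set on the host before copying.
DeviceCopy deepCopyToDevice(const CopyClass &host, int length) {
  DeviceCopy dc;
  dc.leaves.resize(length);
  for (int i = 0; i < length; ++i) {
    cudaMalloc(&dc.leaves[i], sizeof(Child));
    cudaMemcpy(dc.leaves[i], host.par[i], sizeof(Child), cudaMemcpyHostToDevice);
  }
  cudaMalloc(&dc.ptrs, length * sizeof(Parent *));
  cudaMemcpy(dc.ptrs, dc.leaves.data(), length * sizeof(Parent *),
             cudaMemcpyHostToDevice);

  CopyClass shadow = host;   // host-side copy of the top-level object...
  shadow.par = dc.ptrs;      // ...with its pointer replaced by the device one
  cudaMalloc(&dc.top, sizeof(CopyClass));
  cudaMemcpy(dc.top, &shadow, sizeof(CopyClass), cudaMemcpyHostToDevice);
  return dc;
}

// Free in the reverse order of allocation.
void freeDeviceCopy(DeviceCopy &dc) {
  cudaFree(dc.top);
  cudaFree(dc.ptrs);
  for (Parent *p : dc.leaves) cudaFree(p);
  dc.leaves.clear();
}

__global__ void test_kernel(CopyClass *my_class, int length) {
  for (int i = 0; i < length; i++)
    printf("object: %d, id: %d\n", i, my_class->par[i]->my_id);
}

int main() {
  const int length = 5;
  CopyClass cpuClass;
  cpuClass.par = new Parent*[length];
  for (int i = 0; i < length; ++i) {
    cpuClass.par[i] = new Child;
    cpuClass.par[i]->my_id = i + 1;
  }

  DeviceCopy dc = deepCopyToDevice(cpuClass, length);
  test_kernel<<<1, 1>>>(dc.top, length);
  cudaError_t err = cudaDeviceSynchronize();
  assert(err == cudaSuccess);

  freeDeviceCopy(dc);
  // Parent has no virtual destructor, so delete through the derived type.
  for (int i = 0; i < length; ++i) delete static_cast<Child *>(cpuClass.par[i]);
  delete[] cpuClass.par;
  return 0;
}
```

Keeping the device addresses of the leaves on the host (here in a std::vector) is what makes it possible to free them later without first copying the pointer array back from the device.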

Here's a modified version of your code that seems to work correctly with the above changes:

$ cat t29.cu
#include <stdio.h>

class Parent {public: int my_id;};
class Child : public Parent {};

class CopyClass {
  public:
    Parent ** par;
};

const int length = 5;

__global__ void test_kernel(CopyClass *my_class){

  for (int i = 0; i < length; i++)
    printf("object: %d, id: %d\n", i, my_class->par[i]->my_id);
}

int main(){


//Instantiate object on the CPU
  CopyClass cpuClass;
  cpuClass.par = new Parent*[length];
  for(int i = 0; i < length; ++i) {
    cpuClass.par[i] = new Child;
    cpuClass.par[i]->my_id = i+1;} // so we can prove that things are working

//Allocate storage for object onto GPU and copy host object to device
  CopyClass * gpuClass;
  cudaMalloc(&gpuClass,sizeof(CopyClass));
  cudaMemcpy(gpuClass,&cpuClass,sizeof(CopyClass),cudaMemcpyHostToDevice);

//Copy dynamically allocated child objects to GPU
  Parent ** d_par;
  d_par = new Parent*[length];
  for(int i = 0; i < length; ++i) {
    cudaMalloc(&d_par[i],sizeof(Child));
    printf("\tCopying data\n");
    cudaMemcpy(d_par[i],cpuClass.par[i],sizeof(Child),cudaMemcpyHostToDevice);
  }

//Copy the d_par array itself to the device

  Parent ** td_par;
  cudaMalloc(&td_par, length * sizeof(Parent *));
  cudaMemcpy(td_par, d_par, length * sizeof(Parent *), cudaMemcpyHostToDevice);

//copy *pointer value* of td_par to appropriate location in top level object
  cudaMemcpy(&(gpuClass->par),&(td_par),sizeof(Parent **),cudaMemcpyHostToDevice);

  test_kernel<<<1,1>>>(gpuClass);
  cudaDeviceSynchronize();
  return 0;


}
$ nvcc -arch=sm_61 -o t29 t29.cu
$ cuda-memcheck ./t29
========= CUDA-MEMCHECK
        Copying data
        Copying data
        Copying data
        Copying data
        Copying data
object: 0, id: 1
object: 1, id: 2
object: 2, id: 3
object: 3, id: 4
object: 4, id: 5
========= ERROR SUMMARY: 0 errors
$
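
None of the CUDA calls in the program above check their return values, so a failed allocation or copy would only show up later as a crash or wrong output. A common idiom is to wrap every runtime call in a check that prints cudaGetErrorString and aborts. The CHECK macro and checkCuda function here are hypothetical names, and the program exercises them on a one-element version of the same double-pointer layout.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: print the CUDA error string and abort on failure.
inline bool checkCuda(cudaError_t err, const char *what, const char *file, int line) {
  if (err != cudaSuccess) {
    fprintf(stderr, "%s:%d: %s failed: %s\n", file, line, what,
            cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }
  return true;
}
#define CHECK(call) checkCuda((call), #call, __FILE__, __LINE__)

class Parent {public: int my_id;};
class Child : public Parent {};

// Follow the double pointer on the device and report the id back.
__global__ void read_id(Parent **par, int *out) { *out = par[0]->my_id; }

int main() {
  Child h_child;
  h_child.my_id = 42;

  Parent *d_child;
  CHECK(cudaMalloc(&d_child, sizeof(Child)));
  CHECK(cudaMemcpy(d_child, &h_child, sizeof(Child), cudaMemcpyHostToDevice));

  Parent **d_ptrs;
  CHECK(cudaMalloc(&d_ptrs, sizeof(Parent *)));
  CHECK(cudaMemcpy(d_ptrs, &d_child, sizeof(Parent *), cudaMemcpyHostToDevice));

  int *d_out;
  int h_out = 0;
  CHECK(cudaMalloc(&d_out, sizeof(int)));
  read_id<<<1, 1>>>(d_ptrs, d_out);
  CHECK(cudaGetLastError());        // errors from the launch itself
  CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran
  CHECK(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
  assert(h_out == 42);              // the device followed both pointer levels
  printf("id read on device: %d\n", h_out);

  CHECK(cudaFree(d_out));
  CHECK(cudaFree(d_ptrs));
  CHECK(cudaFree(d_child));
  return 0;
}
```

cudaGetLastError reports launch errors immediately after the <<<...>>> launch, while errors that occur during kernel execution are reported by the following cudaDeviceSynchronize.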

Answer 1: accepted, 2016-11-19 02:58:05