How can I pass a C++ array of structs to a CUDA device?
I've spent 2 days trying to figure this out and getting nowhere. Say I had a struct that looks like this:
struct Thing {
    bool is_solid;
    double matrix[9];
};
I want to create an array of that struct called things and then process that array on the GPU. Something like:
Thing *things;
int num_of_things = 100;
cudaMallocManaged((void **)&things, num_of_things * sizeof(Thing));
// Something missing here? Malloc individual structs? Everything I try doesn't work.
things[10].is_solid = true; // Segfaults
Is it even best practice to do it this way rather than pass a single struct with arrays that are num_of_things large? It seems to me that this can get pretty nasty, especially when you already have arrays (like matrix, which would need to be 9 * num_of_things).
Any info would be much appreciated!
After some dialog in the comments, it seems that OP's posted code has no issues. I was able to successfully compile and run this test case built around that code, and so was OP:
$ cat t1005.cu
#include <iostream>

struct Thing {
    bool is_solid;
    double matrix[9];
};

int main(){
    Thing *things;
    int num_of_things = 100;
    cudaError_t ret = cudaMallocManaged((void **)&things, num_of_things * sizeof(Thing));
    if (ret != cudaSuccess) {
        std::cout << cudaGetErrorString(ret) << std::endl;
        return 1;}
    else {
        things[10].is_solid = true;
        std::cout << "Success!" << std::endl;
        return 0;}
}
$ nvcc -arch=sm_30 -o t1005 t1005.cu
$ ./t1005
Success!
$
Regarding this question:

Is it even best practice to do it this way rather than pass a single struct with arrays that are num_of_things large?
Yes, this is a sensible practice and is usable whether managed memory is being used or not. An array of more or less any structure that does not contain embedded pointers to dynamically allocated data elsewhere can be transferred to the GPU in a simple fashion using a single cudaMemcpy call (for example, if managed memory were not being used).
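As a sketch of that non-managed approach (my own minimal example, not OP's code): because Thing contains only a bool and a fixed-size double array, with no embedded pointers, the whole array is bitwise-copyable in one cudaMemcpy each direction.

```cuda
#include <cuda_runtime.h>

struct Thing {
    bool is_solid;
    double matrix[9];
};

int main() {
    const int num_of_things = 100;
    Thing *h_things = new Thing[num_of_things];  // host-side array
    h_things[10].is_solid = true;

    Thing *d_things = nullptr;                   // device-side array
    cudaMalloc(&d_things, num_of_things * sizeof(Thing));

    // One cudaMemcpy moves the entire array of structs to the device;
    // no per-struct allocation or deep copy is needed.
    cudaMemcpy(d_things, h_things, num_of_things * sizeof(Thing),
               cudaMemcpyHostToDevice);

    // ... launch kernels that read/write d_things ...

    // Copy results back the same way.
    cudaMemcpy(h_things, d_things, num_of_things * sizeof(Thing),
               cudaMemcpyDeviceToHost);

    cudaFree(d_things);
    delete[] h_things;
    return 0;
}
```

With managed memory (as in OP's code) the explicit copies disappear, but the same "no embedded pointers" property is what keeps either approach simple.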
To address the question about the 3rd (flags) parameter to cudaMallocManaged: a default argument of cudaMemAttachGlobal is provided. This can be confirmed by reviewing the cuda_runtime.h file or else simply compiling/running the test code above. This particular point appears to be an oversight in the documentation, and I've filed an internal issue at NVIDIA to take a look at that.

Finally, proper cuda error checking is always in order any time you are having trouble with a CUDA code, and the use of such may shed some light on any errors that are made. The seg fault that the OP reported in code comments was almost certainly due to the cudaMallocManaged call failing (perhaps because a zero parameter was supplied incorrectly), and as a result the pointer in question (things) had no actual allocation. Subsequent usage of that pointer would lead to a seg fault. My test code demonstrates how to avoid that seg fault, even if the cudaMallocManaged call fails for some reason, and the key is proper error checking.
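That error checking is often packaged as a macro so every runtime call is covered; a common sketch of the pattern (the CUDA_CHECK name is my own, not part of the CUDA API):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call so a failure is reported immediately,
// instead of surfacing later as a seg fault on an unallocated pointer.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__);    \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

struct Thing {
    bool is_solid;
    double matrix[9];
};

int main() {
    Thing *things = nullptr;
    int num_of_things = 100;
    // If this allocation fails, we exit with a clear message rather than
    // dereferencing `things` and seg faulting.
    CUDA_CHECK(cudaMallocManaged(&things, num_of_things * sizeof(Thing)));
    things[10].is_solid = true;  // safe: the allocation succeeded
    CUDA_CHECK(cudaFree(things));
    return 0;
}
```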