
Using a data pointer with CUDA (and integrated memory)

I am using a board with integrated GPU and CPU memory. I am also using an external matrix library (Blitz++). I would like to be able to grab the pointer to my data from the matrix object and pass it into a CUDA kernel. After doing some digging, it sounds like I want to use some form of zero-copy by calling cudaHostGetDevicePointer. What I am unsure of is the allocation of the memory. Do I have to have created the pointer using cudaHostAlloc? I do not want to have to rewrite Blitz++ to use cudaHostAlloc if I don't have to.

My code currently works, but it copies the matrix data every time. That copy is not needed on integrated-memory boards.

The pointer has to be created (i.e. allocated) with cudaHostAlloc, even on integrated systems like Jetson.

The reason for this is that the GPU requires (zero-copy) memory to be pinned, i.e. removed from the host demand-paging system. Ordinary allocations are subject to demand-paging and may not be used as zero-copy (i.e. mapped) memory for the GPU.
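A minimal sketch of the zero-copy flow described above: allocate pinned, mapped host memory with cudaHostAlloc, obtain the device-side alias with cudaHostGetDevicePointer, and let the kernel operate on the buffer with no cudaMemcpy. The kernel, buffer size, and launch configuration here are illustrative, not taken from the question's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: scales each element in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;

    // Enable mapped (zero-copy) host allocations; must be set
    // before the CUDA context performs any allocations.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Allocate pinned, mapped host memory instead of malloc/new.
    float* h_data = nullptr;
    cudaHostAlloc(&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Get the device-side pointer aliasing the same physical memory.
    float* d_data = nullptr;
    cudaHostGetDevicePointer(&d_data, h_data, 0);

    // The kernel reads and writes the host buffer directly -- no copy.
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    printf("h_data[0] = %f\n", h_data[0]);
    cudaFreeHost(h_data);
    return 0;
}
```

If rewriting Blitz++ allocation is the concern, one option (an assumption about your setup, not something from the question) is to allocate with cudaHostAlloc yourself and wrap the pointer in a blitz::Array using the constructor that adopts pre-existing storage, e.g. `blitz::Array<float,1> A(h_data, blitz::shape(n), blitz::neverDeleteData);`, so Blitz++ and the kernel share the same pinned buffer.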

