
Using a data pointer with CUDA (and integrated memory)

I am using a board with integrated GPU and CPU memory. I am also using an external matrix library (Blitz++). I would like to be able to grab the pointer to my data from the matrix object and pass it into a CUDA kernel. After doing some digging, it sounds like I want to use some form of zero-copy by calling cudaHostGetDevicePointer. What I am unsure of is the allocation of the memory. Do I have to have created the pointer using cudaHostAlloc? I do not want to have to rewrite Blitz++ to use cudaHostAlloc if I don't have to.

My code currently works, but it copies the matrix data every time. That copy is not needed on integrated-memory boards.

The pointer has to be created (i.e. allocated) with cudaHostAlloc, even on integrated systems like Jetson.

The reason for this is that the GPU requires (zero-copy) memory to be pinned, i.e. removed from the host demand-paging system. Ordinary allocations are subject to demand-paging and may not be used as zero-copy (i.e. mapped) memory for the GPU.
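A minimal sketch of the zero-copy flow described above: allocate pinned, mapped host memory with cudaHostAlloc, obtain the device-side alias with cudaHostGetDevicePointer, and let the kernel operate on the buffer with no cudaMemcpy. The kernel, buffer size, and launch configuration here are illustrative, not taken from the question's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: scales each element in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;

    // Enable mapped (zero-copy) host allocations; must be set
    // before the CUDA context performs any allocations.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Allocate pinned, mapped host memory instead of malloc/new.
    float* h_data = nullptr;
    cudaHostAlloc(&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Get the device-side pointer aliasing the same physical memory.
    float* d_data = nullptr;
    cudaHostGetDevicePointer(&d_data, h_data, 0);

    // The kernel reads and writes the host buffer directly -- no copy.
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    printf("h_data[0] = %f\n", h_data[0]);
    cudaFreeHost(h_data);
    return 0;
}
```

If rewriting Blitz++ allocation is the concern, one option (an assumption about your setup, not something from the question) is to allocate with cudaHostAlloc yourself and wrap the pointer in a blitz::Array using the constructor that adopts pre-existing storage, e.g. `blitz::Array<float,1> A(h_data, blitz::shape(n), blitz::neverDeleteData);`, so Blitz++ and the kernel share the same pinned buffer.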

