
CUDA thrust: copy from device to device

I have an array allocated on the device with the standard CUDA malloc (cudaMalloc), and it is passed to a function as follows:

void MyClass::run(uchar4 * input_data)

I also have a class member which is a thrust device_ptr declared as:

thrust::device_ptr<uchar4> data = thrust::device_malloc<uchar4>(num_pts);

Here num_pts is the number of values in the array, and the input_data pointer is guaranteed to point to num_pts elements.

Now, I would like to copy the input array into the thrust::device_ptr. I have looked at the thrust documentation, and most of it covers copying from device to host memory and vice versa. What is the most efficient way to do this device-to-device copy in thrust, or should I just use cudaMemcpy?

The canonical way to do this is just to use thrust::copy. The thrust::device_ptr has standard pointer semantics, and the API will seamlessly understand whether the source and destination pointers are on the host or device, viz:

#include <thrust/device_malloc.h>
#include <thrust/device_ptr.h>
#include <thrust/copy.h>
#include <iostream>

int main()
{
    // Initial host data
    int ivals[4] = { 1, 3, 6, 10 };

    // Allocate and copy to first device allocation
    thrust::device_ptr<int> dp1 = thrust::device_malloc<int>(4);
    thrust::copy(&ivals[0], &ivals[0]+4, dp1);

    // Allocate and copy to second device allocation
    thrust::device_ptr<int> dp2 = thrust::device_malloc<int>(4);
    thrust::copy(dp1, dp1+4, dp2);

    // Copy back to host
    int ovals[4] = {-1, -1, -1, -1};
    thrust::copy(dp2, dp2+4, &ovals[0]);

    for(int i=0; i<4; i++)
        std::cout << ovals[i] << std::endl;


    return 0;
}

which does this:

talonmies@box:~$ nvcc -arch=sm_30 thrust_dtod.cu 
talonmies@box:~$ ./a.out 
1
3
6
10
