
Does there exist a thrust::device_vector equivalent library to use within a CUDA kernel?

The automatic memory management of thrust::device_vector is really useful; the only drawback is that it's not possible to use it from within kernel code.

I've looked on the Internet and only found vector libraries such as Thrust that deal with device memory from host code. Does any vector library for kernels exist? If not, is it a bad idea to have such a library?

It is possible to write such a library, but it would be very inefficient.

Indeed, thrust::device_vector only differs from thrust::host_vector or std::vector in that it allocates memory on the device instead of the host. The resizing algorithm is the same, and runs on the host.

The resize logic is quite simple but involves allocating/freeing memory and copying the data. In a multi-threaded setting, you have to lock the whole vector each time a thread resizes it, which can take quite a long time because of the copy.
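As a rough illustration of that cost, growing a device-side buffer by hand with plain CUDA runtime calls already requires an allocation, a device-to-device copy, and a free. The helper below is only a sketch of that cost with a made-up name; it is not Thrust's actual implementation.

#include <cuda_runtime.h>

// Sketch: what one "grow" step of a device buffer involves (illustrative only).
int* grow_device_buffer(int* old_buf, size_t old_count, size_t new_count)
{
    int* new_buf = nullptr;
    cudaMalloc(&new_buf, new_count * sizeof(int));      // allocate larger storage
    cudaMemcpy(new_buf, old_buf, old_count * sizeof(int),
               cudaMemcpyDeviceToDevice);               // copy the existing elements
    cudaFree(old_buf);                                  // release the old storage
    return new_buf;
}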

In the case of a kernel which appends elements to a vector, the synchronization mechanism would actually serialize the work since only one thread at a time is allowed to resize. Thus your code would run at the speed of a single device processor, minus the (quite big) synchronization overhead. This would probably be quite a lot slower than a CPU implementation.
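For comparison, the usual kernel-side idiom avoids resizing altogether: preallocate an output buffer with a known upper bound on the host, and let threads claim slots with an atomic counter. The kernel below is a minimal sketch of that pattern (the name append_evens and the predicate are made up for illustration), not something the answer above prescribes.

// Sketch: append matching elements to a preallocated buffer by atomically
// claiming an index, so no resizing or global lock is needed.
__global__ void append_evens(const int* in, int n, int* out, int* out_count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] % 2 == 0) {
        int slot = atomicAdd(out_count, 1);   // each thread gets a unique slot
        out[slot] = in[i];
    }
}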

Thrust cannot be used within a kernel; however, a thrust::device_vector can be used up to the interface with the kernel. At that point, a pointer to the underlying data can be passed to the kernel. For example:

#include <thrust/device_vector.h>

thrust::device_vector<int> my_int_vector;

// pass the underlying raw device pointer to the kernel
my_kernel<<<blocks, threads>>>(thrust::raw_pointer_cast(my_int_vector.data()));

Depending on your situation, this may still mean the Thrust library is useful even when implementing your own kernels.
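Putting the pieces together, a minimal self-contained version of this pattern might look like the sketch below; the kernel double_elements, its launch configuration, and the sizes are assumptions made for the example, not part of the original answer.

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <cstdio>

// Illustrative kernel (assumed name and logic): doubles each element in place.
__global__ void double_elements(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

int main()
{
    thrust::device_vector<int> my_int_vector(256, 1);  // memory managed by Thrust

    int n = static_cast<int>(my_int_vector.size());
    int threads = 128;
    int blocks = (n + threads - 1) / threads;

    // Hand the kernel a raw pointer to the vector's device storage.
    double_elements<<<blocks, threads>>>(
        thrust::raw_pointer_cast(my_int_vector.data()), n);
    cudaDeviceSynchronize();

    // The same vector still works with Thrust afterwards, e.g. copying back.
    thrust::host_vector<int> result = my_int_vector;
    printf("result[0] = %d\n", result[0]);              // prints 2
    return 0;
}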
