CUDA cudaMemcpyAsync using a single stream to host

Question

I have a single kernel that fills data into two parameters (dev_out_1 and dev_out_2) using a single stream. I want to copy the data back from the device to the host in parallel. My requirement is to use a single stream and copy back to the host in parallel.

How do you handle this kind of issue?

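```cuda
SomeCudaCall<<<25, 34>>>(input, dev_out_1, dev_out_2);  // kernel launched into the default stream (0)
// Both copies go into the same stream (0) as the kernel
cudaMemcpyAsync(toHere_1, dev_out_1, sizeof(int), cudaMemcpyDeviceToHost, 0);
cudaMemcpyAsync(toHere_2, dev_out_2, sizeof(int), cudaMemcpyDeviceToHost, 0);
```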
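For reference, here is a minimal self-contained sketch of the pattern in the question. The question does not show the kernel body, so `SomeCudaCall` below is a hypothetical stand-in that writes one `int` to each output, and `run` and `CHECK` are hypothetical helpers. The host buffers come from `cudaMallocHost` (pinned memory) so the copies can be asynchronous with respect to the host, and everything stays in the default stream 0 as in the question. Because both `cudaMemcpyAsync` calls are in the same stream, they execute in issue order after the kernel rather than concurrently.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return 1 from the enclosing function if a CUDA runtime call fails.
// (Error paths skip cleanup, for brevity.)
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s: %s\n", #call,                   \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the question's kernel: writes one int to each output.
__global__ void SomeCudaCall(const int* input, int* dev_out_1, int* dev_out_2)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *dev_out_1 = input[0] + 1;
        *dev_out_2 = input[0] * 2;
    }
}

// Hypothetical helper: kernel + both copies in stream 0; returns 0 on success.
int run(int value, int* result_1, int* result_2)
{
    int *input, *dev_out_1, *dev_out_2, *toHere_1, *toHere_2;
    CHECK(cudaMalloc(&input, sizeof(int)));
    CHECK(cudaMalloc(&dev_out_1, sizeof(int)));
    CHECK(cudaMalloc(&dev_out_2, sizeof(int)));
    // Pinned (page-locked) host buffers so the copies can be asynchronous.
    CHECK(cudaMallocHost(&toHere_1, sizeof(int)));
    CHECK(cudaMallocHost(&toHere_2, sizeof(int)));
    CHECK(cudaMemcpy(input, &value, sizeof(int), cudaMemcpyHostToDevice));

    SomeCudaCall<<<25, 34, 0, 0>>>(input, dev_out_1, dev_out_2);
    CHECK(cudaGetLastError());
    // Same stream: the second copy starts only after the first one finishes.
    CHECK(cudaMemcpyAsync(toHere_1, dev_out_1, sizeof(int), cudaMemcpyDeviceToHost, 0));
    CHECK(cudaMemcpyAsync(toHere_2, dev_out_2, sizeof(int), cudaMemcpyDeviceToHost, 0));
    // The host could do other work here; wait only when the results are needed.
    CHECK(cudaStreamSynchronize(0));

    *result_1 = *toHere_1;
    *result_2 = *toHere_2;
    cudaFree(input); cudaFree(dev_out_1); cudaFree(dev_out_2);
    cudaFreeHost(toHere_1); cudaFreeHost(toHere_2);
    return 0;
}

int main()
{
    int r1 = 0, r2 = 0;
    if (run(20, &r1, &r2) != 0) return 1;
    std::printf("%d %d\n", r1, r2);  // prints "21 40"
    return 0;
}
```

The asynchrony gained here is between the host and the device (the host thread is not blocked until `cudaStreamSynchronize`), not between the two copies themselves.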

Answer 1

I wanted to copy back the data from the device to host in parallel

That is not possible.

NVIDIA GPUs can only use one DMA engine for device-to-host transfers (even when more than one DMA engine is present), and that DMA engine can only perform a single transfer at a time. So "parallel" copies in the same direction over the PCI Express bus are not possible.
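To see how many copy (DMA) engines the runtime reports for a given GPU, a small sketch can query the `asyncEngineCount` field of `cudaDeviceProp`; the helper name `copy_engine_count` is hypothetical. Per the CUDA Programming Guide, a nonzero value means copies can overlap kernel execution, and a value of 2 means a host-to-device and a device-to-host copy can also overlap each other when issued in different streams. As the answer explains, a larger count does not by itself make two device-to-host copies run at the same time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the asyncEngineCount reported for `device`,
// or -1 if the query fails (e.g. invalid device or no driver).
int copy_engine_count(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.asyncEngineCount;
}

int main()
{
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess) return 1;
    for (int d = 0; d < n; ++d) {
        // 0: no copy/kernel overlap; 1: copies can overlap kernel execution;
        // 2: one H2D and one D2H copy can also overlap (in different streams).
        std::printf("device %d: asyncEngineCount = %d\n", d, copy_engine_count(d));
    }
    return 0;
}
```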
