Computing the reduction sum of a device array with Thrust

Question

I know we can compute the sum of a CPU (host) array with Thrust like this:

int data[6] = {1, 0, 2, 2, 1, 3};
int result = thrust::reduce(data, data + 6, 0);

Can we find the sum of a GPU array without first doing a cudaMemcpy to a CPU array?
Suppose I have a device array created with cudaMalloc like this:

cudaMalloc(&gpuspeed, n* sizeof(int));

and I have modified gpuspeed with some kernels. Can I now find its sum with Thrust? If so, what changes do I have to make?

Answer 1

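Yes, you can do this with Thrust.

You can pass the raw device pointer to Thrust, and if you explicitly specify the device execution path with a Thrust execution policy (thrust::device), Thrust will do the right thing.

Alternatively, you can refer to your data through a thrust::device_ptr, and Thrust will also do the right thing, even without an explicitly specified execution policy.

This answer covers both approaches, albeit using inclusive_scan.

Here is an example: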

$ cat t137.cu
#include <thrust/reduce.h>
#include <thrust/device_ptr.h>
#include <thrust/execution_policy.h>
#include <iostream>

__global__ void k(int *d, int n){
  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  if (idx < n)
    d[idx] = idx;
}
const int ds = 10;
const int nTPB = 256;
int main(){

  int *d, r1, r2;
  cudaMalloc(&d, ds*sizeof(d[0]));
  k<<<(ds+nTPB-1)/nTPB,nTPB>>>(d, ds);
  thrust::device_ptr<int> tdp = thrust::device_pointer_cast(d);
  r1 = thrust::reduce(tdp, tdp+ds);
  r2 = thrust::reduce(thrust::device, d, d+ds);
  std::cout << "r1: "  << r1 << " r2: " << r2 << std::endl;
}
$ nvcc -std=c++14 -o t137 t137.cu
$ ./t137
r1: 45 r2: 45
$
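
Applied to the array from the question, the execution-policy form might look like the sketch below. The kernel fill_speed, the helper sum_device_array, and the size n = 1000 are hypothetical stand-ins for whatever kernels and sizes the real code uses. Only gpuspeed and the cudaMalloc call come from the question. The reduction runs on the device, and thrust::reduce returns the scalar result to the host, so no explicit cudaMemcpy of the array is needed.

```cuda
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-in for the question's kernels: sets every element to 2.
__global__ void fill_speed(int *speed, int n) {
  int idx = threadIdx.x + blockDim.x * blockIdx.x;
  if (idx < n)
    speed[idx] = 2;
}

// Sums n ints that live in device memory.
// thrust::device tells Thrust that the raw pointers refer to device memory.
int sum_device_array(const int *gpuspeed, int n) {
  return thrust::reduce(thrust::device, gpuspeed, gpuspeed + n, 0);
}

int main() {
  const int n = 1000;  // assumed size, for illustration only
  int *gpuspeed = nullptr;
  if (cudaMalloc(&gpuspeed, n * sizeof(int)) != cudaSuccess) {
    std::fprintf(stderr, "cudaMalloc failed\n");
    return 1;
  }

  fill_speed<<<(n + 255) / 256, 256>>>(gpuspeed, n);
  if (cudaGetLastError() != cudaSuccess) {
    std::fprintf(stderr, "kernel launch failed\n");
    cudaFree(gpuspeed);
    return 1;
  }

  // The call blocks until the result is available on the host.
  int total = sum_device_array(gpuspeed, n);
  std::printf("sum: %d\n", total);

  cudaFree(gpuspeed);
  return 0;
}
```

Without thrust::device (or a thrust::device_ptr wrapper), Thrust would treat the raw pointer as a host pointer, which is why one of the two forms is needed here.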
