How to use Thrust to accumulate an array based on index?

I am trying to accumulate an array based on an index. My inputs are two vectors of the same length: the first holds the indices (keys) and the second holds the values. My goal is to sum the values that share the same index. I have similar code in C++, but I am new to Thrust. Can I achieve this with Thrust device code, and which function should I use? I found no "map"-like functions. Would it be more efficient than the CPU (host) code? Here is a minimal sample of my C++ version:

int a[10]={1,2,3,4,5,1,1,3,4,4};
vector<int> key(a,a+10);
double b[10]={1,2,3,4,5,1,2,3,4,5};
vector<double> val(b,b+10);

unordered_map<size_t,double> M;
for (size_t i = 0;i< 10 ;i++)
{
    M[key[i]] = M[key[i]]+val[i];
}

As indicated in the comments, the canonical way to do this is to reorder the data (keys, values) so that like keys are grouped together. You can do this with sort_by_key. reduce_by_key then sums the values within each group of equal keys.

It is also possible, in a slightly un-Thrust-like way, to solve the problem without reordering, by passing for_each a functor that performs an atomic add.

The following illustrates both:

$ cat t27.cu
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <thrust/for_each.h>
#include <thrust/fill.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>
#include <unordered_map>
#include <vector>

// this functor only needed for the non-reordering case
// requires compilation for a cc6.0 or higher GPU e.g. -arch=sm_60
struct my_func {
  double *r;
  my_func(double *_r) : r(_r) {};
  template <typename T>
  __host__ __device__
  void operator()(T t) {
    atomicAdd(r+thrust::get<0>(t)-1, thrust::get<1>(t));  // assumes consecutive keys starting at 1
  }
};

int main(){

  int a[10]={1,2,3,4,5,1,1,3,4,4};
  std::vector<int> key(a,a+10);
  double b[10]={1,2,3,4,5,1,2,3,4,5};
  std::vector<double> val(b,b+10);

  std::unordered_map<size_t,double> M;
  for (size_t i = 0;i< 10 ;i++)
  {
    M[key[i]] = M[key[i]]+val[i];
  }
  for (int i = 1; i < 6; i++) std::cout << M[i] << " ";
  std::cout << std::endl;
  int size_a = sizeof(a)/sizeof(a[0]);
  thrust::device_vector<int>    d_a(a, a+size_a);
  thrust::device_vector<double> d_b(b, b+size_a);
  thrust::device_vector<double> d_r(5); //assumes only 5 keys, for illustration
  thrust::device_vector<int> d_k(5); // assumes only 5 keys, for illustration
  // method 1, without reordering
  thrust::for_each_n(thrust::make_zip_iterator(thrust::make_tuple(d_a.begin(), d_b.begin())), size_a, my_func(thrust::raw_pointer_cast(d_r.data())));
  thrust::host_vector<double> r = d_r;
  thrust::copy(r.begin(), r.end(), std::ostream_iterator<double>(std::cout, " "));
  std::cout << std::endl;
  thrust::fill(d_r.begin(), d_r.end(), 0.0);
  // method 2, with reordering
  thrust::sort_by_key(d_a.begin(), d_a.end(), d_b.begin());
  thrust::reduce_by_key(d_a.begin(), d_a.end(), d_b.begin(), d_k.begin(), d_r.begin());
  thrust::copy(d_r.begin(), d_r.end(), r.begin());
  thrust::copy(r.begin(), r.end(), std::ostream_iterator<double>(std::cout, " "));
  std::cout << std::endl;
}
$ nvcc -o t27 t27.cu -std=c++14 -arch=sm_70
$ ./t27
4 2 6 13 5
4 2 6 13 5
4 2 6 13 5
$

I make no statements about the relative performance of these approaches. It would probably depend on the actual data set size, and possibly on the GPU being used and other factors.
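To see why method 2 sorts first: reduce_by_key only combines runs of consecutive equal keys, so unsorted input yields several partial sums per key. The following is a minimal host-side C++ sketch of that behaviour, not Thrust's implementation. The function names reduce_runs and sort_then_reduce are hypothetical and introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Illustration only (hypothetical helper, not part of Thrust): sums the values
// of each run of consecutive equal keys, mirroring how reduce_by_key groups input.
std::vector<std::pair<int, double>> reduce_runs(const std::vector<int>& keys,
                                                const std::vector<double>& vals) {
  assert(keys.size() == vals.size());
  std::vector<std::pair<int, double>> out;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (!out.empty() && out.back().first == keys[i])
      out.back().second += vals[i];        // same run: keep accumulating
    else
      out.emplace_back(keys[i], vals[i]);  // key changed: start a new run
  }
  return out;
}

// Hypothetical helper: reorders both arrays so that equal keys become adjacent
// (the role sort_by_key plays in method 2), then reduces the runs.
std::vector<std::pair<int, double>> sort_then_reduce(const std::vector<int>& keys,
                                                     const std::vector<double>& vals) {
  assert(keys.size() == vals.size());
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  std::vector<int> k;
  std::vector<double> v;
  for (std::size_t i : order) {
    k.push_back(keys[i]);
    v.push_back(vals[i]);
  }
  return reduce_runs(k, v);
}
```

With the sample data from the question, calling reduce_runs on the unsorted input gives 8 (key, partial sum) pairs, while sort_then_reduce gives one pair per distinct key: (1, 4), (2, 2), (3, 6), (4, 13), (5, 5).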
