
Sum reduction with CUB

According to this article, sum reduction with the CUB library should be one of the fastest ways to perform a parallel reduction. As you can see in the code fragment below, the execution time is measured excluding the first cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in, out, N, cub::Sum()); call. I assume that it is connected with memory preparation, and that when we reduce the same data several times it isn't necessary to call it every time; but when I have many different arrays with the same number of elements and the same data type, do I have to do it every time? If the answer is yes, it means that using the CUB library becomes pointless.

  #include <cub/cub.cuh>

  // First call with temp_storage == NULL only queries the required
  // temporary-storage size; no reduction is performed.
  size_t temp_storage_bytes = 0;
  int* temp_storage = NULL;
  cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in, out, N, cub::Sum());
  cudaMalloc(&temp_storage, temp_storage_bytes);

  cudaDeviceSynchronize();
  cudaCheckError();
  cudaEventRecord(start);

  // Timed region: the actual reductions, reusing the same temporary storage.
  for (int i = 0; i < REPEAT; i++) {
    cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in, out, N, cub::Sum());
  }
  cudaEventRecord(stop);
  cudaDeviceSynchronize();

I assume that it is connected with memory preparation, and that when we reduce the same data several times it isn't necessary to call it every time

That's correct.

but when I have many different arrays with the same number of elements and the same data type, do I have to do it every time?

No, you don't need to do it every time. The sole purpose of the "first" call to cub::DeviceReduce::Reduce (i.e. when temp_storage=NULL) is to report the number of bytes of temporary storage that CUB needs. If the type and size of your data do not change, there is no need to re-run either this step or the subsequent cudaMalloc operation. You can simply call cub::DeviceReduce::Reduce again on your "new" data (with temp_storage pointing to the previous allocation provided by cudaMalloc), as long as the size and type of the data are the same.
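As an illustration, here is a minimal sketch of that reuse pattern. The names in1, in2, out, and N are hypothetical device pointers and element count, and the call form mirrors the one used in the question; the exact Reduce signature can differ between CUB versions (newer releases also take an initial-value argument).

  #include <cub/cub.cuh>

  // Size query: with a NULL temp_storage pointer, no reduction runs;
  // CUB only writes the required temporary-storage size.
  size_t temp_storage_bytes = 0;
  void* temp_storage = NULL;
  cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in1, out, N, cub::Sum());
  cudaMalloc(&temp_storage, temp_storage_bytes);

  // The same allocation can be reused for any array of the same size and type.
  cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in1, out, N, cub::Sum());
  cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in2, out, N, cub::Sum());

  cudaFree(temp_storage);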

