Sum reduction with CUB
According to this article, sum reduction with the CUB library should be one of the fastest ways to perform a parallel reduction. As you can see in the code fragment below, the execution time is measured excluding the first

cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in, out, N, cub::Sum());
I assume that it's something connected with memory preparation, and that when we reduce the same data several times it isn't necessary to call it every time. But when I've got many different arrays with the same number of elements and the same data type, do I have to do it every time? If the answer is yes, it means that using the CUB library becomes pointless.
size_t temp_storage_bytes = 0;
void* temp_storage = NULL;
// First call: temp_storage == NULL, so CUB only writes the required
// scratch size into temp_storage_bytes and performs no reduction.
cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in, out, N, cub::Sum());
cudaMalloc(&temp_storage, temp_storage_bytes);
cudaDeviceSynchronize();
cudaCheckError();
cudaEventRecord(start);
for (int i = 0; i < REPEAT; i++) {
    cub::DeviceReduce::Reduce(temp_storage, temp_storage_bytes, in, out, N, cub::Sum());
}
cudaEventRecord(stop);
cudaDeviceSynchronize();
I assume that it's something connected with memory preparation, and that when we reduce the same data several times it isn't necessary to call it every time

That's correct.

but when I've got many different arrays with the same number of elements and the same data type, do I have to do it every time?
No, you don't need to do it every time. The sole purpose of the "first" call to cub::DeviceReduce::Reduce (i.e., when temp_storage is NULL) is to provide the number of bytes required for the temporary storage needed by CUB. If the type and size of your data do not change, there is no need to re-run either this step or the subsequent cudaMalloc operation. You can simply call cub::DeviceReduce::Reduce again (with temp_storage pointing to the previous allocation provided by cudaMalloc) on your "new" data, as long as the size and type of the data are the same.
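The reuse pattern described above can be sketched as follows. This is a minimal illustration, not code from the question: the helper name sum_many_arrays, the array count M, and the layout of inputs/outputs are assumptions, and it follows the same older Reduce overload used in the question's code.

```cpp
#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Sketch: query the temporary-storage size once, allocate once, then
// reuse the same scratch buffer for M different device arrays that all
// share the same element type (int) and the same length N.
void sum_many_arrays(int* const d_in[], int* d_out, int M, int N)
{
    void*  d_temp_storage     = NULL;
    size_t temp_storage_bytes = 0;

    // Size query: with d_temp_storage == NULL, CUB only fills in
    // temp_storage_bytes and performs no reduction.
    cub::DeviceReduce::Reduce(d_temp_storage, temp_storage_bytes,
                              d_in[0], d_out, N, cub::Sum());
    cudaMalloc(&d_temp_storage, temp_storage_bytes);

    // One reduction per array; no further size queries or allocations
    // are needed, because the size and type never change.
    for (int i = 0; i < M; i++) {
        cub::DeviceReduce::Reduce(d_temp_storage, temp_storage_bytes,
                                  d_in[i], d_out + i, N, cub::Sum());
    }
    cudaFree(d_temp_storage);
}
```

The design point is that the scratch-size requirement depends only on the problem shape (element type and count), not on the data values, which is why one allocation can serve every array.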