Finding the minimum in an array (but skipping some elements) using reduction in CUDA

I have a large array of floating-point numbers, and I want to find its minimum value (ignoring any -1 entries) together with the index of that minimum, using a reduction in CUDA. I have written the following code to do this, which in my opinion should work:

 __global__ void get_min_cost(float *d_Cost,int n,int *last_block_number,int *number_in_last_block,int *d_index){
     int tid = threadIdx.x;
     int myid = blockDim.x * blockIdx.x + threadIdx.x;
     int s;

     if(blockIdx.x == (*last_block_number)-1){
         s = (*number_in_last_block)/2;
     }else{
         s = 1024/2;
     }

     for(;s>0;s/=2){
         if(myid+s>=n)
             continue;
         if(tid<s){
             if(d_Cost[myid+s] == -1){
                 continue;
             }else if(d_Cost[myid] == -1 && d_Cost[myid+s] != -1){
                 d_Cost[myid] = d_Cost[myid+s];
                 d_index[myid] = d_index[myid+s];
             }else{
                 // both not -1
                 if(d_Cost[myid]<=d_Cost[myid+s])
                     continue;
                 else{
                     d_Cost[myid] = d_Cost[myid+s];
                     d_index[myid] = d_index[myid+s];
                 }
             }
         }
         else
             continue;
         __syncthreads();
     }
     if(tid==0){
         d_Cost[blockIdx.x] = d_Cost[myid];
         d_index[blockIdx.x] = d_index[myid];
     }
     return;
 }

The last_block_number argument is the id of the last block, and number_in_last_block is the number of elements in the last block (which is a power of 2). Thus, every block is launched with 1024 threads; the last block uses only number_in_last_block of them, while the others use all 1024.

After this function runs, I expect the minimum value for each block to be in d_Cost[blockIdx.x] and its index in d_index[blockIdx.x].

I call this function multiple times, each time updating the number of threads and blocks. On the second call, the number of threads equals the number of blocks remaining from the first call, and so on.
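
The launch sequence described above can be sketched on the host as follows. This is only an illustration under assumptions not stated in the question: `reduction_passes` is a hypothetical helper, each block is assumed to collapse up to 1024 elements into one partial result, and the power-of-two constraint on the last block is ignored.

```cpp
#include <vector>

// Hypothetical helper (not from the question): returns how many candidates
// each kernel launch has to reduce, starting from n elements and assuming
// every block collapses up to block_size elements into one partial minimum.
std::vector<int> reduction_passes(int n, int block_size = 1024) {
    std::vector<int> sizes;
    while (n > 1) {
        sizes.push_back(n);                     // elements handled in this pass
        n = (n + block_size - 1) / block_size;  // blocks launched = elements left for the next pass
    }
    return sizes;
}
```

For example, with 2097152 input elements this gives three passes over 2097152, 2048, and 2 candidates, i.e. launches with 2048, 2, and 1 blocks.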

However, the above function isn't giving me the desired output. In fact, the output differs every time I run the program: during some intermediate iteration it returns an incorrect value as the minimum (though that incorrect value is quite close to the true minimum every time).

What am I doing wrong here?

As I mentioned in my comment above, I would recommend not writing reductions of your own and using CUDA Thrust whenever possible. This holds even when you need to customize those operations; the customization is possible by properly overloading, e.g., the relational operations.

Below I'm providing a simple code that finds the minimum of an array along with its index. It is based on a classical example contained in the An Introduction to Thrust presentation. The only addition is skipping, as you requested, the -1's from the search. This can reasonably be done by replacing all the -1's in the array with INT_MAX, i.e., the largest value representable by an int (the example uses int data; for float data, FLT_MAX from <cfloat> would play the same role).

#include <climits>   // INT_MAX
#include <cstdio>    // printf, getchar

#include <thrust/device_vector.h>
#include <thrust/replace.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

// --- Struct returning the smallest of two tuples
struct smaller_tuple
{
    __host__ __device__ thrust::tuple<int,int> operator()(thrust::tuple<int,int> a, thrust::tuple<int,int> b)
    {
        if (a < b)
            return a;
        else
            return b;
    }
};


int main() {

    const int N = 20;
    const int large_value = INT_MAX;

    // --- Setting the data vector
    thrust::device_vector<int> d_vec(N,10);
    d_vec[3] = -1; d_vec[5] = -2;

    // --- Copying the data vector to a new vector where the -1's are changed to large_value (INT_MAX)
    thrust::device_vector<int> d_vec_temp(d_vec);
    thrust::replace(d_vec_temp.begin(), d_vec_temp.end(), -1, large_value);

    // --- Creating the index sequence [0, 1, 2, ... )
    thrust::device_vector<int> indices(d_vec_temp.size());
    thrust::sequence(indices.begin(), indices.end());

    // --- Setting the initial value of the search
    thrust::tuple<int,int> init(d_vec_temp[0],0);

    thrust::tuple<int,int> smallest;
    smallest = thrust::reduce(thrust::make_zip_iterator(thrust::make_tuple(d_vec_temp.begin(), indices.begin())),
                          thrust::make_zip_iterator(thrust::make_tuple(d_vec_temp.end(), indices.end())),
                          init, smaller_tuple());

    printf("Smallest %i %i\n",thrust::get<0>(smallest),thrust::get<1>(smallest));
    getchar();
}
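
If copying the array and substituting a sentinel is undesirable, the skip rule can instead live in the comparison itself. The following is a minimal host-side C++ sketch, not part of the code above: `Candidate`, `pick_smaller`, and `min_skipping` are hypothetical names, the data is assumed to be float as in the question, and the -1 marker is compared exactly. It can serve as a CPU reference for checking GPU results; the same binary operation could in principle be marked `__host__ __device__` and passed to thrust::reduce, but that is an untested assumption.

```cpp
#include <vector>

// Hypothetical helper type: a value paired with its position in the array.
struct Candidate {
    float value;
    int index;   // a negative index means "no valid element seen yet"
};

// Binary operation for the reduction: entries equal to the skip marker
// (and empty candidates) never win. Ties keep the left operand.
Candidate pick_smaller(Candidate a, Candidate b, float skip = -1.0f) {
    if (b.index < 0 || b.value == skip) return a;
    if (a.index < 0 || a.value == skip) return b;
    return (b.value < a.value) ? b : a;
}

// Sequential reference reduction over the whole array.
// Returns index -1 if every element equals the skip marker.
Candidate min_skipping(const std::vector<float>& data, float skip = -1.0f) {
    Candidate best{skip, -1};
    for (int i = 0; i < static_cast<int>(data.size()); ++i)
        best = pick_smaller(best, Candidate{data[i], i}, skip);
    return best;
}
```

For the input {10, 4, -1, 7, 2.5, -1}, `min_skipping` returns the value 2.5 at index 4; for an array containing only -1 it returns index -1, which the caller has to treat as "no minimum found".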
