Replacing for loops with thrust::transform

Question

I am trying to optimize my code by running for loops on GPU threads. I am trying to eliminate two for loops using thrust::transform. The C++ code looks like this:

    ka_index = 0;
    for (int i = 0; i < N_gene; i++)
    {
        for (int j = 0; j < n_ka_d[i]; j++ )
        {
            co0 = get_coeff0(ka_vec_d[ka_index]);
            act[i] += (co0*ka_val_d[ka_index]); 
            ka_index++;
        }
        act[i] = pow(act[i],n); 
    }

In the loops above I am estimating coefficients for an ordinary differential equation (ODE), and I have transferred all the data onto the device using Thrust. The number of genes is N_gene, so the first for loop has to run N_gene times. The second for loop is bounded by the number of activators (other friendly genes in the gene pool) of each gene. Each gene has a number of activators (friendly genes whose presence increases the concentration of gene i), given by the elements of the n_ka vector. The value of n_ka[i] can vary from 0 to N_gene - 1. ka_val holds the measure of activation for each activator ka, and ka_vec_d holds the index of the gene that activates gene i.
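The layout described here is a flattened, segmented array: n_ka gives the length of each gene's segment, and ka_vec / ka_val store all segments back to back. A minimal host-side sketch of how the segment boundaries and per-entry gene keys could be derived from n_ka is shown below. The function names `segment_offsets` and `segment_keys` are hypothetical and not part of the original code, and only the standard library is used so the result can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: offsets[i] is the first ka_index belonging to gene i,
// and offsets.back() is the total number of flattened entries.
std::vector<int> segment_offsets(const std::vector<int>& n_ka)
{
    std::vector<int> offsets(n_ka.size() + 1, 0);
    // Running sum of the segment lengths, shifted by one position.
    std::partial_sum(n_ka.begin(), n_ka.end(), offsets.begin() + 1);
    return offsets;
}

// Hypothetical helper: one key per flattened entry, equal to the index of
// the gene that owns that entry. Genes with no activators contribute no keys.
std::vector<int> segment_keys(const std::vector<int>& n_ka)
{
    std::vector<int> keys;
    for (std::size_t i = 0; i < n_ka.size(); ++i) {
        assert(n_ka[i] >= 0);
        keys.insert(keys.end(), static_cast<std::size_t>(n_ka[i]),
                    static_cast<int>(i));
    }
    return keys;
}
```

For example, with n_ka = {2, 0, 3} the offsets are {0, 2, 2, 5} and the keys are {0, 0, 2, 2, 2}. Gene 1 owns an empty segment and never appears among the keys.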

I am trying to represent these loops using iterators, but have been unable to do so. I am familiar with using thrust::for_each(thrust::make_zip_iterator(thrust::make_tuple)) for a single for loop, but I am having a tough time coming up with a way to implement two for loops using counting_iterator or transform iterators. Any pointers or help converting these two for loops would be appreciated. Thanks for your time!

Answer 1

This looks like a reduction problem. I think you can use thrust::transform with zip iterators and thrust::reduce_by_key. A sketch of this solution is:

    // generate indices: one key per flattened entry, equal to the owning gene i
    std::vector< int > hindices;
    for( int i=0 ; i<N_gene ; ++i )
        for( int j=0 ; j<n_ka_d[i] ; ++j )
            hindices.push_back( i );
    thrust::device_vector< int > indices = hindices;

    // generate tmp
    // trafo1 implements get_coeff0( get< 0 >( t ) ) * get< 1 >( t );
    thrust::device_vector< double > tmp( ka_vec_d.size() );
    thrust::transform(
        thrust::make_zip_iterator(
            thrust::make_tuple( ka_vec_d.begin() , ka_val_d.begin() ) ) ,
        thrust::make_zip_iterator(
            thrust::make_tuple( ka_vec_d.end() , ka_val_d.end() ) ) ,
        tmp.begin() , trafo1 );

    // do the reduction for each act[i]
    // (keys output comes before values output; one sum is written per distinct
    // key, so act lines up with i only if every gene has at least one activator)
    thrust::device_vector< int > indices_out( N_gene );
    thrust::reduce_by_key( indices.begin() , indices.end() , tmp.begin() ,
        indices_out.begin() , act.begin() );

    // do the pow transformation
    thrust::transform( act.begin() , act.end() , act.begin() , pow_trafo );
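One detail worth checking in a reduce_by_key approach: the question states that n_ka[i] can be 0, and a key-based reduction produces one output per run of equal keys, so a gene with no activators produces no output at all. The sketch below emulates that behaviour on the host with the standard library only, then scatters each sum to its gene index before applying the power. The name `activation` and the body of `get_coeff0` are assumptions made for illustration, since the post does not show the real coefficient function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's get_coeff0, which the post does not show.
double get_coeff0(int gene) { return 1.0 + gene; }

// Host-side emulation: sum runs of equal consecutive keys (the way a
// reduce-by-key does), scatter each sum to act[key], then raise to the power n.
std::vector<double> activation(const std::vector<int>& keys,
                               const std::vector<int>& ka_vec,
                               const std::vector<double>& ka_val,
                               std::size_t n_gene, double n)
{
    assert(keys.size() == ka_vec.size() && keys.size() == ka_val.size());

    std::vector<int> keys_out;   // one entry per run of equal keys
    std::vector<double> sums;    // the matching per-run sums
    for (std::size_t k = 0; k < keys.size(); ++k) {
        const double term = get_coeff0(ka_vec[k]) * ka_val[k];
        if (keys_out.empty() || keys_out.back() != keys[k]) {
            keys_out.push_back(keys[k]);
            sums.push_back(term);
        } else {
            sums.back() += term;
        }
    }

    // Scatter: genes that never appeared as a key keep a sum of zero.
    std::vector<double> act(n_gene, 0.0);
    for (std::size_t r = 0; r < keys_out.size(); ++r)
        act[static_cast<std::size_t>(keys_out[r])] = sums[r];

    for (double& a : act)
        a = std::pow(a, n);
    return act;
}
```

On the device, the scatter step could plausibly be expressed with thrust::scatter, using the keys output of the reduction as the map. That mapping is a suggestion and is not part of the original answer.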

I think this can also be optimized with transform iterators to reduce the number of calls to thrust::transform and thrust::reduce_by_key.
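The idea behind that optimization is to apply the element-wise product while the sum is being accumulated, so no temporary array of products is stored. A host-side analog using std::inner_product with a custom product operation is sketched below. In Thrust, the corresponding step would likely be to wrap the zip iterator in thrust::make_transform_iterator and pass it as the values input of the reduction. The names `activation_fused` and `offsets`, and the body of `get_coeff0`, are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the asker's get_coeff0, which the post does not show.
double get_coeff0(int gene) { return 1.0 + gene; }

// offsets has one entry per gene plus a final entry equal to the total number
// of flattened activator entries; gene i owns [offsets[i], offsets[i + 1]).
std::vector<double> activation_fused(const std::vector<int>& offsets,
                                     const std::vector<int>& ka_vec,
                                     const std::vector<double>& ka_val,
                                     double n)
{
    assert(!offsets.empty());
    assert(static_cast<std::size_t>(offsets.back()) == ka_vec.size());
    assert(ka_vec.size() == ka_val.size());

    std::vector<double> act(offsets.size() - 1);
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
        // Multiply and accumulate in one pass; no temporary product array.
        const double sum = std::inner_product(
            ka_vec.begin() + offsets[i], ka_vec.begin() + offsets[i + 1],
            ka_val.begin() + offsets[i], 0.0, std::plus<double>(),
            [](int gene, double val) { return get_coeff0(gene) * val; });
        act[i] = std::pow(sum, n);
    }
    return act;
}
```

Because each gene's range is addressed directly through its offsets, a gene with an empty range simply sums to zero, and the output stays aligned with the gene index.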

Solution 1: accepted, 2013-04-06 11:38:01
