Testing combinations of multiple arrays with CUDA

I have the code below, written in PHP, and have been reading up on CUDA so that I can use the GPU processing power of my old GeForce 8800 Ultra. How do I convert this nested combinations test to CUDA parallel processing code (if that is even possible)? The total number of combinations of the 2D arrays $a, $b, $c, $d, $e quickly rises into the trillions.

foreach($a as $aVal){
    foreach($b as $bVal){
        foreach($c as $cVal){
            foreach($d as $dVal){
                foreach($e as $eVal){

                    $addSum = $aVal[0]+$bVal[0]+$cVal[0]+$dVal[0]+$eVal[0];
                    $capSum = $aVal[1]+$bVal[1]+$cVal[1]+$dVal[1]+$eVal[1];
                    if($capSum <= CAP_LIMIT){
                        // Record the identifier (index 2) of the element picked from each array.
                        $tempArr = array("a" => $aVal[2],"b" => $bVal[2],"c" => $cVal[2],
                        "d" => $dVal[2],"e" => $eVal[2],"addTotal" => $addSum,"capTotal" => $capSum);

                        array_push($topCombinations, $tempArr);

                        if(count($topCombinations) > 1000){
                           $topCombinations = $ca->arraySortedDescend($topCombinations);
                           array_splice($topCombinations, 900);

                        }
                    }  
                }
            }
        }
    }
}

This is a very open-ended question. It requires translating between languages as well as designing a parallel algorithm. I won't go into too much detail, but in a nutshell:

How you parallelize it depends on the size of your arrays ($a to $e). If they are large enough, you could parallelize only the outer one or two loops across the threads in a grid and run the inner loops sequentially within each thread. If they are not very large, you might want to either flatten two or three of the outer loops into a single index, or implement them using 2D or 3D thread blocks and grids in CUDA.
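To make the first option concrete, here is a minimal sketch, not a full port of the PHP. It assumes each element is reduced to an `int2` (`x` = the value added up, `y` = the value counted against the cap), flattens the `$a` and `$b` loops into one thread index, and runs the `$c`, `$d`, `$e` loops sequentially in each thread. As a simplification it keeps only the single best combination per thread rather than a top-1000 list. The names `Best`, `searchKernel`, `upload` and `findBest`, the sample data and the block size of 128 are all made up for illustration, and error checking is omitted.

```cuda
#include <climits>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical result record: best feasible sum plus the indices that produced it.
struct Best { int addSum; int a, b, c, d, e; };

__global__ void searchKernel(const int2* a, int na, const int2* b, int nb,
                             const int2* c, int nc, const int2* d, int nd,
                             const int2* e, int ne, int capLimit, Best* out)
{
    // One thread per (a, b) pair: the two outer loops are flattened into one index.
    long long flat = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    if (flat >= (long long)na * nb) return;
    int ia = (int)(flat / nb), ib = (int)(flat % nb);

    Best best = { INT_MIN, ia, ib, -1, -1, -1 };  // INT_MIN means "nothing fits"
    int add2 = a[ia].x + b[ib].x;
    int cap2 = a[ia].y + b[ib].y;

    // The three inner loops run sequentially inside each thread.
    for (int ic = 0; ic < nc; ++ic)
        for (int id = 0; id < nd; ++id)
            for (int ie = 0; ie < ne; ++ie) {
                int cap = cap2 + c[ic].y + d[id].y + e[ie].y;
                int add = add2 + c[ic].x + d[id].x + e[ie].x;
                if (cap <= capLimit && add > best.addSum) {
                    best.addSum = add; best.c = ic; best.d = id; best.e = ie;
                }
            }
    out[flat] = best;  // per-thread winner; the host picks the overall one
}

// Copies a host vector to the device and returns the device pointer.
static int2* upload(const std::vector<int2>& v) {
    int2* p = nullptr;
    cudaMalloc(&p, v.size() * sizeof(int2));
    cudaMemcpy(p, v.data(), v.size() * sizeof(int2), cudaMemcpyHostToDevice);
    return p;
}

// Runs the kernel and reduces the per-thread winners on the host.
Best findBest(const std::vector<int2>& a, const std::vector<int2>& b,
              const std::vector<int2>& c, const std::vector<int2>& d,
              const std::vector<int2>& e, int capLimit)
{
    int na = (int)a.size(), nb = (int)b.size();
    int pairs = na * nb;
    int2 *da = upload(a), *db = upload(b), *dc = upload(c), *dd = upload(d), *de = upload(e);
    Best* dOut = nullptr;
    cudaMalloc(&dOut, pairs * sizeof(Best));

    int threads = 128;                               // arbitrary block size
    int blocks = (pairs + threads - 1) / threads;    // enough blocks to cover all pairs
    searchKernel<<<blocks, threads>>>(da, na, db, nb, dc, (int)c.size(), dd, (int)d.size(),
                                      de, (int)e.size(), capLimit, dOut);

    std::vector<Best> results(pairs);
    cudaMemcpy(results.data(), dOut, pairs * sizeof(Best), cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc); cudaFree(dd); cudaFree(de); cudaFree(dOut);

    Best best = results[0];
    for (const Best& r : results)
        if (r.addSum > best.addSum) best = r;
    return best;
}

int main() {
    // Tiny made-up data, {add, cap} per element.
    std::vector<int2> a = {{5, 3}, {2, 1}}, b = {{4, 2}, {1, 1}}, c = {{3, 2}, {6, 4}};
    std::vector<int2> d = {{2, 1}}, e = {{1, 1}, {7, 5}};
    Best best = findBest(a, b, c, d, e, /*capLimit=*/10);
    printf("best addSum = %d (a=%d b=%d c=%d d=%d e=%d)\n",
           best.addSum, best.a, best.b, best.c, best.d, best.e);
    return 0;
}
```

If `$a` and `$b` are small, the same decoding trick extends to three loops (decode `ia`, `ib`, `ic` from one flat index), or the indices can come from a 2D or 3D launch via `blockIdx.x`/`blockIdx.y`/`blockIdx.z` instead. Keeping a real top-N list would need a per-thread or per-block buffer merged afterwards, which this sketch leaves out. It also assumes a reasonably recent toolkit; a card as old as the 8800 Ultra would likely need an older CUDA release.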
