
Faster Implementation of Matrix-Vector Multiplication [Parallel Computation]

I have this code, written in PHP, that performs a matrix-vector multiplication.

Here's a snippet:

for ($i = 0; $i < sizeof($transposed_matrix); $i++) {
    $vector[$i] = 0;
    for ($j = 0; $j < sizeof($new_vector); $j++) {
        $vector[$i] += ($transposed_matrix[$i][$j] * $new_vector[$j]);
    }
}

Is there any way to make this code run faster?

One optimization would be to compute the sizes once, before the loops:

$size = sizeof($transposed_matrix);
$size2 = sizeof($new_vector);
for($i = 0; $i < $size; $i++) {
    $vector[$i] = 0;        
    for($j = 0; $j < $size2; $j++) {
        $vector[$i] += ($transposed_matrix[$i][$j] * $new_vector[$j]);
    }
}

PHP does not give you enough control for serious optimization. The proposed improvements will probably have a relatively small impact, unless you are multiplying huge matrices and vectors (in which case you should not be using PHP in the first place).
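
Since the expected gain is small, it may be worth timing each variant before adopting it. The following is only a rough sketch: timeIt() is a hypothetical helper that is not part of the original posts, and the 100x100 matrix of ones is made-up test data. It relies on microtime(true), which returns the current time in seconds as a float.

```php
<?php
// Hypothetical helper (not from the original posts): runs $fn $repeats times
// and returns the elapsed wall-clock time in seconds.
function timeIt(callable $fn, int $repeats = 100): float
{
    $start = microtime(true);
    for ($k = 0; $k < $repeats; ++$k) {
        $fn();
    }
    return microtime(true) - $start;
}

// Illustrative data only: a 100x100 matrix of ones and a matching vector.
$matrix = array_fill(0, 100, array_fill(0, 100, 1));
$vector = array_fill(0, 100, 1);

// Time the variant with precomputed sizes.
$seconds = timeIt(function () use ($matrix, $vector) {
    $size = count($matrix);
    $size2 = count($vector);
    $out = [];
    for ($i = 0; $i < $size; ++$i) {
        $out[$i] = 0;
        for ($j = 0; $j < $size2; ++$j) {
            $out[$i] += $matrix[$i][$j] * $vector[$j];
        }
    }
});
printf("%d runs took %.4f s\n", 100, $seconds);
```

Timings from a single run are noisy, so repeating the measurement a few times and comparing the variants on the same data gives a fairer picture.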

In addition to precomputing the sizes and using pre-increments for the counters (as suggested by Tjoene), use a temporary variable for the sum in the inner loop, like so:

$sum = 0;
for ($j = 0; $j < $numCols; ++$j) {
    $sum += $matrix[$i][$j] * $vector[$j];
}
// Write to a separate output array so the input $vector is not overwritten.
$result[$i] = $sum;

This avoids looking up the destination element of the output array on every iteration of the inner loop.
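
Put together with the precomputed sizes, the whole multiplication might look like the sketch below. The function name matVecMul() and the separate $result array are assumptions made for illustration; fetching the row into $row once per outer iteration is an extra step in the same spirit, not something the answer itself proposes.

```php
<?php
// Hypothetical helper: multiplies a nested (array-of-rows) matrix by a vector.
function matVecMul(array $matrix, array $vector): array
{
    $numRows = count($matrix);
    $numCols = count($vector);
    $result = [];
    for ($i = 0; $i < $numRows; ++$i) {
        $row = $matrix[$i];          // look up the row once per outer iteration
        $sum = 0;                    // accumulate in a local variable
        for ($j = 0; $j < $numCols; ++$j) {
            $sum += $row[$j] * $vector[$j];
        }
        $result[$i] = $sum;          // single write to the output array
    }
    return $result;
}

print_r(matVecMul([[1, 2], [3, 4]], [5, 6])); // elements 17 and 39
```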

Probably the biggest performance gain can be achieved by storing the matrix data in a single flat array instead of the nested structure you use. Simply concatenate the rows of the matrix and you can run through its elements using a single index, like so:

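for ($i = 0, $n = 0; $i < $numRows; ++$i)
{
    $sum = 0;
    // $n advances with every element, so it walks the flat matrix row by row.
    for ($j = 0; $j < $numCols; ++$j, ++$n) {
        $sum += $matrix[$n] * $vector[$j];
    }
    // Write to a separate output array so the input $vector is not overwritten.
    $result[$i] = $sum;
}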

This will, of course, only speed things up if you don't have to convert to this matrix layout prior to the actual multiplication.
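
For illustration, here is one way the flat layout could be built and used. Both function names (flattenMatrix() and matVecMulFlat()) are hypothetical, and the sketch assumes row-major order, meaning the rows are stored one after another. The conversion itself touches every element once, which is why it only pays off if the matrix is reused or stored flat from the start.

```php
<?php
// Hypothetical helper: concatenates the rows of a nested matrix into one array.
function flattenMatrix(array $matrix): array
{
    $flat = [];
    foreach ($matrix as $row) {
        foreach ($row as $value) {
            $flat[] = $value;
        }
    }
    return $flat;
}

// Hypothetical helper: multiplies a flat, row-major matrix by a vector.
function matVecMulFlat(array $flat, int $numRows, int $numCols, array $vector): array
{
    $result = [];
    $n = 0;                          // running index into the flat matrix
    for ($i = 0; $i < $numRows; ++$i) {
        $sum = 0;
        for ($j = 0; $j < $numCols; ++$j, ++$n) {
            $sum += $flat[$n] * $vector[$j];
        }
        $result[$i] = $sum;
    }
    return $result;
}

$flat = flattenMatrix([[1, 2, 3], [4, 5, 6]]);
print_r(matVecMulFlat($flat, 2, 3, [1, 0, 2])); // elements 7 and 16
```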

If you don't want to change the matrix layout, you could speed things up by using foreach in the outer loop to retrieve the matrix's rows. Note, however, that this iterates over the rows in the order in which the row arrays were added to the matrix! If this order differs between the matrix and the vector, the result will be all wrong. So it is probably not such a reliable thing to do, as it breaks far too easily ...
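
One way to limit the ordering risk is to let foreach hand back the keys as well and index by them, rather than counting positions. The sketch below assumes that the keys of each row match the keys of the vector; matVecMulForeach() is a hypothetical name.

```php
<?php
// Hypothetical helper: foreach-based multiplication that indexes by key,
// so the insertion order of rows and columns does not affect the values.
function matVecMulForeach(array $matrix, array $vector): array
{
    $result = [];
    foreach ($matrix as $i => $row) {
        $sum = 0;
        foreach ($row as $j => $value) {
            $sum += $value * $vector[$j];
        }
        $result[$i] = $sum;          // stored under the row's own key
    }
    return $result;
}

// Rows deliberately added out of order: row 1 first, then row 0.
$matrix = [];
$matrix[1] = [3, 4];
$matrix[0] = [1, 2];
$result = matVecMulForeach($matrix, [5, 6]);
echo $result[0], " ", $result[1], "\n"; // prints "17 39"
```

The result array keeps the insertion order of the rows, so code that later iterates over it with foreach still sees row 1 before row 0; only access by key is order-independent here.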

Oh, and you can always try to partially unroll the loop(s).
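
As a rough illustration of partial unrolling, the inner loop can process four elements per iteration, with a second loop handling the remainder. The function name dotUnrolled() and the unroll factor of 4 are arbitrary choices for this sketch. Whether it is actually faster in PHP would need to be measured, and with floating-point data the changed summation order can alter the last digits of the result.

```php
<?php
// Hypothetical helper: dot product of one matrix row with the vector,
// with the loop body unrolled four times.
function dotUnrolled(array $row, array $vector)
{
    $n = count($vector);
    $limit = $n - ($n % 4);          // largest multiple of 4 not exceeding $n
    $sum = 0;
    for ($j = 0; $j < $limit; $j += 4) {
        $sum += $row[$j]     * $vector[$j]
              + $row[$j + 1] * $vector[$j + 1]
              + $row[$j + 2] * $vector[$j + 2]
              + $row[$j + 3] * $vector[$j + 3];
    }
    // Remainder loop: handles the last 0 to 3 elements.
    for (; $j < $n; ++$j) {
        $sum += $row[$j] * $vector[$j];
    }
    return $sum;
}

echo dotUnrolled([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1]), "\n"; // prints 21
```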

PHP arrays tend to be slow, which is attributable to the hashing mechanism behind them (see "PHP array performance"). If you have a way to predetermine the size of the vectors, you could unroll the loops and avoid using arrays. If your code is the entire code, this won't help you though, as every item in $transposed_matrix is hit only once, and you can reduce the number of hits on $vector by using the $sum technique as outlined by Atze Kaputnik. So you'll end up copying stuff from the array parameters to local variables, then calculating, and then copying back... that'll more than kill the performance gain.

In the end, all you can do is switch to an entirely different method of optimization: JIT compilers like HipHop, or compiled languages. The same loop in C is likely to run on the order of 10 to 100 times faster, minus the time to fork that process.
