Is it possible to speed up this MATLAB script?

I've run into some performance problems, so I want to speed up a few slow-running scripts, but I have no more ideas on how to do it. I often get stuck on the indices, and I find the abstract thinking this requires very difficult.

The script is:

    tic,
    n = 1000;
    d = 500;
    X = rand(n, d);
    R = rand(n, n);
    F = zeros(d, d);
    for i=1:n
        for j=1:n
           F = F + R(i,j)* ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)));
        end
    end
    toc

Discussion & Solution Codes

A few bsxfun-based approaches can be suggested here. Also, read on to see how one can get a 30x+ speedup on a problem like this!

Approach #1 (Naive vectorized approach)

To handle both operations, the subtractions between rows of X and the subsequent outer products of those differences, a naive bsxfun-based approach leads to a 4D intermediate array whose (i,j,:,:) slice corresponds to ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:))). After that, each slice is weighted by R(i,j) and everything is summed over i and j to get the final output F. This is implemented as shown next:

    v1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    v2 = bsxfun(@times,permute(v1,[1 3 2]),permute(v1,[1 3 4 2]));
    F = reshape(R(:).'*reshape(v2,[],d^2),d,[]);
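As a sanity check, the vectorized result can be compared with the original double loop on a problem small enough that the 4D array stays tiny. This is only a sketch: the sizes n = 6 and d = 4 and the names F_loop, F_vec and max_err are chosen here for illustration and are not part of the original post.

```matlab
% Small problem so the n-by-n-by-d-by-d intermediate stays tiny.
n = 6;
d = 4;
X = rand(n, d);
R = rand(n, n);

% Reference: the original double loop.
F_loop = zeros(d, d);
for i = 1:n
    for j = 1:n
        dx = X(i,:) - X(j,:);                 % 1-by-d row difference
        F_loop = F_loop + R(i,j) * (dx' * dx);
    end
end

% Approach #1: build the 4D intermediate, then do one weighted sum.
v1 = bsxfun(@minus, X, permute(X, [3 2 1]));                        % n-by-d-by-n
v2 = bsxfun(@times, permute(v1, [1 3 2]), permute(v1, [1 3 4 2]));  % n-by-n-by-d-by-d
F_vec = reshape(R(:).' * reshape(v2, [], d^2), d, []);

% Largest absolute deviation between the two results.
max_err = max(abs(F_loop(:) - F_vec(:)));
disp(max_err)
```

The two results should agree up to floating-point round-off, since both compute the same weighted sum of outer products in a different order.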

Approach #2 (Not-so-naive vectorized approach)

The approach above goes into 4D, which could slow things down. Instead, you can keep the intermediate data at no more than 3D by reshaping. This is listed next:

    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    sub1_2d = reshape(permute(sub1,[1 3 2]),n^2,[]);
    mult1 = bsxfun(@times,sub1_2d,permute(sub1_2d,[1 3 2]));
    F = reshape(R(:).'*reshape(mult1,[],d^2),d,[]);
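Reshaping changes the layout but not the number of elements: mult1 is n^2-by-d-by-d, the same element count as the 4D array in Approach #1. A rough estimate of the memory needed for double-precision intermediates at the question's sizes might look like the sketch below. The variable names bytes_sub1 and bytes_mult1 are illustrative only.

```matlab
% Sizes from the question.
n = 1000;
d = 500;
bytes_per_double = 8;

% sub1: n-by-d-by-n array of row differences.
bytes_sub1 = n * d * n * bytes_per_double;

% mult1: n^2-by-d-by-d array of products (same element count as the 4D v2).
bytes_mult1 = n^2 * d^2 * bytes_per_double;

fprintf('sub1 : %.1f GB\n', bytes_sub1 / 1e9);    % prints "sub1 : 4.0 GB"
fprintf('mult1: %.1f GB\n', bytes_mult1 / 1e9);   % prints "mult1: 2000.0 GB"
```

At these sizes the product array would not fit in the memory of a typical machine, so Approaches #1 and #2 are best read as stepping stones that only run for much smaller n and d.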

Approach #3 (Hybrid approach)

Now, you can build a hybrid approach on top of Approach #2 (vectorized subtractions + loopy multiplications). The benefit is that it uses fast matrix multiplication for the products and cuts the number of loop iterations from n^2 to n. The total arithmetic is of the same order, but it is done as n large matrix products instead of n^2 rank-1 updates, which should make it much more efficient. Thanks to @Dev-iL for suggesting this idea! Here's the code:

    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2]));

    F = zeros(d);
    for k = 1:size(sub1,3)
        blk = sub1(:,:,k);
        F = F + blk.'*blk;
    end
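The reason the square root appears: row i of the k-th slice is sqrt(R(i,k))*(X(i,:)-X(k,:)), so blk.'*blk equals the sum over i of R(i,k)*(X(i,:)-X(k,:))'*(X(i,:)-X(k,:)), which is exactly the inner loop of the original script for j = k. A minimal check of this on a small problem could look as follows. It assumes the entries of R are nonnegative (as with rand) so that sqrt(R) is real, and the names F_loop, F_hyb and max_err are illustrative.

```matlab
% Small test problem; R from rand is nonnegative, so sqrt(R) is real.
n = 8;
d = 5;
X = rand(n, d);
R = rand(n, n);

% Reference: the original double loop.
F_loop = zeros(d, d);
for i = 1:n
    for j = 1:n
        dx = X(i,:) - X(j,:);
        F_loop = F_loop + R(i,j) * (dx' * dx);
    end
end

% Approach #3: weight the differences by sqrt(R), then one product per slice.
sub1 = bsxfun(@minus, X, permute(X, [3 2 1]));             % sub1(i,:,k) = X(i,:) - X(k,:)
sub1 = bsxfun(@times, sub1, permute(sqrt(R), [1 3 2]));    % scale row i of slice k by sqrt(R(i,k))
F_hyb = zeros(d);
for k = 1:size(sub1, 3)
    blk = sub1(:,:,k);              % n-by-d slice
    F_hyb = F_hyb + blk.' * blk;    % adds sum_i R(i,k) * outer product
end

% Largest absolute deviation between the two results.
max_err = max(abs(F_loop(:) - F_hyb(:)));
disp(max_err)
```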

Benchmarking

Benchmarking code comparing the original approach against Approach #3:

    %// Parameters
    n = 500;
    d = 250;
    X = rand(n, d);
    R = rand(n, n);

    %// Warm up tic/toc.
    for k = 1:100000
        tic(); elapsed = toc();
    end

    disp('------------------------------ With Original Approach')
    tic
    F1 = zeros(d, d);
    for i=1:n
        for j=1:n
            F1 = F1 + R(i,j)*((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)));
        end
    end
    toc, clear F1 i j

    disp('------------------------------ With Proposed Approach #3')
    tic
    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2]));

    F = zeros(d);
    for k = 1:size(sub1,3)
        blk = sub1(:,:,k);
        F = F + blk.'*blk;
    end
    toc

Runtime results

    ------------------------------ With Original Approach
    Elapsed time is 29.728571 seconds.
    ------------------------------ With Proposed Approach #3
    Elapsed time is 0.839726 seconds.
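The benchmark above uses n = 500 and d = 250. At the question's n = 1000 and d = 500, the n-by-d-by-n array sub1 alone would hold 5e8 doubles, about 4 GB. If that is too much, one possible variant (not part of the original answer) builds each weighted slice inside the loop, so only one n-by-d block exists at a time. The sketch below uses small sizes and checks the result against the double loop. The names F_lowmem, F_loop and max_err are hypothetical, and R is again assumed nonnegative.

```matlab
% Small sizes for a quick check; the same loop works for larger n and d.
n = 8;
d = 5;
X = rand(n, d);
R = rand(n, n);       % assumed nonnegative so that sqrt is real

% Variant of Approach #3 that never forms the n-by-d-by-n array.
F_lowmem = zeros(d);
for j = 1:n
    blk = bsxfun(@minus, X, X(j,:));         % row i holds X(i,:) - X(j,:)
    blk = bsxfun(@times, blk, sqrt(R(:,j))); % scale row i by sqrt(R(i,j))
    F_lowmem = F_lowmem + blk.' * blk;       % one matrix product per j
end

% Reference: the original double loop.
F_loop = zeros(d, d);
for i = 1:n
    for j = 1:n
        dx = X(i,:) - X(j,:);
        F_loop = F_loop + R(i,j) * (dx' * dx);
    end
end

% Largest absolute deviation between the two results.
max_err = max(abs(F_loop(:) - F_lowmem(:)));
disp(max_err)
```

This trades the single large bsxfun call for n smaller ones. It was not timed in the original benchmark, so no speed claim is made for it here.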

So, who's ready for a 30x+ speedup!?
