
Is it possible to speed up this MATLAB script?

I've run into performance problems and want to speed up some slow-running scripts, but I'm out of ideas for how to do it. I often get stuck on the indexing, and I find the abstract thinking it requires very difficult.

The script is

    tic,
    n = 1000;
    d = 500;
    X = rand(n, d);
    R = rand(n, n);
    F = zeros(d, d);
    for i=1:n
        for j=1:n
           F = F + R(i,j)* ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)));
        end
    end
    toc

Discussion & Solution Codes

A few bsxfun-based approaches can be suggested here. Also, read on to see how one can get a 30x+ speedup on a problem like this!

Approach #1 (Naive vectorized approach)

To handle both operations, the subtractions between rows of X and the subsequent pairwise products between the elements of each difference, a naive bsxfun-based approach builds a 4D intermediate array of size n-by-n-by-d-by-d whose (i,j,:,:) slice corresponds to ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:))). After that, multiplying by R(:).' weights each slice by R(i,j) and sums them into the final output F. This is implemented as shown next -

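    % v1(i,:,j) = X(i,:) - X(j,:), an n-by-d-by-n array
    v1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    % v2(i,j,k,l) = v1(i,k,j) * v1(i,l,j), an n-by-n-by-d-by-d array
    v2 = bsxfun(@times,permute(v1,[1 3 2]),permute(v1,[1 3 4 2]));
    % Weight every (i,j) pair by R(i,j) and sum over all pairs
    F = reshape(R(:).'*reshape(v2,[],d^2),d,[]);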
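Since the index juggling is easy to get wrong, a quick sanity check against the original double loop helps. The following is a minimal sketch, assuming small sizes (n = 6, d = 4 are picked here only so the 4D array stays tiny); the names F_loop, F_vec and err1 are introduced just for this check.

```matlab
% Small sizes (an assumption for this check) keep the 4D intermediate tiny.
n = 6;  d = 4;
X = rand(n, d);
R = rand(n, n);

% Reference: the double loop from the question.
F_loop = zeros(d, d);
for i = 1:n
    for j = 1:n
        dx = X(i,:) - X(j,:);
        F_loop = F_loop + R(i,j) * (dx' * dx);
    end
end

% Approach #1, unchanged.
v1 = bsxfun(@minus, X, permute(X, [3 2 1]));
v2 = bsxfun(@times, permute(v1, [1 3 2]), permute(v1, [1 3 4 2]));
F_vec = reshape(R(:).' * reshape(v2, [], d^2), d, []);

% Largest element-wise disagreement; expect only floating-point noise.
err1 = max(abs(F_loop(:) - F_vec(:)));
fprintf('max abs difference: %g\n', err1);
```

The two results should agree up to round-off, because R(:) and the first dimension of the reshaped v2 both enumerate the (i,j) pairs in the same column-major order.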

Approach #2 (Not-so-naive vectorized approach)

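The approach above goes into 4D, which could slow things down. Instead, you can keep the intermediate data at no more than 3D by reshaping. Note that the number of elements in the largest intermediate array is unchanged; only its dimensionality drops. This is listed next -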

    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    % Flatten the (i,j) pairs into rows: n^2-by-d
    sub1_2d = reshape(permute(sub1,[1 3 2]),n^2,[]);
    % Products per pair: n^2-by-d-by-d
    mult1 = bsxfun(@times,sub1_2d,permute(sub1_2d,[1 3 2]));
    F = reshape(R(:).'*reshape(mult1,[],d^2),d,[]);
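Before running either fully vectorized version at the question's sizes, it is worth estimating how big the intermediates get. This is a back-of-the-envelope sketch, assuming 8-byte doubles; the variable names gib_sub1 and gib_prod are made up for the illustration.

```matlab
% Sizes from the question.
n = 1000;  d = 500;
bytes_per_double = 8;

% n-by-d-by-n difference array (v1 / sub1).
gib_sub1 = n^2 * d * bytes_per_double / 2^30;

% n^2-by-d^2 product array (v2 / mult1), built by Approaches #1 and #2.
gib_prod = n^2 * d^2 * bytes_per_double / 2^30;

fprintf('difference array: %.1f GiB\n', gib_sub1);
fprintf('product array:    %.1f GiB\n', gib_prod);
```

That works out to roughly 3.7 GiB for the difference array and about 1.8 TiB for the product array, so Approaches #1 and #2 are only practical for much smaller n and d than the ones in the question.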

Approach #3 (Hybrid approach)

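Now, you can make a hybrid approach based on Approach #2 (vectorized subtractions + loopy multiplications). The benefit of this approach is that it uses fast matrix multiplication to perform the multiplications and cuts the number of loop iterations from n^2 to n, which should make it much more efficient. The amount of arithmetic is still proportional to n^2*d^2, but it now runs inside n matrix products, and the n^2-by-d^2 intermediate array is avoided. Scaling the differences by sqrt(R) is what makes each product blk.'*blk carry the weight R(i,j). Thanks to @Dev-iL for suggesting this idea! Here's the code -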

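    % sub1(i,:,j) = X(i,:) - X(j,:), an n-by-d-by-n array
    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    % Scale each difference by sqrt(R(i,j)) so that blk.'*blk carries R(i,j)
    sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2]));

    F = zeros(d);
    for k = 1:size(sub1,3)
        blk = sub1(:,:,k);
        F = F + blk.'*blk;
    end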
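Approach #3 still materializes the full n-by-d-by-n difference array up front. If memory is tight, one possible variant (not from the original answer, just a sketch along the same lines) builds each n-by-d block inside the loop and multiplies one factor by R(:,j) directly instead of scaling both by sqrt(R). The sizes and the name F_lean are assumptions for the illustration.

```matlab
n = 200;  d = 50;          % assumed sizes for illustration
X = rand(n, d);
R = rand(n, n);

F_lean = zeros(d);
for j = 1:n
    % Row i of D holds X(i,:) - X(j,:); only one n-by-d block lives at a time.
    D = bsxfun(@minus, X, X(j,:));
    % Weight row i by R(i,j), then sum the products with one matrix multiply.
    F_lean = F_lean + D.' * bsxfun(@times, D, R(:,j));
end
```

The loop count and the matrix products are the same as in Approach #3, so the speed should be in the same ballpark, but the peak extra memory drops from n^2*d elements to about 2*n*d.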

Benchmarking

Benchmarking code comparing the original approach against Approach #3

    %// Parameters
    n = 500;
    d = 250;
    X = rand(n, d);
    R = rand(n, n);

    %// Warm up tic/toc.
    for k = 1:100000
        tic(); elapsed = toc();
    end

    disp('------------------------------ With Original Approach')
    tic
    F1 = zeros(d, d);
    for i=1:n
        for j=1:n
            F1 = F1 + R(i,j)*((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)));
        end
    end
    toc, clear F1 i j

    disp('------------------------------ With Proposed Approach #3')
    tic
    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2]));

    F = zeros(d);
    for k = 1:size(sub1,3)
        blk = sub1(:,:,k);
        F = F + blk.'*blk;
    end
    toc

Runtime results

------------------------------ With Original Approach
Elapsed time is 29.728571 seconds.
------------------------------ With Proposed Approach #3
Elapsed time is 0.839726 seconds.

So, who's ready for a 30x+ speedup!?
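The benchmark above clears F1 without comparing it to F, so here is one way to cross-check the result. This is not part of the original answer: expanding the product (X(i,:)-X(j,:))' * (X(i,:)-X(j,:)) and summing over i and j collapses the whole double sum into X.' * W * X, where W has the row sums plus the column sums of R on its diagonal, minus R and R.'. A minimal sketch, with small assumed sizes so the reference loop stays quick (F_closed, W and rel_err are names introduced here):

```matlab
n = 60;  d = 20;           % assumed small sizes so the loop is quick
X = rand(n, d);
R = rand(n, n);

% Reference: the double loop from the question.
F_loop = zeros(d);
for i = 1:n
    for j = 1:n
        dx = X(i,:) - X(j,:);
        F_loop = F_loop + R(i,j) * (dx' * dx);
    end
end

% Closed form: row sums and column sums of R on the diagonal, minus R and R.'.
W = diag(sum(R, 2)) + diag(sum(R, 1)) - R - R.';
F_closed = X.' * W * X;

% Relative disagreement; expect only floating-point noise.
rel_err = max(abs(F_loop(:) - F_closed(:))) / max(abs(F_loop(:)));
fprintf('relative difference: %g\n', rel_err);
```

Because this form needs only n-by-n and n-by-d arrays, it can also serve as a reference at the question's full sizes, where the original loop takes a long time to finish.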


 