I've run into some performance problems, so I want to speed up these slow-running scripts, but I've run out of ideas on how to do it. I often get stuck on the indexing, and I find the abstract thinking very difficult.
The script is:
tic,
n = 1000;
d = 500;
X = rand(n, d);
R = rand(n, n);
F = zeros(d, d);
for i=1:n
    for j=1:n
        % Add the outer product of row difference i-j, weighted by R(i,j)
        F = F + R(i,j)* ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)));
    end
end
toc
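For reference, the double loop computes the sum below. The notation x_i for the row X(i,:) (a 1 x d row vector) is introduced here and is not from the original post. Each term is a d x d outer product, so the loop performs n^2 rank-one updates of roughly d^2 multiply-adds each.

```latex
F = \sum_{i=1}^{n} \sum_{j=1}^{n} R_{ij}\,(x_i - x_j)^{\top}(x_i - x_j),
\qquad x_i = X(i,:) \in \mathbb{R}^{1 \times d}

\text{work} \sim n^2 d^2 = 1000^2 \cdot 500^2 = 2.5 \times 10^{11}
\quad \text{for } n = 1000,\ d = 500
```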
A few approaches based on bsxfun can be suggested here. Also, read on to see how one can get a 30x+ speedup on a problem like this!
Approach #1 (Naive vectorized approach)
To accommodate the two operations, the subtractions between rows of X and the subsequent element-wise multiplications between the resulting difference vectors, a naive bsxfun-based approach leads to a 4D intermediate array that holds ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:))) for every pair (i, j). After that, one needs to multiply by R to get the final output F. This is implemented as shown next -
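%// v1(i,:,j) = X(i,:) - X(j,:), an n x d x n array
v1 = bsxfun(@minus,X,permute(X,[3 2 1]));
%// v2(i,j,k,l) = v1(i,k,j)*v1(i,l,j), an n x n x d x d array
v2 = bsxfun(@times,permute(v1,[1 3 2]),permute(v1,[1 3 4 2]));
%// Weight each (i,j) pair by R(i,j) and sum over all pairs
F = reshape(R(:).'*reshape(v2,[],d^2),d,[]);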
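Since the 4D indexing is the hard part, a quick way to gain confidence is to compare Approach #1 against the original double loop on tiny inputs. The sizes n_s and d_s and the names Xs, Rs, F_loop, F_vec are illustrative choices made here, not from the original answer; they are kept small so the n x n x d x d intermediate stays negligible.

```matlab
% Hypothetical small sizes so the 4-D intermediate stays tiny
n_s = 6;
d_s = 4;
Xs = rand(n_s, d_s);
Rs = rand(n_s, n_s);

% Reference result from the original double loop
F_loop = zeros(d_s, d_s);
for i = 1:n_s
    for j = 1:n_s
        dv = Xs(i,:) - Xs(j,:);               % 1 x d difference row
        F_loop = F_loop + Rs(i,j) * (dv' * dv);
    end
end

% Approach #1 on the same data
v1 = bsxfun(@minus, Xs, permute(Xs, [3 2 1]));                     % n x d x n
v2 = bsxfun(@times, permute(v1, [1 3 2]), permute(v1, [1 3 4 2])); % n x n x d x d
F_vec = reshape(Rs(:).' * reshape(v2, [], d_s^2), d_s, []);

% The two results should agree up to floating-point round-off
max_abs_err = max(abs(F_vec(:) - F_loop(:)));
fprintf('max abs difference: %g\n', max_abs_err);
```

The same check works for Approach #2 by swapping in its three lines.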
Approach #2 (Not-so-naive vectorized approach)
The approach above goes into 4D, which could slow things down. Instead, you can keep the intermediate data at 3D by reshaping. This is listed next -
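sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
%// Put each (i,j) difference on its own row: an n^2 x d array
sub1_2d = reshape(permute(sub1,[1 3 2]),n^2,[]);
%// Row-wise outer products: an n^2 x d x d array
mult1 = bsxfun(@times,sub1_2d,permute(sub1_2d,[1 3 2]));
F = reshape(R(:).'*reshape(mult1,[],d^2),d,[]);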
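It helps to put numbers on the intermediate arrays. The sketch below assumes double precision (8 bytes per element) and uses the question's sizes and the benchmark's sizes. Note that v2 in Approach #1 (n x n x d x d) and mult1 in Approach #2 (n^2 x d x d) hold the same number of elements; the 3D layout changes the shape, not the count. Approach #3 only keeps sub1 (n x d x n).

```matlab
% Rough memory of the largest intermediates, assuming 8-byte doubles
bytes_per_double = 8;
sizes = [1000 500; 500 250];   % [n d]: question sizes, then benchmark sizes
for s = 1:size(sizes, 1)
    n_s = sizes(s, 1);
    d_s = sizes(s, 2);
    % v2 (Approach #1) and mult1 (Approach #2): n^2 * d^2 elements
    big_bytes = n_s^2 * d_s^2 * bytes_per_double;
    % sub1 (Approach #3): n * d * n elements
    sub_bytes = n_s^2 * d_s * bytes_per_double;
    fprintf('n=%d, d=%d: Approach #1/#2 ~%.1f GB, Approach #3 ~%.1f GB\n', ...
        n_s, d_s, big_bytes/1e9, sub_bytes/1e9);
end
```

This works out to about 2000 GB versus 4 GB for n = 1000, d = 500, and 125 GB versus 0.5 GB for n = 500, d = 250 (decimal GB). Approach #3 briefly holds a second sub1-sized temporary during the sqrt(R) scaling, so its peak is roughly double that figure.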
Approach #3 (Hybrid approach)
Now, you can build a hybrid approach on top of Approach #2 (vectorized subtractions + loopy multiplications). The benefit is that it uses fast matrix multiplication for the multiplications and cuts the loop from n^2 iterations down to n, which should make it much more efficient. Since R is nonnegative here (it comes from rand), each weight R(i,j) can be split as sqrt(R(i,j))*sqrt(R(i,j)) and folded into the difference rows, so a single blk.'*blk product sums the weighted outer products over all i at once. Thanks to @Dev-iL for suggesting this idea! Here's the code -
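%// sub1(i,:,j) = X(i,:) - X(j,:)
sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
%// Scale each difference row by sqrt(R(i,j))
sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2]));
F = zeros(d);
for k = 1:size(sub1,3)
    %// blk.'*blk sums the weighted outer products over all i for j = k
    blk = sub1(:,:,k);
    F = F + blk.'*blk;
end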
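The identity behind Approach #3 can be written out as below. It uses x_i for X(i,:) (notation introduced here), assumes R_ij >= 0 so that sqrt(R_ij) is real, and lets B_j denote the n x d matrix whose i-th row is sqrt(R_ij)(x_i - x_j), i.e. sub1(:,:,j) after the scaling step. The last equality uses the fact that B^T B = sum_i b_i^T b_i for a matrix B with rows b_i.

```latex
F = \sum_{j=1}^{n} \sum_{i=1}^{n} R_{ij}\,(x_i - x_j)^{\top}(x_i - x_j)
  = \sum_{j=1}^{n} \sum_{i=1}^{n}
    \bigl(\sqrt{R_{ij}}\,(x_i - x_j)\bigr)^{\top}
    \bigl(\sqrt{R_{ij}}\,(x_i - x_j)\bigr)
  = \sum_{j=1}^{n} B_j^{\top} B_j
```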
Benchmarking code comparing the original approach against Approach #3
%// Parameters
n = 500;
d = 250;
X = rand(n, d);
R = rand(n, n);
%// Warm up tic/toc.
for k = 1:100000
    tic(); elapsed = toc();
end
disp('------------------------------ With Original Approach')
tic
F1 = zeros(d, d);
for i=1:n
    for j=1:n
        F1 = F1 + R(i,j)*((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)));
    end
end
toc, clear F1 i j
disp('------------------------------ With Proposed Approach #3')
tic
sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2]));
F = zeros(d);
for k = 1:size(sub1,3)
    blk = sub1(:,:,k);
    F = F + blk.'*blk;
end
toc
Runtime results
------------------------------ With Original Approach
Elapsed time is 29.728571 seconds.
------------------------------ With Proposed Approach #3
Elapsed time is 0.839726 seconds.
So, who's ready for a 30x+ speedup!?
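As a further option that is not part of the original answer, expanding the outer product and summing over i and j analytically gives a loop-free closed form: F = X.'*(diag(sum(R,2)+sum(R,1).') - R - R.')*X. It needs on the order of n^2*d + n*d^2 operations and, unlike Approach #3, does not require R to be nonnegative. The sketch below is a hypothetical check (the names w, L, F_closed and the small sizes n_s = 60, d_s = 30 are choices made here) that compares it against Approach #3 on random data; no timings are claimed.

```matlab
% Hypothetical small sizes to keep the Approach #3 intermediate small
n_s = 60;
d_s = 30;
Xs = rand(n_s, d_s);
Rs = rand(n_s, n_s);

% Closed form: sum_ij R(i,j)*(x_i-x_j)'*(x_i-x_j) = X.'*L*X
w = sum(Rs, 2) + sum(Rs, 1).';   % n x 1: row sums plus column sums of R
L = diag(w) - Rs - Rs.';         % n x n weighting matrix
F_closed = Xs.' * L * Xs;        % d x d result

% Approach #3 on the same data, for comparison
sub1 = bsxfun(@minus, Xs, permute(Xs, [3 2 1]));
sub1 = bsxfun(@times, sub1, permute(sqrt(Rs), [1 3 2]));
F3 = zeros(d_s);
for k = 1:size(sub1, 3)
    blk = sub1(:,:,k);
    F3 = F3 + blk.' * blk;
end

% Relative Frobenius-norm difference should be at round-off level
rel_err = norm(F_closed - F3, 'fro') / norm(F3, 'fro');
fprintf('relative difference: %g\n', rel_err);
```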