
Vectorization of the code

The following is my code. The loop part is quite slow. Is there a way to vectorize the loop?

N = 1000000;
A = rand(N,3);
B = rand(N,3);
Dist = sqrt(sum((A - B).^2,2));
R = 2;
id = rangesearch(A,A,0.01);
result = zeros(N,1);
for i = 1:N
    idx = id{i}';
    v1 = A(i,:) - A(idx,:);
    v2 = A(i,:) - B(idx,:);
    C = cross(v1,v2,2);
    D = sqrt(sum(C.^2,2))./Dist(idx);
    result(i) = sum(2 * sqrt(R^2 - D.^2));
end

Here, A and B are matrices holding the 3D coordinates of N points. First, I want to find the neighbors of each point in A. Say the point is Ai and one of its neighbors is Aj. I then want to calculate the distance from Ai to the line Aj-Bj, which is why I compute the cross product. Finally, I sum a quantity derived from these distances (2 * sqrt(R^2 - D.^2) in the code) over all neighbors. Right now, this code takes 500 seconds on my computer. Is there a way to make it run faster, or another way to reach the same result more quickly? Thanks.
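As a sanity check of the formula used in the loop, here is a minimal sketch for a single point. The values of P, Aj and Bj are made up for illustration, and reading 2 * sqrt(R^2 - d^2) as the length of the chord that the line cuts from a sphere of radius R centred at P is an assumption about the intent, not something stated in the question.

```matlab
% Distance from a point P to the line through Aj and Bj via the cross product.
% |v1 x v2| is the area of the parallelogram spanned by v1 and v2; dividing
% by the base length |Aj - Bj| leaves the height, i.e. the distance d.
P  = [0 0 1];            % hypothetical query point (plays the role of Ai)
Aj = [0 0 0];            % hypothetical neighbor
Bj = [1 0 0];            % second point on the line (here the x-axis)
R  = 2;

v1 = P - Aj;
v2 = P - Bj;
d  = norm(cross(v1, v2)) / norm(Aj - Bj);   % distance from P to the x-axis
chord = 2 * sqrt(R^2 - d^2);                % the term summed in the loop
```

Here P sits one unit above the x-axis, so d comes out as 1 and the summed term as 2*sqrt(3).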

Your for loop is actually pretty fast.

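As mentioned by @EBH in the comments, your code should work as written in the latest 2016 release (R2016b), which supports implicit expansion. I'm using an earlier 2015 release where implicit expansion is not supported, so I replaced those subtractions with bsxfun.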
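To illustrate the difference, a small sketch with made-up values: a stands in for A(i,:) and M for A(idx,:).

```matlab
% Subtracting a block of rows from a single row.
a = [1 2 3];                  % 1-by-3 row, plays the role of A(i,:)
M = [1 1 1; 2 2 2];           % 2-by-3 block, plays the role of A(idx,:)

% bsxfun expands the singleton dimension explicitly, so this runs on
% releases both before and after R2016b.
d1 = bsxfun(@minus, a, M);    % [0 1 2; -1 0 1]

% On R2016b or later, implicit expansion makes "a - M" give the same
% result; on earlier releases that expression raises a dimension error.
```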

Note: rangesearch(A,A,0.01) does not guarantee that a point has any neighbor other than itself. In fact, when I run it with N=10, id is always {1 2 3 4 5 6 7 8 9 10}, i.e. each point finds only itself.
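A tiny sketch of that behaviour, using three hand-picked points that are far apart relative to the radius (the point set P3 is made up for illustration, and rangesearch requires the Statistics and Machine Learning Toolbox):

```matlab
% Each point lies within the radius of itself (distance 0), so it always
% appears in its own neighbor list. With well-separated points nothing
% else does.
P3 = [0 0 0; 1 0 0; 0 1 0];           % hypothetical, well-separated points
id3 = rangesearch(P3, P3, 0.01);      % 3-by-1 cell array of index rows
nNeighbors = cellfun(@numel, id3);    % number of hits per point
```

Every entry of nNeighbors is 1 here. For uniformly random points in the unit cube, the expected number of other points inside a ball of radius 0.01 is roughly N * (4/3) * pi * 0.01^3, which is far below 1 for N = 10 or N = 1000 and only about 4 for N = 1000000 (ignoring boundary effects).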

Method 1: Fixed for loop (bsxfun instead of implicit expansion)

tic
result = zeros(N,1);
for i = 1:N
    idx = id{i}';
    v1 = bsxfun(@minus, A(i,:), A(idx,:));
    v2 = bsxfun(@minus, A(i,:), B(idx,:));
    C = cross(v1,v2,2);
    D = sqrt(sum(C.^2,2))./Dist(idx);
    result(i) = sum(2 * sqrt(R^2 - D.^2));
end
toc

Elapsed time is 0.077025 seconds.

Method 2: Enumerate all (point, neighbor) pairs and compute v1 and v2 for all of them at once

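tic
% number of neighbors found for each point
idlen = cellfun(@(x) length(x),id);
% index of the query point, repeated once per neighbor
idai = cell2mat(arrayfun(@(ii) repmat(ii,1,idlen(ii)), (1:N), 'UniformOutput', false));
% all neighbor indices, concatenated in the same order as idai
idx2 = cell2mat(id');
% one row per (point, neighbor) pair
V1 = A(idai,:) - A(idx2,:);
V2 = A(idai,:) - B(idx2,:);
C2 = cross(V1,V2,2);
% distance to the line, and the per-point sum
d = @(c,id) sqrt(sum(c.^2,2))./Dist(id);
r = @(d) sum(2 * sqrt(R^2 - d.^2));
% group the pairs by query point and reduce each group
result3 = splitapply(@(c,id) r(d(c,id)), C2,idx2', idai');
toc
isequal(result,result3)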


Elapsed time is 0.471092 seconds.

ans =

     1

The slowest line is the splitapply call.
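If splitapply is the bottleneck, one option that may be worth trying is to compute the per-pair term for all pairs at once and then sum per point with accumarray. This is only a sketch and has not been timed against the methods here. The names ref, D2 and result4 and the small N and larger radius are chosen for illustration, repelem needs R2015a or later, and rangesearch needs the Statistics and Machine Learning Toolbox.

```matlab
% Small reproducible problem (N kept tiny so the check runs instantly).
rng(0);
N = 200;
R = 2;
A = rand(N,3);
B = rand(N,3);
Dist = sqrt(sum((A - B).^2, 2));
id = rangesearch(A, A, 0.2);          % N-by-1 cell of 1-by-k index rows

% Reference: the bsxfun for loop.
ref = zeros(N,1);
for i = 1:N
    idx = id{i}';
    v1 = bsxfun(@minus, A(i,:), A(idx,:));
    v2 = bsxfun(@minus, A(i,:), B(idx,:));
    C = cross(v1, v2, 2);
    D = sqrt(sum(C.^2, 2)) ./ Dist(idx);
    ref(i) = sum(2 * sqrt(R^2 - D.^2));
end

% Flattened version: one row per (point, neighbor) pair.
idlen = cellfun(@numel, id);          % neighbors per point, N-by-1
idai  = repelem((1:N)', idlen);       % query point owning each pair
idx2  = [id{:}]';                     % neighbor index of each pair
C2    = cross(A(idai,:) - A(idx2,:), A(idai,:) - B(idx2,:), 2);
D2    = sqrt(sum(C2.^2, 2)) ./ Dist(idx2);

% Sum the per-pair terms back into one value per query point.
result4 = accumarray(idai, 2 * sqrt(R^2 - D2.^2), [N 1]);

maxErr = max(abs(result4 - ref));
```

The comparison uses a tolerance rather than isequal because the summation order inside accumarray is not guaranteed to match that of sum, so the last bits of the results may differ.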


Method 3: Use cellfun, which does not guarantee vectorization

tic
V1 = cellfun(@(Ai,idx) bsxfun(@minus, Ai, A(idx,:)), num2cell(A,2), id, 'UniformOutput', false);
V2 = cellfun(@(Ai,idx) bsxfun(@minus, Ai, B(idx,:)), num2cell(A,2), id, 'UniformOutput', false);
C = cellfun(@(v1,v2) cross(v1,v2,2), V1,V2, 'UniformOutput', false);
D = cellfun(@(c,idx) sqrt(sum(c.^2,2))./Dist(idx), C, id, 'UniformOutput', false);
result2 = cellfun(@(d) sum(2 * sqrt(R^2 - d.^2)), D, 'UniformOutput', false);
result2 = cell2mat(result2);
toc
isequal(result,result2)

Elapsed time is 0.122700 seconds.

ans =

     1

Method 4: Use parfor

tic
result = zeros(N,1);
parfor i = 1:N
    idx = id{i}';
    v1 = bsxfun(@minus, A(i,:), A(idx,:));
    v2 = bsxfun(@minus, A(i,:), B(idx,:));
    C = cross(v1,v2,2);
    D = sqrt(sum(C.^2,2))./Dist(idx);
    result(i) = sum(2 * sqrt(R^2 - D.^2));
end
toc

Elapsed time is 0.177929 seconds.

parfor warns that A, B and Dist cannot be sliced, so they are broadcast to every worker, which slows down the computation.


Edit: the tests above used N=1000. With N=10000, Method 2 shows a 40% time reduction, Method 3 takes about the same time, and Method 4 shows a ~90% time reduction. So parfor is probably the way to go, provided that you are using a multicore computer.
