Matlab performances with/without loops

I read in this comment on SO that MATLAB is no longer slow at for loops (cf. link).

I used MATLAB quite a lot during my studies, and I remember how much time I saved by always finding a solution that does not involve excessive loops (by using reshape, repmat or even arrayfun).

So the comment above caught my attention, and I quickly wrote this:

clear all; T = linspace(0,1,1e6);
tic
i = 0;
for t = T
    i = i + 1; y(i) = sin(t);
end
toc

clear all; T = linspace(0,1,1e6);
tic
i = 0;
y = zeros(numel(T), 1);
for t = T
    i = i + 1; y(i) = sin(t);
end
toc

clear all; T = linspace(0,1,1e6);
tic
y = sin(T);
toc

which outputs this:

Elapsed time is 1.741640 seconds.
Elapsed time is 1.400412 seconds.
Elapsed time is 0.004076 seconds.

I also tried to toggle the accel feature...

>feature accel on

But each time, even for more complex matrix manipulations, the vectorized version that uses native Matlab functions is always faster.

Perhaps I am missing some important point, or perhaps my opinion is still right: in MATLAB we should avoid loops as much as possible.

Now, I am looking for a counterexample.

The problem is what different people consider "slow".

When MATLAB for loops go from "unbelievably abysmally slow" to "8 times slower than the vectorized version", there will be

  1. Some people who exclaim: "Wow, MATLAB is no longer slow on loops!"
  2. Others who say: "MATLAB became better when using loops. Still not good, but bearable."
  3. Some who find: "Well, a factor of 8 is still a hefty slowdown. Vectorize all the way."
  4. Finally, some who conclude: "C is still faster, even on vectorized code. MATLAB just does not cut it."

In my opinion, MATLAB is still slow at loops (I guess I'm in group three) and you should vectorize whenever possible (unless readability suffers). Just because it was even slower in the past does not make the current performance good.

Also, MATLAB has got some other weak spots: https://stackoverflow.com/a/17933146/1974021

A few examples can help compare for-loops against vectorization in terms of performance.

Example #1

This is a very basic computation: calculating the sine of a number of elements. The number of elements was varied to assess the problem at hand. Inspired by this screenshot link.

Benchmarking Code

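num_runs = 1000;                        % repetitions per data size
N_arr = [ 1000 10000 100000 1000000];   % data sizes to test

% Warm up tic/toc.
for k = 1:100
    tic(); elapsed = toc();
end

for k = 1:numel(N_arr)
    N = N_arr(k);

    % For-loop version, with preallocation
    tic
    for runs=1:num_runs
        out_f1 = zeros(1,N);
        for t = 1:N
            out_f1(t) = sin(t);
        end
    end
    t_forloop = toc/num_runs;           % average time per run

    % Vectorized version
    tic
    for runs=1:num_runs
        out_v1 = sin(1:N);
    end
    t_vect = toc/num_runs;              % average time per run

    % Print the averages in the format of the results listed below
    disp(['----------- Datsize(N) = ' num2str(N) ' -------------'])
    disp(['Elapsed time with for-loops -       ' num2str(t_forloop)])
    disp(['Elapsed time with vectorized code - ' num2str(t_vect)])
end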

Results

----------- Datsize(N) = 1000 -------------
Elapsed time with for-loops -       7.1826e-05
Elapsed time with vectorized code - 8.3601e-05
----------- Datsize(N) = 10000 -------------
Elapsed time with for-loops -       0.00068531
Elapsed time with vectorized code - 0.00045043
----------- Datsize(N) = 100000 -------------
Elapsed time with for-loops -       0.0074613
Elapsed time with vectorized code - 0.0053368
----------- Datsize(N) = 1000000 -------------
Elapsed time with for-loops -       0.077707
Elapsed time with vectorized code - 0.053255

Please note that these results were consistent with timeit results (the code and results for those aren't shown here).
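For readers who want to try the timeit route themselves, a minimal sketch could look like the following. It is not the code used for the numbers above; the data size N and the helper name sin_loop are assumptions made for illustration. timeit takes a function handle and runs it several times, so the manual repetition loop and the tic/toc warm-up are not needed.

```matlab
% Sketch: timing Example #1 with timeit instead of tic/toc.
% N and the helper name sin_loop are illustrative assumptions.
N = 1e5;

t_forloop = timeit(@() sin_loop(N));   % for-loop version
t_vect    = timeit(@() sin(1:N));      % vectorized version

fprintf('for-loop:   %g s\n', t_forloop);
fprintf('vectorized: %g s\n', t_vect);

% Local function. In a script file this requires R2016b or later and
% must appear at the end of the file.
function out = sin_loop(N)
    out = zeros(1, N);                 % preallocate the output
    for t = 1:N
        out(t) = sin(t);
    end
end
```

On older releases, sin_loop would have to live in its own file sin_loop.m.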

Conclusions

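  • In this test the for-loop is marginally faster at 1000 elements, but from 10000 elements upward the vectorized code is about 1.4 to 1.5 times faster, so you can forget about for-loops from that size on.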

Example #2

Let's consider a case where each iteration of the for-loop works with an array of elements. Each iteration stores the sine, cosine, tangent and secant into one column, i.e. [sin(t) ; cos(t) ; tan(t) ; sec(t)].

For-loop code would be -

out_f1 = zeros(4,N);
for t = 1:N
    out_f1(:,t) = [sin(t) ; cos(t) ; tan(t) ; sec(t)];
end

Vectorized code -

out_v1 = [sin(1:N); cos(1:N) ; tan(1:N); sec(1:N)];

Results

----------- Datsize(N) = 100 -------------
Elapsed time with for-loops - 0.00011861
Elapsed time with vectorized code - 6.0569e-05
----------- Datsize(N) = 1000 -------------
Elapsed time with for-loops - 0.0011867
Elapsed time with vectorized code - 0.00036786
----------- Datsize(N) = 10000 -------------
Elapsed time with for-loops - 0.011819
Elapsed time with vectorized code - 0.0025536
----------- Datsize(N) = 1000000 -------------
Elapsed time with for-loops - 1.2329
Elapsed time with vectorized code - 0.33383

Modified case

One could easily jump to the conclusion that the for-loop doesn't stand a chance here. But wait, how about we go back to element-wise assignment for the for-loop case, as in example #1, like this -

out_f1 = zeros(4,N);
for t = 1:N
    out_f1(1,t) = sin(t);
    out_f1(2,t) = cos(t);
    out_f1(3,t) = tan(t);
    out_f1(4,t) = sec(t);
end

Now, this makes use of spatial locality, so a competitive vectorized code that does the same would be -

out_v1 = [sin(1:N) cos(1:N) tan(1:N) sec(1:N)]';

The benchmark results with these modified codes for this testcase were -

----------- Datsize(N) = 100 -------------
Elapsed time with for-loops - 3.1987e-05
Elapsed time with vectorized code - 6.9778e-05
----------- Datsize(N) = 1000 -------------
Elapsed time with for-loops - 0.00027976
Elapsed time with vectorized code - 0.00036804
----------- Datsize(N) = 10000 -------------
Elapsed time with for-loops - 0.0029712
Elapsed time with vectorized code - 0.0024423
----------- Datsize(N) = 100000 -------------
Elapsed time with for-loops - 0.031113
Elapsed time with vectorized code - 0.028549
----------- Datsize(N) = 1000000 -------------
Elapsed time with for-loops - 0.32636
Elapsed time with vectorized code - 0.28063

Conclusions

The latter benchmark results again suggest that below roughly 10000 elements the for-loop wins, and that from there on the vectorized solution is preferable. But it must be noted that this came at the expense of writing element-wise assignments.
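When comparing variants like these, it is worth checking that they produce the same values in the same layout. The sketch below is not part of the original benchmark; N, the tolerance tol and the names out_v2, out_v2_mat and rel_err are assumptions. It shows that the semicolon-separated concatenation gives a 4-by-N matrix like the loop, whereas the transposed horizontal concatenation from the modified case gives a 4N-by-1 column vector that needs a reshape to match.

```matlab
% Sketch: check that the loop and vectorized variants of Example #2 agree.
% N, tol and the names out_v2, out_v2_mat, rel_err are assumptions.
N = 1000;
tol = 1e-12;

% Loop version with element-wise assignment (modified case).
out_f1 = zeros(4, N);
for t = 1:N
    out_f1(1,t) = sin(t);
    out_f1(2,t) = cos(t);
    out_f1(3,t) = tan(t);
    out_f1(4,t) = sec(t);
end

% Vectorized version from the first case: a 4-by-N matrix.
out_v1 = [sin(1:N); cos(1:N); tan(1:N); sec(1:N)];

% Vectorized version from the modified case: a 4N-by-1 column vector.
out_v2 = [sin(1:N) cos(1:N) tan(1:N) sec(1:N)]';

disp(size(out_v1));
disp(size(out_v2));

% Relative comparison, since tan and sec can become large.
rel_err = @(a, b) max(abs(a(:) - b(:)) ./ max(1, abs(b(:))));
disp(rel_err(out_f1, out_v1) < tol);

% Reshaping out_v2 to N-by-4 and transposing recovers the 4-by-N layout.
out_v2_mat = reshape(out_v2, N, 4).';
disp(rel_err(out_f1, out_v2_mat) < tol);
```

The extra reshape and transpose cost time as well, so a strictly like-for-like timing would have to include them.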


Final Conclusions

  1. When it comes to deciding which side (for-loop or vectorization) is better, the picture is far from black and white.

Use a real loop index and the JIT compiler understands your loop:

clear all; T = linspace(0,1,1e6);
tic
y = zeros(numel(T), 1);
for idx=1:numel(T)
    y(idx) = sin(T(idx));
end
toc

Such code is much faster. The optimisations are based on code analysis, so write clear code and give MATLAB a chance to analyse it successfully ;)
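To see how much this matters on a given installation, both loop styles can be timed side by side. The following is only a sketch: the variable names y1, y2, t_values and t_index are assumptions, and the timings depend on the MATLAB release and the machine, so none are quoted here.

```matlab
% Sketch: time both loop styles on the same data.
% Variable names are illustrative assumptions.
T = linspace(0, 1, 1e6);

% Style 1: iterate over the values of T and keep a separate counter.
tic
y1 = zeros(numel(T), 1);
i = 0;
for t = T
    i = i + 1;
    y1(i) = sin(t);
end
t_values = toc;

% Style 2: iterate over an explicit integer index.
tic
y2 = zeros(numel(T), 1);
for idx = 1:numel(T)
    y2(idx) = sin(T(idx));
end
t_index = toc;

fprintf('loop over values: %g s\n', t_values);
fprintf('loop over index:  %g s\n', t_index);
fprintf('results identical: %d\n', isequal(y1, y2));
```

For more stable numbers, each style could be wrapped in a function and measured with timeit rather than a single tic/toc pair.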
