Matlab scalar operation takes longer than array operation

Question

In using the profiler to speed up code, I noticed that scalar operations on a single array element were taking longer than vectorized operations on the entire array. Obviously, this is not what one would expect: only one operation takes place when working with a single array element, whereas many operations (albeit vectorized) take place when operating on the entire array.

The context in which I saw this was a bit complicated, with the scalar operation not being done on the same nested object as the array. However, I was able to replicate this oddity with a script:

%%%%%%%%%%%%%
%%  tst1.m
%%%%%%%%%%%%%

% Generate random data
for ix=1:5; for iy=1:5
   x(ix).y(iy).z=rand(1,10);
end; end
Ntest=1e7;

disp('Script tst#1a: Operation on one array element:')
tic
for i=1:Ntest
   a=0.5>x(3).y(3).z(1);
end % for i
toc

% clear a

disp('Script tst#1b: Vectorized operation on entire array:')
tic
for i=1:Ntest
   a=0.5>x(3).y(3).z;
end % for i
toc

Script tst#1a above is the single array element operation while script tst#1b is the vectorized operation on the entire array. The results are:

Script tst#1a: Operation on one array element:
Elapsed time is 6.260495 seconds.
Script tst#1b: Vectorized operation on entire array:
Elapsed time is 4.491822 seconds.

As can be seen, the scalar operation takes significantly longer. Would anyone be able to surmise the reason for this counterintuitive observation? Perhaps something really silly in the test code?

In assembling the above test, I also found that if I cleared the left-hand-side variable, as in the statement commented out above, the scalar operation sped up by almost a factor of 2. I don't know exactly why, but regardless of the reason, I found it even odder that the scalar test code sped up even though the clear occurs after it. Here is the same m-file with the clear command uncommented:

%%%%%%%%%%%%%
%%  tst2.m
%%%%%%%%%%%%%

% Generate random data
for ix=1:5; for iy=1:5
   x(ix).y(iy).z=rand(1,10);
end; end
Ntest=1e7;

disp('Script tst#2a: Operation on one array element:')
tic
for i=1:Ntest
   a=0.5>x(3).y(3).z(1);
end % for i
toc

disp('Clearing a');
clear a

disp('Script tst#2b: Vectorized operation on entire array:')
tic
for i=1:Ntest
   a=0.5>x(3).y(3).z;
end % for i
toc

Here is the result, showing the inexplicable speedup of the preceding scalar operation test code (in comparison to the results for tst1.m):

Script tst#2a: Operation on one array element:
Elapsed time is 3.371326 seconds.
Clearing a
Script tst#2b: Vectorized operation on entire array:
Elapsed time is 4.463924 seconds.
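One way to probe the clear effect, offered only as a hypothesis to test: in tst1.m the name a holds a 1x1 logical in the first loop and a 1x10 logical in the second. If the speedup comes from a being reused for results of different sizes, storing the two results under different names should reproduce the tst2.m timings without any clear. The names aScalar and aVector are hypothetical and introduced only for this experiment.

```matlab
% Sketch: replace "clear a" with two differently named result variables.
% aScalar and aVector are hypothetical names used only for this experiment.
clear x aScalar aVector
for ix = 1:5
    for iy = 1:5
        x(ix).y(iy).z = rand(1, 10);
    end
end
Ntest = 1e7;

disp('Scalar result stored in aScalar:')
tic
for i = 1:Ntest
    aScalar = 0.5 > x(3).y(3).z(1);   % 1x1 logical
end
toc

disp('Vector result stored in aVector:')
tic
for i = 1:Ntest
    aVector = 0.5 > x(3).y(3).z;      % 1x10 logical
end
toc
```

If the first loop is still slow here, reuse of the name a is probably not the explanation.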

Neither of these tests is completely reflective of my situation, which uses class methods instead of scripts. I recall reading on a forum that, compared to scripts, functions and methods provide more opportunities for compiler optimizations. To get a clue as to whether this might explain the relative slowness of scalar operations and the counterintuitive speedup due to a post hoc clear, I put the above two test scripts into class methods:

%%%%%%%%%%%%%%
%%  cTest.m
%%%%%%%%%%%%%%
classdef cTest < handle
methods

   function tst1(o)

      % Generate random data
      for ix=1:5; for iy=1:5
         x(ix).y(iy).z=rand(1,10);
      end; end
      Ntest=1e7;

      disp('Method tst#1a: Operation on one array element:')
      tic
      for i=1:Ntest
         a=0.5>x(3).y(3).z(1);
      end % for i
      toc

      % clear a
      disp('Method tst#1b: Vectorized operation on entire array:')
      tic
      for i=1:Ntest
         a=0.5>x(3).y(3).z;
      end % for i
      toc

   end % function tst1

   function tst2(o)

      % Generate random data
      for ix=1:5; for iy=1:5
         x(ix).y(iy).z=rand(1,10);
      end; end
      Ntest=1e7;

      disp('Method tst#2a: Operation on one array element:')
      tic
      for i=1:Ntest
         a=0.5>x(3).y(3).z(1);
      end % for i
      toc

      disp('Clearing a');
      clear a

      disp('Method tst#2b: Vectorized operation on entire array:')
      tic
      for i=1:Ntest
         a=0.5>x(3).y(3).z;
      end % for i
      toc

   end % function tst2

end % method
end % classdef

I compare the execution of all the above m-files using the following "testbench" script:

%%%%%%%%%%%
%%  go.m
%%%%%%%%%%%
clc
c = cTest;

tst1
disp(' ')
tst2

fprintf('\n\n')

c.tst1
disp(' ')
c.tst2

The combined results are:

Script tst#1a: Operation on one array element:
Elapsed time is 5.888381 seconds.
Script tst#1b: Vectorized operation on entire array:
Elapsed time is 4.636491 seconds.

Script tst#2a: Operation on one array element:
Elapsed time is 3.435526 seconds.
Clearing a
Script tst#2b: Vectorized operation on entire array:
Elapsed time is 4.531256 seconds.


Method tst#1a: Operation on one array element:
Elapsed time is 5.732293 seconds.
Method tst#1b: Vectorized operation on entire array:
Elapsed time is 4.550085 seconds.

Method tst#2a: Operation on one array element:
Elapsed time is 3.266772 seconds.
Clearing a
Method tst#2b: Vectorized operation on entire array:
Elapsed time is 4.664736 seconds.

Of the 4 blocks of output text, the first 2 are a re-run of the 2 script tests above, while the last 2 execute the same code as class methods. The results are similar, so the inexplicable slowness of the scalar operation and the counterintuitive speedup due to a post hoc clear command do not seem to be affected by compilation differences between scripts and class methods.

In summary,

1. The scalar operation on an array element inexplicably seems to run slower than an array operation. Perhaps there is some kind of speed penalty, of which I am unaware, associated with extracting a single element from an array.
2. A post hoc clear inexplicably speeds up the scalar operation so that it is faster than the array operation. Being faster is what one would expect of the scalar operation, with or without the clear command.
3. These observations do not seem to be affected by any compilation differences between scripts and class methods.

If anyone can shed some light on the inner workings that might lead to the above observations, perhaps I can use that insight to get rid of the slowness of scalar operations on individual array elements in my class methods.

AFTERNOTE: Observation #1 is seen even without deeply nesting an array in layers of structure arrays:

>> clear all; x=rand(1,10); tic; for i=1:1e7; a=0.5>x(1); end; toc
   Elapsed time is 0.092028 seconds.

>> clear all; x=rand(1,10); tic; for i=1:1e7; a=0.5>x; end; toc
   Elapsed time is 1.344769 seconds.

This is using MATLAB Version 8.5.0.197613 (R2015a) on a 3 GHz laptop running 64-bit Windows 7 with 8 GB RAM and not much else running to consume memory. MATLAB is using 550 MB and Internet Explorer is using 240 MB.
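Loop timings in the tens of milliseconds for 1e7 iterations can be sensitive to how MATLAB treats the loop body, so a second measurement method may help. A minimal sketch using timeit (available since R2013b, so present in R2015a), which calls a zero-argument function handle several times and returns a median time in seconds. Note that this times each expression inside an anonymous-function call, so the absolute numbers are not directly comparable to the loop timings above; only the ratio between the two is of interest. The handle names fScalar and fVector are hypothetical.

```matlab
% Sketch: time the two AFTERNOTE expressions with timeit instead of a tic/toc loop.
% fScalar and fVector are hypothetical names; x is captured when each handle is created.
x = rand(1, 10);

fScalar = @() 0.5 > x(1);   % comparison on a single element
fVector = @() 0.5 > x;      % vectorized comparison on all 10 elements

tScalar = timeit(fScalar);  % median seconds per call
tVector = timeit(fVector);

fprintf('scalar: %.3g s per call, vector: %.3g s per call\n', tScalar, tVector);
```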

Answer 1

Alexander Kemp's answer seems to be the likely explanation, based on the info to date. Indexing into an array to access individual elements seems to come with a significant time overhead. It is probably not the scalar operation per se that takes longer than a vectorized operation; it is the extraction of an element from an array for the scalar operation that causes the speed penalty.
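A minimal sketch for checking this explanation, assuming the data layout of tst1.m: time the same comparison with progressively more of the indexing hoisted out of the loop. If extracting the element is what costs time, variants (b) and (c) would be expected to run faster than (a). The names zz, z1, tA, tB and tC are hypothetical and introduced only for this experiment.

```matlab
% Sketch: separate the cost of indexing from the cost of the comparison itself.
% zz, z1, tA, tB, tC are hypothetical names; the data mirrors tst1.m.
clear x a
for ix = 1:5
    for iy = 1:5
        x(ix).y(iy).z = rand(1, 10);
    end
end
Ntest = 1e7;

% (a) Nested struct access plus element indexing on every iteration, as in tst1.m
tic
for i = 1:Ntest
    a = 0.5 > x(3).y(3).z(1);
end
tA = toc;

% (b) Struct access hoisted out of the loop; only z(1) is indexed inside
zz = x(3).y(3).z;
tic
for i = 1:Ntest
    a = 0.5 > zz(1);
end
tB = toc;

% (c) All indexing hoisted; the loop body is a pure scalar comparison
z1 = x(3).y(3).z(1);
tic
for i = 1:Ntest
    a = 0.5 > z1;
end
tC = toc;

fprintf('(a) %.3f s   (b) %.3f s   (c) %.3f s\n', tA, tB, tC);
```

If (b) or (c) removes most of the penalty, hoisting the element into a local variable before a hot loop is one workaround to try in the class methods.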

Answer 2

Not sure which is the real cause, but there are three things which I would investigate:

Addressing costs time.
The indexing may add complexity so that loop optimization no longer works. This is something I observed some time ago for other expressions in loops, with a sudden drop in speed after some seemingly innocent change.

Edited: JIT -> loop optimization

solution1
0 2016-01-31 01:29:04

solution2
-1 2016-01-16 16:08:00
