
Why does MATLAB's element-wise exponentiation speed up for 512 elements?

MATLAB's power function, used to calculate element-wise powers of a constant base raised to an array of exponents, becomes noticeably faster when the size of the array reaches 512. I expected the computation time to increase with the input size; however, there is a noticeable drop when there are 512 elements in the array of exponents. Here is some sample code:

x_list = 510:514;
for i = 1:numel(x_list)
    x = x_list(i);
    tic
    for j = 1:10000
        y = power(2,1:x);  % element-wise powers of 2 for exponents 1..x
    end
    toc
end

The output of the code is

Elapsed time is 0.397649 seconds.
Elapsed time is 0.403687 seconds.
Elapsed time is 0.318293 seconds.
Elapsed time is 0.238875 seconds.
Elapsed time is 0.175525 seconds.

What is happening here?


I see the same effect using random numbers for the exponent as I do using integers in the range 1:n:

x = 500:540;
t = zeros(size(x));
for ii = 1:numel(x)
    %m = 1:x(ii);
    m = 500*rand(1,x(ii));
    t(ii) = timeit(@()power(2,m));
end
plot(x,t)

[Graph showing a jump in execution time around 512]

When forcing MATLAB to use a single thread with maxNumCompThreads(1), and running the code above again, I see this graph instead (note the y-axis; the peaks are just noise):

[Graph showing no jump at 512]
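For reference, the single-threaded run can be reproduced roughly like this (a sketch; the only change to the earlier code is limiting and then restoring the thread count, and exact timings will vary by machine):

nOld = maxNumCompThreads(1);   % limit MATLAB to one computational thread
x = 500:540;
t = zeros(size(x));
for ii = 1:numel(x)
    m = 500*rand(1,x(ii));
    t(ii) = timeit(@()power(2,m));
end
maxNumCompThreads(nOld);       % restore the previous thread count
plot(x,t)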

It looks to me like MATLAB uses a single core to compute the powers of up to 511 values, and fires up all cores if the array is larger. There is an overhead to using multithreading, so it is not worthwhile for small arrays. The exact point where the overhead is balanced by the time savings depends on many factors, so hard-coding a fixed threshold for when to switch to multithreaded computation leads to a jump in execution time on systems whose characteristics differ from those of the system where the threshold was determined.

Note that @norok2 is not seeing this same jump because on their system MATLAB was limited to a single thread.

This is related to the size of the numbers for which the power is computed, rather than the size of the container.

If you use random numbers and vary the container size, you do not observe a jump in the timings:

x = 450:1550;
y = zeros(numel(x), 1);
X = rand(1, 10000);
for i = 1:length(x)
    f = @() 2 .^ X(1:x(i));
    y(i) = timeit(f);
end

figure()
plot(x, y)

[Plot: power_test_1]

Therefore the issue must be with the computation for very large numbers. At first I thought that this might be related to overflow, but overflow happens at 2 ^ 1024 == inf, as dictated by the IEEE standard that MATLAB follows, and I would have thought that producing inf would be much faster than computing an actual finite value.
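For reference, the double-precision overflow point can be checked directly at the prompt (these are standard IEEE 754 facts, not specific to this benchmark):

2^1023          % largest power of two that is still a finite double (~8.99e307)
2^1024          % overflows, returns Inf
realmax         % ~1.7977e308, the largest finite double, just below 2^1024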

This is supported by the following benchmark where the size of the array is kept constant:

x = 450:1550;
y = zeros(numel(x), 1);
X = rand(1, 10000);
for i = 1:length(x)
    f = @() 2 .^ (ones(1, 500) * x(i));
    y(i) = timeit(f);
end

figure()
plot(x, y)

[Plot: power_test_2]

Why exactly this may be relevant for your setup at 2 ^ 512 instead of 2 ^ 1024, I do not really understand.

(Note that I used 2 .^ ... instead of power(2, ...), but the results are the same.)
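If you want to verify that equivalence on your own machine, a quick check along these lines should do (isequal returning true is the expected outcome):

m = 500*rand(1, 1000);
isequal(2 .^ m, power(2, m))    % expected: logical 1 (true)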


Also, running @CrisLuengo's code on my system does not really reproduce any jump.

x = 500:540;
t = zeros(size(x));
for ii = 1:numel(x)
    %m = 1:x(ii);
    m = 500*rand(1,x(ii));
    t(ii) = timeit(@()power(2,m));
end
plot(x,t)

All the evidence so far indicates that the spike is related to JIT latency / warm-up.

[Plot: timing versus array size on this system, showing no jump around 512]
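One way to probe the JIT warm-up hypothesis (my own sketch, not part of the original measurements) is to run the same sweep twice in one session and compare the passes; if the spike only shows up the first time, warm-up effects are implicated:

x = 500:540;
t1 = zeros(size(x));
t2 = zeros(size(x));
for ii = 1:numel(x)             % first pass (cold-ish session)
    m = 500*rand(1, x(ii));
    t1(ii) = timeit(@() power(2, m));
end
for ii = 1:numel(x)             % second pass (everything warmed up)
    m = 500*rand(1, x(ii));
    t2(ii) = timeit(@() power(2, m));
end
plot(x, [t1; t2])
legend('first pass', 'second pass')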

Here's some confirmation of what Cris found, using a 4-core Windows machine running MATLAB R2018a. I first tested the following code to show that the specific value of the exponent wasn't the culprit for the jump:

t = zeros(4, 1000);
for p = 1:size(t, 1)                 % exponents 2^(p-1) = 1, 2, 4, 8
  for n = 1:size(t, 2)               % array sizes 1 to 1000
    t(p, n) = timeit(@() power(2, (2.^(p-1)).*ones(1, n)));
  end
end

And here are the results:

[Plot: timing versus array size for exponents 1, 2, 4, and 8]

For the degenerate edge cases where the exponent is 1 (return the same value) or 2 (return the value times itself), the computation runs faster, as expected. However, the jump at an array size of 512 and above shows that overhead is added to these edge cases, in contrast to the reduction in computation time for exponents of 4 and 8 once the array size exceeds 512. Larger exponent values simply reproduce the upper curves.
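For instance, one way to see that added overhead for a cheap exponent is to time the exponent-2 case just below and just above the apparent threshold (a quick sketch; the actual numbers will depend on the machine):

t_below = timeit(@() power(2, 2*ones(1, 511)))   % single-threaded path
t_above = timeit(@() power(2, 2*ones(1, 512)))   % multithreaded path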

I then ran two more tests: one with array size between 1 and 511, and a second with array size between 512 and 1024. Here's what the processor load looked like:

在此输入图像描述

Processor 3 shows a large load spike during the first test, while all 4 processors show load spikes during the second test. This confirms that multithreading is employed for array sizes of 512 or above. It also explains the slower computation for the edge cases at larger sizes, since the overhead of multithreading outweighs the speedup gained by splitting up these simpler calculations.
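The two tests can be reproduced roughly as follows (the exponent value of 8 and the use of timeit, which repeats each call long enough to register in the per-core load display, are my assumptions):

% Test 1: array sizes below the apparent threshold (expect one busy core)
for n = 1:511
    timeit(@() power(2, 8*ones(1, n)));
end
pause(5)    % gap so the two load patterns are easy to tell apart
% Test 2: array sizes at or above the threshold (expect all cores busy)
for n = 512:1024
    timeit(@() power(2, 8*ones(1, n)));
end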
