Why is Octave's function call overhead so much larger than both Matlab's and Python's?

I have two pieces of code, one in Python and one in Octave, that are structurally identical. However, the Python version, implemented with numpy and scipy, is ~5x faster. I profiled the code and found that the main culprit in the Octave version is six functions that are repeatedly called thousands of times inside a loop. These functions only evaluate numerical expressions (e.g. cos, cosh), so I was surprised by how much time they were consuming. (For reference, both versions run in under 2 seconds.)
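
For reference, a profile like this can be collected with Octave's built-in profiler (a minimal sketch; my_solver_script is only a placeholder for the actual script):

profile on;
my_solver_script;   % placeholder for the script being measured
profile off;
profshow;           % list the functions that consumed the most time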

I researched this strange phenomenon online and read a paper showing that the function call overhead in Octave, i.e. the setup needed before the code in the function's body actually starts executing plus the cleanup afterwards, is approximately 30 times larger than Matlab's and approximately 100 times larger than Python's.

This greatly baffles me. How can calling a function in Octave be so much slower than calling one in two similar languages? Furthermore, is there any way to remedy this slowdown besides copying and pasting the function body directly into the loop?

EDIT: I've posted the main for loop from my code below. It's an iterative implementation of Newton's method for a system of equations, so I'm not sure how it could be vectorized.

for k = 1:10    
  for l = 1:50
    % matrix of derivatives of equations with respect to variables
    a = [dEq1_dq1(p1, p2, q1, q2, i, j), dEq1_dq2(p1, p2, q1, q2, i, j); dEq2_dq1(p1, p2, q1, q2, i, j), dEq2_dq2(p1, p2, q1, q2, i, j)];

    % vector of equations
    b = [Eq1(p1, p2, q1, q2, i, j); Eq2(p1, p2, q1, q2, i, j)];

    % solution to ax=b
    x = a \ b;

    % iteratively update q
    q1 -= beta*x(1);
    q2 -= beta*x(2);
  endfor

  for l = 1:50
    % same Newton step as above, now updating p1 and p2
    a = [dEp1_dp1(p1, p2, q1, q2, i, j), dEp1_dp2(p1, p2, q1, q2, i, j); dEp2_dp1(p1, p2, q1, q2, i, j), dEp2_dp2(p1, p2, q1, q2, i, j)];
    b = [Ep1(p1, p2, q1, q2, i, j); Ep2(p1, p2, q1, q2, i, j)];
    x = a \ b;

    p1 -= beta*x(1);
    p2 -= beta*x(2);
  endfor
endfor
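
To spell out what each inner loop is doing: with J the 2x2 matrix of derivatives stored in a, and F the vector of equation values stored in b, one pass performs the damped Newton update

    [q1; q2] <- [q1; q2] - beta * (J \ F)

and the second inner loop does the same for p1 and p2.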

...

% derivatives of implicit equations with respect to variables
function val = dEp1_dp1(p1, p2, q1, q2, i, j)
  % symmetric
  if mod(i, 2) == 1
    val = p1/(2*cos(p1/2)**2)+tan(p1/2);
  % anti-symmetric
  else
    val = tan(p1/2)/(p1**2)-1/(2*p1*cos(p1/2)**2);
  endif
end

...

function val = Ep1(p1, p2, q1, q2, i, j)    
  if mod(i, 2) == 1
    val = p2*tanh(p2/2)+p1*tan(p1/2);
  else
    val = (1/p2)*tanh(p2/2)-(1/p1)*tan(p1/2);
  endif
end

...

Comparing performance between languages is tricky business. Octave will tell you right away that you should vectorize your code; that's what the language was designed for. Python compiles its code to byte-code, which allows for optimizations, and Matlab has a JIT compiler that does the same. Octave does not: it reads your program one statement at a time and does exactly what you wrote, which means your performance will suffer if you don't write good, vectorized code.

And while there might be a large overhead per function call (I didn't check your numbers), that's not so important if you only make a few function calls. You will usually be dealing with large arrays, so it's the actual "sciency" computations that should dominate the run time (unless, of course, you don't write proper Octave programs and use unnecessary loops).
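
If you want to see the effect in isolation, a rough micro-benchmark along these lines (the wrapper my_cos and the array size are just illustrative choices, not from your code) compares many calls to a trivial user-defined function against a single vectorized call:

1;  % mark this file as a script so the function definition below is allowed

function y = my_cos (x)
  % trivial wrapper: each call pays Octave's full function call overhead
  y = cos (x);
end

x = linspace (0, 2*pi, 1e5);

tic;
y1 = zeros (size (x));
for k = 1:numel (x)
  y1(k) = my_cos (x(k));   % 1e5 separate function calls
end
t_loop = toc;

tic;
y2 = cos (x);              % one call on the whole vector
t_vec = toc;

printf ("loop: %g s, vectorized: %g s\n", t_loop, t_vec);

The looped version will typically be far slower, and most of that time is interpreter and call overhead rather than the cosine itself.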

The functions you mentioned, cos and cosh, accept a vector, so there is no need to wrap them in a for loop.
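
For example, the body of your Ep1 already works on whole vectors if you swap the scalar operators for their element-wise forms (a sketch, assuming p1 and p2 hold vectors of trial values rather than scalars):

p1 = linspace (0.1, 3, 1000);   % illustrative vectors of trial values
p2 = linspace (0.1, 3, 1000);

% symmetric branch of Ep1, evaluated for all elements at once
val_sym  = p2 .* tanh (p2/2) + p1 .* tan (p1/2);

% anti-symmetric branch, using element-wise division
val_anti = (1 ./ p2) .* tanh (p2/2) - (1 ./ p1) .* tan (p1/2);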
