
Is MATLAB faster than Python (little simple experiment)

I have read this ( Is MATLAB faster than Python? ) and I find it has lots of ifs.

I have tried this little experiment on an old computer that still runs on Windows XP.

In MATLAB R2010b I have copied and pasted the following code in the Command Window:

tic
x = 0.23;
for i = 1:100000000
  x = 4 * x * (1 - x);
end
toc
x

The result was:

Elapsed time is 0.603583 seconds.

x =

    0.947347510922557

Then I saved a py file with the following script:

import time
t = time.time()
x = 0.23
for i in range(100000000): x = 4 * x * (1 - x)
elapsed = time.time() - t
print(elapsed)
print(x)

I pressed F5 and the result was

49.78125
0.9473475109225565

In MATLAB it took 0.60 seconds; in Python it took 49.78 seconds (an eternity!!).

So the question is: is there a simple way to make Python as fast as MATLAB?

Specifically: how do I change my py script so that it runs as fast as MATLAB?


UPDATE

I have tried the same experiment in PyPy (copying and pasting the same code as above): it did it in 1.0470001697540283 seconds on the same machine as before.

I repeated the experiments with 1e9 loops.

MATLAB results:

Elapsed time is 5.599789 seconds.
1.643573442831396e-004

PyPy results:

8.609999895095825
0.00016435734428313955

I have also tried with a normal while loop, with similar results:

t = time.time()
x = 0.23
i = 0
while (i < 1000000000):
    x = 4 * x * (1 - x)
    i += 1

elapsed = time.time() - t
print(elapsed)
print(x)

Results :

8.218999862670898
0.00016435734428313955

I am going to try NumPy in a little while.

First, using time is not a good way to test code like this. But let's ignore that.
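For a more reliable measurement, the standard-library timeit module is the usual choice: it runs the statement several times and lets you take the best result, which filters out scheduler noise. A minimal sketch (using a smaller iteration count than the original 1e8, just to illustrate the mechanism):

```python
import timeit

# One run of the logistic-map loop, as a statement string for timeit.
# timeit.repeat runs it several times; the minimum is the least noisy figure.
stmt = """
x = 0.23
for i in range(1000000):
    x = 4 * x * (1 - x)
"""
best = min(timeit.repeat(stmt, number=1, repeat=3))
print(best)  # best-of-three wall time in seconds
```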


When you have code that does a lot of looping, repeating very similar work each time through the loop, PyPy's JIT will do a great job. When that code does the exact same thing every time, to constant values that can be lifted out of the loop, it'll do even better. CPython, on the other hand, has to execute multiple bytecodes for each loop iteration, so it will be slow. From a quick test on my machine, CPython 3.4.1 takes 24.2 seconds, but PyPy 2.4.0/3.2.5 takes 0.0059 seconds.

IronPython and Jython are also JIT-compiled (although using the more generic JVM and .NET JITs), so they tend to be faster than CPython for this kind of work as well.


You can also generally speed up work like this in CPython itself by using NumPy arrays and vector operations instead of Python lists and loops. For example, the following code takes 0.011 seconds:

import numpy as np

x = 0.23
a = np.empty(10000000)        # a float array, so the result isn't truncated
a[:] = 4 * x * (1 - x)       # broadcast the scalar result into every element

Of course in that case, we're explicitly just computing the value once and copying it 10000000 times. But we can force it to actually compute over and over again, and it still takes only 0.12 seconds:

a = np.zeros((10000000,))
a = 4 * (x + a) * (1 - (x + a))   # element-wise, forcing the full computation
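Note that the logistic-map recurrence itself is sequential in time, so NumPy cannot vectorize across iterations; what it can vectorize is many independent trajectories at once. A sketch (array sizes chosen arbitrarily for illustration):

```python
import numpy as np

# A million independent starting points, iterated 100 time steps.
# Each step is one vectorized operation over the whole array.
x = np.linspace(0.1, 0.9, 1000000)
for _ in range(100):
    x = 4 * x * (1 - x)
print(x[:3])
```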

Other options include writing part of the code in Cython (which compiles to a C extension for Python), and using Numba, which JIT-compiles code within CPython. For toy programs like this, neither may be appropriate: the time spent auto-generating and compiling C code may swamp the time saved by running C code instead of Python code, if you're only trying to optimize a one-time 24-second process. But in real-life numerical programming, both are very useful. (And both play nicely with NumPy.)
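To make the Numba route concrete, the original loop can be JIT-compiled almost verbatim. A sketch, with a fallback so it still runs where numba isn't installed (timings will vary by machine and numba version):

```python
try:
    import numba
    jit = numba.jit(nopython=True)   # compile the loop body to machine code
except ImportError:
    jit = lambda f: f                # plain-Python fallback if numba is absent

@jit
def logistic(n, x):
    # Same recurrence as the MATLAB/CPython experiment above.
    for _ in range(n):
        x = 4 * x * (1 - x)
    return x

# A smaller n here, so the un-JIT-ted fallback also finishes quickly.
print(logistic(1000000, 0.23))
```

The first call includes the compilation pass; subsequent calls run at compiled speed.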

And there are always new projects on the horizon as well.

A (somewhat educated) guess is that CPython interprets your loop one bytecode at a time while MATLAB's JIT compiles it, which means MATLAB is performing one large block of compiled computation rather than many (!) smaller interpreted steps. This is a major reason for going with PyPy rather than CPython, as PyPy JIT-compiles hot loops too.

If you're using Python 2.x, you should substitute xrange for range, as range (in Python 2.x) creates a whole list to iterate through, while xrange yields one value at a time.

Q: how do I change my py script so that it runs as fast as MATLAB?

As abarnet has already given you a lot of knowledgeable directions, let me add my two cents (and some quantitative results).

(Similarly, I hope you will forgive me for skipping the for loop above and assuming a more complex computational task instead.)

  • review the code for any possible algorithmic improvements, value re-use and register/cache-friendly arrangements ( numpy.asfortranarray() et al )

  • use vectorised code execution / loop unrolling in numpy wherever possible

  • use an LLVM-based compiler like numba for the stable parts of your code

  • add further (JIT)-compiler tricks ( nogil = True, nopython = True ) only at the final stage of the code, to avoid the common premature-optimisation mistake
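As an illustration of the memory-layout point in the first bullet, numpy.asfortranarray() changes only how the same values are laid out in memory, which matters for cache behaviour when code walks an array column-by-column (MATLAB-style). A minimal sketch; the speed difference only becomes visible on large arrays:

```python
import numpy as np

a_c = np.arange(6, dtype=np.float64).reshape(2, 3)   # C (row-major) order
a_f = np.asfortranarray(a_c)                          # same values, column-major

# The data are identical; only the memory layout differs.
print(a_c.flags['C_CONTIGUOUS'], a_f.flags['F_CONTIGUOUS'])  # True True
```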

The achievable speedups are indeed huge:

Where nanoseconds matter

An initial code sample is taken from the FX arena (where milliseconds, microseconds and (wasted) nanoseconds indeed do matter: check that for 50% of market events you have far less than 900 milliseconds to act end-to-end on a bi-directional transaction, not to speak of HFT...) for processing EMA(200,CLOSE): a non-trivial exponential moving average over the last 200 GBPUSD candles/bars in an array of about 5200+ rows:

import numba
#@jit                                               # 2015-06 @autojit deprecated
@numba.jit('f8[:](i8,f8[:])')
def numba_EMA_fromPrice( N_period, aPriceVECTOR ):
    EMA = aPriceVECTOR.copy()
    alf = 2. / ( N_period + 1 )
    for aPTR in range( 1, EMA.shape[0] ):
        EMA[aPTR] = EMA[aPTR-1] + alf * ( aPriceVECTOR[aPTR] - EMA[aPTR-1] )
    return EMA

For this "classical" code, the numba compilation step alone has brought an improvement over the ordinary python/numpy code execution of

21x, down to about half a millisecond

#   541L

from about 11499 [us] (yes, from about 11500 microseconds down to just 541 [us])

#       classical numpy
# aClk.start();X[:,7] = EMA_fromPrice( 200, price_H4_CLOSE );aClk.stop()
# 11499L

But if you pay more attention to the algorithm, and re-design it so as to work smarter and more resource-efficiently, the results are even more fruitful:

@numba.jit
def numba_EMA_fromPrice_EFF_ALGO( N_period, aPriceVECTOR ):
    alfa    = 2. / ( N_period + 1 )
    coef    = ( 1 - alfa )
    EMA     = aPriceVECTOR * alfa
    EMA[1:]+= EMA[0:-1]    * coef
    return EMA

#   aClk.start();numba_EMA_fromPrice_EFF_ALGO( 200, price_H4_CLOSE );aClk.stop()
#   Out[112]: 160814L                               # JIT-compile-pass
#   Out[113]:    331L                               # re-use 0.3 [ms] v/s 11.5 [ms] CPython
#   Out[114]:    311L
#   Out[115]:    324L

And as a final polishing touch, for multi-CPU-core processing:


46x acceleration, down to about a quarter of a millisecond

# ___________vvvvv__________# !!!     !!! 
#@numba.jit( nogil = True ) # JIT w/o GIL-lock w/ multi-CORE ** WARNING: ThreadSafe / DataCoherency measures **
#   aClk.start();numba_EMA_fromPrice_EFF_ALGO( 200, price_H4_CLOSE );aClk.stop()
#   Out[126]: 149929L                               # JIT-compile-pass
#   Out[127]:    284L                               # re-use 0.3 [ms] v/s 11.5 [ms] CPython
#   Out[128]:    256L

As a final bonus: faster is sometimes not the same as better.

Surprised?

No, there is nothing strange in this. Try to make MATLAB calculate SQRT( 2 ) to a precision of about 500,000,000 decimal places. That is where it breaks down, while Python's arbitrary-precision integers can get there.

Nanoseconds do matter. All the more so here, where precision is the target.
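For the record, CPython's big integers make this kind of exact computation straightforward. A sketch with a modest 50 decimal places (math.isqrt needs Python 3.8+; scaling the digit count up is only a matter of patience and memory):

```python
from math import isqrt

digits = 50
# floor(sqrt(2) * 10**digits), computed exactly with arbitrary-precision integers:
root = isqrt(2 * 10 ** (2 * digits))
print(root)  # the digit 1 followed by the first 50 decimals of sqrt(2)
```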


Isn't that worth the time and effort? Sure, it is.
