
Is MATLAB faster than Python? (a simple little experiment)

I have read this ( Is MATLAB faster than Python? ) and I find it has lots of ifs.

I have tried this little experiment on an old computer that still runs on Windows XP.

In MATLAB R2010b I have copied and pasted the following code in the Command Window:

tic
x = 0.23;
for i = 1:100000000
  x = 4 * x * (1 - x);
end
toc
x

The result was:

Elapsed time is 0.603583 seconds.

x =

    0.947347510922557

Then I saved a .py file with the following script:

import time
t = time.time()
x = 0.23
for i in range(100000000): x = 4 * x * (1 - x)
elapsed = time.time() - t
print(elapsed)
print(x)

I pressed F5 and the result was:

49.78125
0.9473475109225565

In MATLAB it took 0.60 seconds; in Python it took 49.78 seconds (an eternity!!).

So the question is: is there a simple way to make Python as fast as MATLAB?

Specifically: how do I change my .py script so that it runs as fast as MATLAB?


UPDATE

I have tried the same experiment in PyPy (copying and pasting the same code as above): on the same machine as before, it finished in 1.0470001697540283 seconds.

I repeated the experiments with 1e9 loops.

MATLAB results:

Elapsed time is 5.599789 seconds.
1.643573442831396e-004

PyPy results:

8.609999895095825
0.00016435734428313955

I have also tried a normal while loop, with similar results:

import time

t = time.time()
x = 0.23
i = 0
while i < 1000000000:
    x = 4 * x * (1 - x)
    i += 1

elapsed = time.time() - t
print(elapsed)
print(x)

Results:

8.218999862670898
0.00016435734428313955

I am going to try NumPy in a little while.

First, using time.time() is not a good way to benchmark code like this. But let's ignore that for now.
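
For a fairer measurement, the standard-library timeit module runs the statement several times and lets you take the best run; a minimal sketch (the repeat counts are just illustrative):

import timeit

# time the whole loop once per repetition; the minimum of several runs reduces noise
stmt = """
x = 0.23
for i in range(100000000):
    x = 4 * x * (1 - x)
"""
print(min(timeit.repeat(stmt, repeat=3, number=1)))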


When you have code that does a lot of looping, repeating very similar work each time through the loop, PyPy's JIT will do a great job. When that code does the exact same thing every time, on constant values that can be lifted out of the loop, it will do even better. CPython, on the other hand, has to execute multiple bytecodes for each loop iteration, so it will be slow. From a quick test on my machine, CPython 3.4.1 takes 24.2 seconds, but PyPy 2.4.0/3.2.5 takes 0.0059 seconds.

IronPython and Jython are also JIT-compiled (although using the more generic JVM and .NET JITs), so they tend to be faster than CPython for this kind of work as well.


You can also generally speed up work like this in CPython itself by using NumPy arrays and vector operations instead of Python lists and loops. For example, the following code takes 0.011 seconds:

import numpy as np
x = 0.23
i = np.arange(10000000, dtype=np.float64)   # float array, so the copied value is not truncated
i[:] = 4 * x * (1 - x)

Of course in that case, we're explicitly just computing the value once and copying it 10000000 times. But we can force it to actually compute over and over again, and it still takes only 0.12 seconds:

i = np.zeros((10000000,))
i = 4 * (x+i) * (1-(x+i))

Other options include writing part of the code in Cython (which compiles to a C extension for Python), and using Numba, which JIT-compiles code within CPython. For toy programs like this, neither may be appropriate: the time spent auto-generating and compiling C code may swamp the time saved by running C code instead of Python code if you're only trying to optimize a one-time 24-second process. But in real-life numerical programming, both are very useful. (And both play nicely with NumPy.)
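
As a rough illustration of the Numba route on the loop from the question (a sketch, assuming numba is installed; the function name is arbitrary, the first call includes compilation time, and exact timings will vary):

from numba import jit

@jit(nopython=True)                    # compile the scalar loop to machine code
def logistic_map(x, n):
    for i in range(n):
        x = 4 * x * (1 - x)
    return x

print(logistic_map(0.23, 100000000))   # first call pays the one-off JIT-compilation cost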

And there are always new projects on the horizon as well.

A (somewhat educated) guess is that Python does not perform loop unrolling on your code while MATLAB does. This means the MATLAB code is performing one large computation rather than many (!) smaller ones. This is a major reason for going with PyPy rather than CPython, as PyPy does loop unrolling.

If you're using Python 2.X, you should replace range with xrange, as range (in Python 2.X) creates an entire list to iterate through.
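
In other words, the Python 2.X version of the loop would look like this (a sketch; on Python 3, range is already lazy and no change is needed):

x = 0.23
for i in xrange(100000000):   # xrange yields values one at a time instead of building a 100-million-element list
    x = 4 * x * (1 - x)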

Q: how do I change my .py script so that it runs as fast as MATLAB?

As abarnet has already given you a lot of knowledgeable directions, let me add my two cents (and some quantitative results).

( similarly, I hope you will forgive me for skipping the trivial for: loop here and assuming a more complex computational task )

  • review the code for any possible algorithmic improvements, value re-use(s) and register/cache-friendly arrangements ( numpy.asfortranarray() et al; see the short sketch after this list )

  • use vectorised code-execution / loop-unrolling in numpy, wherever possible

  • use an LLVM-based compiler like numba for the stable parts of your code

  • use additional (JIT)-compiler tricks ( nogil = True, nopython = True ) only on the final version of the code, to avoid a common premature-optimisation mistake
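
A small sketch of the memory-layout point from the first bullet (the array shape is arbitrary; whether a Fortran-ordered copy helps depends on whether your kernels walk columns or rows):

import numpy as np

A   = np.random.rand( 5000, 200 )      # default C-order: rows are contiguous in memory
A_f = np.asfortranarray( A )           # Fortran-order copy: columns are contiguous in memory

# column-wise reductions now touch contiguous memory in A_f,
# which is typically friendlier to the CPU cache than striding through A
col_means = A_f.mean( axis = 0 )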

The achievements that are possible are indeed huge:

Where nanoseconds do matter

An initial code sample is taken from the FX arena ( where milliseconds, microseconds & (wasted) nanoseconds indeed do matter - check that for 50% of market events you have far less than 900 milliseconds to act, end-to-end, on a bi-directional transaction, not speaking about HFT ... ) for processing EMA(200,CLOSE) - a non-trivial exponential moving average over the last 200 GBPUSD candles/bars, in an array of about 5200+ rows:

import numba
#@jit                                               # 2015-06 @autojit deprecated
@numba.jit('f8[:](i8,f8[:])')
def numba_EMA_fromPrice( N_period, aPriceVECTOR ):
    EMA = aPriceVECTOR.copy()
    alf = 2. / ( N_period + 1 )
    for aPTR in range( 1, EMA.shape[0] ):
        EMA[aPTR] = EMA[aPTR-1] + alf * ( aPriceVECTOR[aPTR] - EMA[aPTR-1] )
    return EMA
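
For reference, a call might look like this ( price_H4_CLOSE here is just a random stand-in for the ~5200-row vector of GBPUSD H4 closing prices mentioned above ):

import numpy as np

price_H4_CLOSE = np.random.rand( 5200 )                      # hypothetical data; real close prices in practice
EMA200         = numba_EMA_fromPrice( 200, price_H4_CLOSE )  # the first call also pays the JIT-compilation cost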

For this "classical" code, just the numba compilation step alone has already made an improvement over the ordinary python/numpy code execution:

21x faster, down to about half a millisecond

#   541L

from about 11499 [us] ( yes, from about 11500 microseconds to just 541 [us] )

#       classical numpy
# aClk.start();X[:,7] = EMA_fromPrice( 200, price_H4_CLOSE );aClk.stop()
# 11499L

But if you pay more attention to the algorithm, and re-design it so as to work smarter and more resource-efficiently, the results are even more fruitful:

@numba.jit
def numba_EMA_fromPrice_EFF_ALGO( N_period, aPriceVECTOR ):
    alfa    = 2. / ( N_period + 1 )
    coef    = ( 1 - alfa )
    EMA     = aPriceVECTOR * alfa
    EMA[1:]+= EMA[0:-1]    * coef
    return EMA

#   aClk.start();numba_EMA_fromPrice_EFF_ALGO( 200, price_H4_CLOSE );aClk.stop()
#   Out[112]: 160814L                               # JIT-compile-pass
#   Out[113]:    331L                               # re-use 0.3 [ms] v/s 11.5 [ms] CPython
#   Out[114]:    311L
#   Out[115]:    324L

And the final polishing touch, for multi-CPU-core processing:


a 46x acceleration, down to about a quarter of a millisecond

# ___________vvvvv__________# !!!     !!! 
#@numba.jit( nogil = True ) # JIT w/o GIL-lock w/ multi-CORE ** WARNING: ThreadSafe / DataCoherency measures **
#   aClk.start();numba_EMA_fromPrice_EFF_ALGO( 200, price_H4_CLOSE );aClk.stop()
#   Out[126]: 149929L                               # JIT-compile-pass
#   Out[127]:    284L                               # re-use 0.3 [ms] v/s 11.5 [ms] CPython
#   Out[128]:    256L

As a final bonus: faster is sometimes not the same as better.

Surprised?

No, there is nothing strange in this. Try to make MATLAB calculate SQRT( 2 ) to a precision of about 500,000,000 places behind the decimal point. There it goes.

Nanoseconds do matter. Even more so here, where precision is the target.
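
( For a taste of that in Python: the standard-library decimal module does arbitrary-precision arithmetic; a minimal sketch at 1,000 digits rather than 500,000,000, purely as an illustration: )

from decimal import Decimal, getcontext

getcontext().prec = 1000          # 1,000 significant digits, just for illustration
sqrt2 = Decimal( 2 ).sqrt()
print( str( sqrt2 )[:50] )        # first characters of sqrt(2) at this precision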


Isn't that worth the time & effort? Sure, it is.
