Why is there a huge performance difference between these two pieces of code in Python and Cython?

Question

I ran into performance problems in Python, and a friend suggested I use Cython. After searching some more, I found the following code.

Python:

def test(value):
    for i in xrange(value):
        z = i**2
        if(i==1000000):
            print i
        if z < i:
            print "yes"
test(10000001)

Cython:

def test(long long value):
    cdef long long i
    cdef long long z
    for i in xrange(value):
        z = i**2
        if(i==1000000):
            print i
        if z < i:
            print "yes"

test(10000001)

After running both versions, I was surprised to get a 100x speedup with Cython.

Why does merely adding variable declarations produce this speedup? I should also mention that the code below, compiled with Cython, performs the same as the plain Python version.

Cython:

def test(long long value):
    for i in xrange(value):
        z = i**2
        if(i==1000000):
            print i
        if z < i:
            print "yes"

test(10000001)

Answer 1

Python is a language. CPython is a bytecode compiler and interpreter for Python.

It will take some code:

for i in xrange(value):
    z = i**2
    if(i==1000000):
        print i
    if z < i:
        print "yes"

and give you "bytecode":

load the iterator into the for loop and loop its contents into i
load i, load 2, run binary power, store z
load i, load 1000000, compare
load i, print
load z, load i, compare
load 'yes', print
finish

In full:

  1           0 SETUP_LOOP              70 (to 73)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_NAME                1 (value)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                56 (to 72)
             16 STORE_NAME               2 (i)

  2          19 LOAD_NAME                2 (i)
             22 LOAD_CONST               0 (2)
             25 BINARY_POWER        
             26 STORE_NAME               3 (z)

  3          29 LOAD_NAME                2 (i)
             32 LOAD_CONST               1 (1000000)
             35 COMPARE_OP               2 (==)
             38 POP_JUMP_IF_FALSE       49

  4          41 LOAD_NAME                2 (i)
             44 PRINT_ITEM          
             45 PRINT_NEWLINE       
             46 JUMP_FORWARD             0 (to 49)

  5     >>   49 LOAD_NAME                3 (z)
             52 LOAD_NAME                2 (i)
             55 COMPARE_OP               0 (<)
             58 POP_JUMP_IF_FALSE       13

  6          61 LOAD_CONST               2 ('yes')
             64 PRINT_ITEM          
             65 PRINT_NEWLINE       
             66 JUMP_ABSOLUTE           13
             69 JUMP_ABSOLUTE           13

        >>   72 POP_BLOCK           
        >>   73 LOAD_CONST               3 (None)
             76 RETURN_VALUE
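
A listing like the one above can be reproduced with the standard dis module. The sketch below is only an illustration and is written for Python 3, which is an assumption: the listing above comes from Python 2, so range and print() stand in for xrange and the print statement, and the opcode names and offsets you see will vary with the interpreter version.

```python
import dis

# Python 3 spelling of the loop body from the question (assumption: Python 3).
SOURCE = """
for i in range(value):
    z = i**2
    if i == 1000000:
        print(i)
    if z < i:
        print("yes")
"""

# compile() only produces bytecode; nothing is executed, so `value`
# does not need to exist here.
code = compile(SOURCE, "<example>", "exec")

# Collect the opcode names, then print the full human-readable listing.
opnames = [ins.opname for ins in dis.get_instructions(code)]
dis.dis(code)
```

Compiling the snippet at module level, as here, is what yields the name-based load and store instructions seen in the listing.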

It's worth noting that in Python, an integer is an instance of the class int or long. This means that there is not only the number itself but, at the very least, a pointer to the object and another piece of information saying what class it is. This adds a lot of overhead.
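
The size of that overhead can be seen by comparing a raw C long long with the object CPython allocates for the same number. This is a rough sketch for Python 3 (an assumption; Python 2 has separate int and long types), and the exact object size depends on the interpreter build.

```python
import struct
import sys

# Size in bytes of a native C long long on this platform.
raw_size = struct.calcsize("q")

# Size in bytes that CPython reports for an int object holding 1000000.
boxed_size = sys.getsizeof(10**6)

print("C long long:      ", raw_size, "bytes")
print("Python int object:", boxed_size, "bytes")
```

The reported object size covers only the object itself; the reference that points to it takes additional space.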

But it's also worth noting how xrange works.

xrange creates a class instance ( LOAD_NAME (xrange) , CALL_FUNCTION ) that can be iterated over by the for. The for first asks that instance for an iterator through __iter__ ( GET_ITER ), and then (basically) delegates to a call to the iterator's next method on every pass ( FOR_ITER ). There is a function call on every loop iteration.
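
Roughly, a for loop over a range behaves like the explicit iterator protocol below. This is a sketch in Python 3 (an assumption: range and the built-in next() stand in for Python 2's xrange and the iterator's next method), and the function name is made up for illustration.

```python
def squares_with_explicit_protocol(value):
    """Spell out what `for i in range(value)` does behind the scenes."""
    iterator = iter(range(value))   # one call to __iter__ up front
    results = []
    while True:
        try:
            i = next(iterator)      # one call per loop iteration
        except StopIteration:
            break                   # the iterator signals that it is exhausted
        results.append(i ** 2)
    return results


print(squares_with_explicit_protocol(4))
```

Each pass pays for a call into the iterator and for a fresh integer object, which is the per-iteration cost described above.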

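Further, in the bytecode above, every time you want to get or set the variable z or i ( LOAD_NAME , STORE_NAME ), it has to look in a namespace dictionary. This is really slow. (Inside a function, CPython uses indexed LOAD_FAST / STORE_FAST slots instead, but the values are still full Python objects.)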

Running pure Python code in Cython:

When you run it in Cython (the third example in your question), it compiles to C. But all this C does is call into CPython and tell it what to do.

CPython alone: a guy reading from a book and meticulously carrying out its instructions.
CPython with Cython: a guy shouting instructions to the guy who meticulously carries them out.

It might be a tiny bit faster, but the slow part is still that CPython is slowly doing the work.

Using cythonized code:

What happens when you cdef long long, then?

Cython knows that xrange is acting on a long long:
- It knows the loop is valid (so it doesn't have to check whether you gave it a list or some such)
- It knows the loop won't overflow (because signed overflow is undefined behavior in C, so the compiler may assume it never happens)
- It can therefore turn it into a C loop ( for (long long index = 0; index < copy_of_value; index++) { i = index; ... } )
This avoids the int and long classes, which have a lot of indirection overhead and type checking.
This avoids dictionary lookups. Things are always where you put them on the stack.
For example, i ** 2 is much simpler, as the routine can be inlined (it's always a number, dude), work directly on the integer, and ignore overflow.
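
Ignoring overflow is safe for the input in the question, which can be checked with plain arithmetic. The sketch below assumes long long is 64 bits wide. The helper wrap_to_int64 is hypothetical and only mimics the wraparound that two's-complement hardware typically produces, which C itself does not guarantee.

```python
import math

INT64_MAX = 2**63 - 1


def wrap_to_int64(n):
    """Reduce n to the signed 64-bit range (hypothetical helper, mimics wraparound)."""
    return (n + 2**63) % 2**64 - 2**63


# Last value of i produced by test(10000001) in the question.
largest_i = 10000001 - 1

# Largest i whose square still fits in a signed 64-bit integer.
largest_safe = math.isqrt(INT64_MAX)

print("square fits unchanged:", wrap_to_int64(largest_i ** 2) == largest_i ** 2)
print("largest i with a representable square:", largest_safe)
```

Beyond that limit a typed z = i**2 would no longer hold the mathematical square, whereas a Python integer simply grows to fit it.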

So the result ends up being run mostly by C, and only goes to CPython for some cleanup stuff and the print calls.

Make sense?

Answer 2

As I mentioned in my comment: your third solution is as slow as the Python version because it lacks the static typing that allows Cython to speed up your code. When you declare a variable as long, for example, Cython does not need to construct an "expensive" Python object, but can rely on C code completely. I'm neither a Cython nor a Python expert, but I guess Python's object construction is the main bottleneck.

2 answers

Answer 1: 6, accepted, 2014-04-09 11:03:58

Answer 2: 1, 2014-04-08 15:06:04
