
Why is there a huge performance difference between these two codes in Python and Cython?

I ran into performance problems in Python, and a friend suggested I use Cython. After searching more, I found this code here:

Python:

def test(value):
    for i in xrange(value):
        z = i**2
        if(i==1000000):
            print i
        if z < i:
            print "yes"
test(10000001)

Cython:

def test(long long value):
    cdef long long i
    cdef long long z
    for i in xrange(value):
        z = i**2
        if(i==1000000):
            print i
        if z < i:
            print "yes"

test(10000001)

After running both versions, I was surprised to get a 100x speedup with Cython.

Why does just adding variable declarations give this speedup? I should also mention that the code below performs the same in Cython as it does in plain Python:

Cython:

def test(long long value):
    for i in xrange(value):
        z = i**2
        if(i==1000000):
            print i
        if z < i:
            print "yes"

test(10000001)

Python is a language. CPython is a bytecode compiler and an interpreter for Python.

It will take some code:

for i in xrange(value):
    z = i**2
    if(i==1000000):
        print i
    if z < i:
        print "yes"

and give you "bytecode":

  • get an iterator from xrange(value) and, on each pass of the for loop, store its next item in i
  • load i, load 2, run binary power, store z
  • load i, load 1000000, compare
  • load i, print
  • load z, load i, compare
  • load 'yes', print
  • finish

In full:

  1           0 SETUP_LOOP              70 (to 73)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_NAME                1 (value)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                56 (to 72)
             16 STORE_NAME               2 (i)

  2          19 LOAD_NAME                2 (i)
             22 LOAD_CONST               0 (2)
             25 BINARY_POWER        
             26 STORE_NAME               3 (z)

  3          29 LOAD_NAME                2 (i)
             32 LOAD_CONST               1 (1000000)
             35 COMPARE_OP               2 (==)
             38 POP_JUMP_IF_FALSE       49

  4          41 LOAD_NAME                2 (i)
             44 PRINT_ITEM          
             45 PRINT_NEWLINE       
             46 JUMP_FORWARD             0 (to 49)

  5     >>   49 LOAD_NAME                3 (z)
             52 LOAD_NAME                2 (i)
             55 COMPARE_OP               0 (<)
             58 POP_JUMP_IF_FALSE       13

  6          61 LOAD_CONST               2 ('yes')
             64 PRINT_ITEM          
             65 PRINT_NEWLINE       
             66 JUMP_ABSOLUTE           13
             69 JUMP_ABSOLUTE           13

        >>   72 POP_BLOCK           
        >>   73 LOAD_CONST               3 (None)
             76 RETURN_VALUE

It's worth noting that in Python, an integer is an instance of the class int or long. This means each value holds not only the number but also object bookkeeping: at least a reference count and a pointer saying what class it is. This adds a lot of overhead.
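That overhead can be observed from Python itself. A minimal sketch, written for Python 3 (the question's code is Python 2): sys.getsizeof reports the size of an int object including its header, and ctypes.sizeof(ctypes.c_longlong) the size of the bare C type that cdef long long gives you. Exact byte counts vary by CPython version and platform, so none are hard-coded here.

```python
import ctypes
import sys

n = 1000000

# A Python integer is a full object: it points to its class and carries
# bookkeeping such as a reference count, on top of the digits themselves.
object_bytes = sys.getsizeof(n)

# A C "long long" (what `cdef long long` gives you) is just the raw value.
c_bytes = ctypes.sizeof(ctypes.c_longlong)

print(type(n).__name__)                       # int
print(object_bytes, "bytes as a Python object")
print(c_bytes, "bytes as a C long long")
print(object_bytes - c_bytes, "bytes of per-object overhead")
```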

But it's also worth noting how xrange works.

xrange creates a class instance (LOAD_NAME (xrange), CALL_FUNCTION) that can be iterated over by the for. The for calls __iter__ once (GET_ITER) to get an iterator, and then (basically) calls the iterator's next method on every iteration (FOR_ITER). There is a function call every loop.
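The protocol the for statement follows can be spelled out by hand. This is a simplified sketch in Python 3, where xrange is called range and the per-iteration method is __next__ (Python 2 calls it next); the helper name manual_for is made up for illustration.

```python
def manual_for(iterable):
    """Roughly what `for i in iterable: body` does, written out by hand."""
    collected = []
    iterator = iter(iterable)          # GET_ITER: calls __iter__ once
    while True:
        try:
            i = next(iterator)         # FOR_ITER: one call per iteration
        except StopIteration:          # raised when the iterator is exhausted
            break
        collected.append(i)            # stands in for the loop body
    return collected


print(manual_for(range(5)))            # [0, 1, 2, 3, 4]
```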

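Further, every time you want to get or set the variable z or i, it has to look it up. In the listing above, which was compiled at module level, that is a dictionary lookup (LOAD_NAME/STORE_NAME), and this is really slow. Inside a function, CPython uses faster indexed slots instead (LOAD_FAST/STORE_FAST), but every access still goes through the interpreter loop and handles a full Python object.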
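The two kinds of variable access can be compared with the standard dis module. A minimal sketch, assuming Python 3; recent versions fuse some opcodes into superinstructions (for example LOAD_FAST_LOAD_FAST), so the code matches on name fragments rather than exact opcodes. The names SOURCE, loop_in_function and opnames are made up for illustration.

```python
import dis

# The same loop, once as module-level source and once inside a function.
SOURCE = "for i in range(n):\n    z = i ** 2\n"


def loop_in_function(n):
    for i in range(n):
        z = i ** 2
    return z


def opnames(code):
    """Set of opcode names used by a code object or function."""
    return {ins.opname for ins in dis.get_instructions(code)}


module_ops = opnames(compile(SOURCE, "<module>", "exec"))
function_ops = opnames(loop_in_function)

# Module level: i and z live in a namespace dict, accessed by name.
print(sorted(op for op in module_ops if op.endswith("_NAME")))
# Function level: i and z live in indexed slots of the frame.
print(sorted(op for op in function_ops if "_FAST" in op))
```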


Running pure Python code in Cython:

When you run it through Cython (the third example in your question), it compiles to C. But all that C does is call into the CPython runtime to do the work on Python objects, much as the interpreter itself would.

CPython alone: a guy reading from a book and meticulously carrying out its instructions.
CPython with Cython: a guy shouting instructions to the guy who meticulously carries them out.

It might be a tiny bit faster, but the slow part is still there: CPython doing all the work on generic Python objects.
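To get a feel for what the untyped version still has to do, here is a Python-level analogy (not the C that Cython actually generates): every step of the loop body is a generic operation on int objects that dispatches on their types at run time. The function name untyped_body is hypothetical, and the print calls are replaced by a list so the result can be inspected.

```python
import operator


def untyped_body(value):
    """Python-level analogy of the untyped Cython loop: every operation
    works on int objects and dispatches on their types at run time."""
    output = []                           # stands in for the print calls
    for i in range(value):                # i is a Python int object
        z = operator.pow(i, 2)            # generic power, returns an int object
        if operator.eq(i, 1000000):       # generic rich comparison
            output.append(i)
        if operator.lt(z, i):             # generic rich comparison
            output.append("yes")
    return output


print(untyped_body(1000001))              # [1000000]
```

Since z = i ** 2 is never smaller than i for non-negative integers, "yes" is never produced; the only output is the single hit at i == 1000000.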


Using cythonized code:

What happens when you cdef long long, then?

  • Cython knows that xrange is acting on a long long:

    • It knows the loop is valid (so it doesn't have to check that you gave it a list or some such)

    • It can assume the loop won't overflow (signed overflow is undefined behavior in C, so the compiler may assume it never happens!)

    • It can therefore turn it into a C loop (for (long long index = 0; index < copy_of_value; index++) { i = index; ... })

  • This avoids the int and long classes, which have a lot of indirection overhead and type checking

  • This avoids dictionary lookups. Values live in C local variables, always where you put them on the stack (or in registers)

  • For example, i ** 2 is much simpler, as the routine can be inlined (it's always a number, dude), work directly on the machine integer, and ignore overflow
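Whether ignoring overflow is actually safe depends on the values. For the question's input, a quick check in Python (whose ints never overflow) shows that the largest square fits comfortably. This sketch assumes long long is 64 bits, as on mainstream platforms; C only guarantees at least 64.

```python
import math

VALUE = 10000001                # the argument the question passes to test()
LLONG_MAX = 2 ** 63 - 1         # assumes a 64-bit signed long long

largest_i = VALUE - 1           # xrange/range stops before VALUE
largest_z = largest_i ** 2      # the biggest value ever stored in z

# Largest i whose square still fits in a long long.
safe_limit = math.isqrt(LLONG_MAX)

print(largest_z)                # 100000000000000
print(largest_z <= LLONG_MAX)   # True
print(safe_limit)               # 3037000499
```

For any i above safe_limit, z = i**2 in the typed version would overflow, and for a signed C type that is undefined behavior rather than a clean error.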

So the result ends up being run mostly by C, and only goes to CPython for some cleanup stuff and the print calls. 因此,结果最终主要由C运行,并且只进入CPython进行一些清理和print调用。


Make sense?

As I mentioned in my comment: your third solution is as slow as the Python version because it lacks the static typing that allows Cython to speed up your code. When you declare a variable as, e.g., long, Cython does not need to construct an "expensive" Python object, but can rely entirely on C code. I'm neither a Cython nor a Python expert, but I guess Python's object construction is the main bottleneck.
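That object construction can be seen from pure Python. A sketch for CPython 3: each time the interpreter evaluates i ** 2 at run time it produces a new int object, which is the allocation the cdef version avoids. The helper name square is hypothetical, and object identity is a CPython implementation detail (small ints from -5 to 256 are cached and shared, so a larger value is used here).

```python
def square(i):
    # Computed at run time, so each call builds its own int object.
    return i ** 2


a = square(1001)
b = square(1001)

print(a == b)        # True: same value
print(a is b)        # False: two separate objects were created
```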
