Why is there a huge performance difference between these two pieces of code in Python and Cython?
I ran into performance problems in Python, and a friend suggested that I use Cython. After searching some more, I found the following code.
Python:
    def test(value):
        for i in xrange(value):
            z = i**2
            if(i==1000000):
                print i
            if z < i:
                print "yes"
    test(10000001)
Cython:
    def test(long long value):
        cdef long long i
        cdef long long z
        for i in xrange(value):
            z = i**2
            if(i==1000000):
                print i
            if z < i:
                print "yes"
    test(10000001)
After running both versions, I was surprised to get a 100x speedup from Cython.
Why is this speedup achieved just by adding variable declarations? I should also mention that the code below, compiled with Cython, performs the same as plain Python.
Cython:
    def test(long long value):
        for i in xrange(value):
            z = i**2
            if(i==1000000):
                print i
            if z < i:
                print "yes"
    test(10000001)
Python is a language. CPython is a bytecode compiler and an interpreter for Python.
It will take some code:
    for i in xrange(value):
        z = i**2
        if(i==1000000):
            print i
        if z < i:
            print "yes"
and give you "bytecode":
Set up a for loop and loop its contents into i
Load i, load 2, run binary power, store z
Load i, load 1000000, compare
Load i, print
Load z, load i, compare
Load 'yes', print
In full:
  1           0 SETUP_LOOP              70 (to 73)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_NAME                1 (value)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                56 (to 72)
             16 STORE_NAME               2 (i)

  2          19 LOAD_NAME                2 (i)
             22 LOAD_CONST               0 (2)
             25 BINARY_POWER
             26 STORE_NAME               3 (z)

  3          29 LOAD_NAME                2 (i)
             32 LOAD_CONST               1 (1000000)
             35 COMPARE_OP               2 (==)
             38 POP_JUMP_IF_FALSE       49

  4          41 LOAD_NAME                2 (i)
             44 PRINT_ITEM
             45 PRINT_NEWLINE
             46 JUMP_FORWARD             0 (to 49)

  5     >>   49 LOAD_NAME                3 (z)
             52 LOAD_NAME                2 (i)
             55 COMPARE_OP               0 (<)
             58 POP_JUMP_IF_FALSE       13

  6          61 LOAD_CONST               2 ('yes')
             64 PRINT_ITEM
             65 PRINT_NEWLINE
             66 JUMP_ABSOLUTE           13
             69 JUMP_ABSOLUTE           13
        >>   72 POP_BLOCK
        >>   73 LOAD_CONST               3 (None)
             76 RETURN_VALUE
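A listing like this can be produced with the standard `dis` module. The sketch below assumes Python 3, so the loop uses `range` and `print()`, and the opcode names it prints will differ from the Python 2 listing (for example, there is no `PRINT_ITEM`). The snippet is only compiled, never executed, so `value` does not need to exist.

```python
import dis

source = """
for i in range(value):
    z = i ** 2
    if z < i:
        print("yes")
"""

# Compile at module level without running the code.
code = compile(source, "<example>", "exec")

# Collect the opcode names, then print the full human-readable listing.
opnames = [ins.opname for ins in dis.get_instructions(code)]
dis.dis(code)
```

The exact output depends on the interpreter version, which is why none is shown here.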
It's worth noting that in Python, an integer is an instance of the class int or long. This means that there is not only the number itself but also, at the very least, a pointer and another piece of information saying what class it is. This adds a lot of overhead.
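A quick way to see that overhead is to ask CPython how large an integer object is. This is a small sketch assuming Python 3 on CPython, where `int` and `long` have been merged into a single `int` class; the reported size is implementation-specific, so no particular number is claimed here.

```python
import sys

n = 1000000

print(type(n))           # the class this object is an instance of
print(sys.getsizeof(n))  # bytes used by the whole object, header included
```

For comparison, a C `long long` holding the same value is typically just 8 bytes with no header at all.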
But it's also worth noting how xrange works. xrange creates a class instance (LOAD_NAME (xrange), CALL_FUNCTION) that can be iterated over by the for. The for will (basically) call __iter__ once to obtain an iterator (GET_ITER) and then delegate to a function call on that iterator's next method (FOR_ITER). There is a function call every loop.
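Roughly speaking, a for loop can be written out by hand with `iter()` and `next()`, which makes the per-iteration call visible. This is a sketch in Python 3 (with `range` standing in for `xrange`); the two function names are made up for illustration.

```python
def squares_with_for(value):
    out = []
    for i in range(value):
        out.append(i ** 2)
    return out

def squares_by_hand(value):
    out = []
    iterator = iter(range(value))  # happens once (GET_ITER)
    while True:
        try:
            i = next(iterator)     # happens on every pass (FOR_ITER)
        except StopIteration:
            break                  # the iterator is exhausted
        out.append(i ** 2)
    return out

print(squares_with_for(5) == squares_by_hand(5))
```

Both functions build the same list; the second one just spells out what the interpreter does for you.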
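Further, every time you want to get or set the variable z or i, the bytecode above (LOAD_NAME, STORE_NAME) has to look it up in the locals dictionary. This is really slow compared with a C variable. (Inside a function body CPython uses indexed LOAD_FAST and STORE_FAST slots instead, which are cheaper, but each access still works on a full Python object.)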
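How a variable is looked up depends on where the code lives, and `dis` can show this. The sketch below assumes Python 3 and compiles the same statement once at module level, like the listing above, and once inside a function; the helper name `square` is hypothetical. Opcode names vary a little between versions, so the code filters by substring rather than by exact name.

```python
import dis

body = "z = i ** 2\n"

# The statement compiled at module level: names go through a namespace dict.
module_code = compile(body, "<module-level>", "exec")
module_ops = {ins.opname for ins in dis.get_instructions(module_code)}

# The same statement inside a function, where i and z are locals.
def square(i):
    z = i ** 2
    return z

function_ops = {ins.opname for ins in dis.get_instructions(square)}

print(sorted(op for op in module_ops if "NAME" in op))
print(sorted(op for op in function_ops if "FAST" in op))
```

Either way the values being loaded and stored are Python objects, which is the part that typed Cython variables avoid.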
Running pure Python code in Cython:
When you run it in Cython (the third example in your question), it compiles to C. But all this C does is tell the CPython virtual machine what to do.
CPython alone: a guy reading from a book, and meticulously carrying out its instructions.
CPython with Cython: a guy shouting instructions to the guy who meticulously carries them out.
It might be a tiny bit faster, but the slow part is still that CPython is slowly doing the work.
Using cythonized code:
What happens when you cdef long long, then?
Cython knows that xrange is acting on a long long:
It knows the loop is valid (so it doesn't have to check whether you gave it a list or some such)
It knows the loop won't overflow (because it's undefined behaviour if it does!)
It can therefore turn it into a C loop ( for (long long index=0; index<copy_of_value; index++) { i = index; ... } )
This avoids the int and long classes, which have a lot of indirection overhead and type checking
This avoids dictionary lookups. Things are always where you put them on the stack
For example, i ** 2 is much simpler, as the routine can be inlined (it's always a number, dude), work directly on the integer, and ignore overflow
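The "ignore overflow" part can be illustrated from plain Python. A Python integer grows as needed, while a C `long long` has a fixed width. The sketch below uses `ctypes.c_longlong` to show what is left when a too-large result is squeezed into such a slot. Treat it as an illustration only: signed overflow is undefined in C, and the wrap-around that ctypes shows is merely the usual outcome on two's-complement machines with a 64-bit `long long`.

```python
import ctypes

i = 4000000000

exact = i ** 2                            # Python int: arbitrary precision
wrapped = ctypes.c_longlong(exact).value  # truncated to a fixed-width signed slot

print(exact)
print(wrapped)
print(exact == wrapped)
```

For the loop in the question, i stays around ten million, so i ** 2 fits comfortably and the typed version gives the same results as the Python one.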
So the result ends up being run mostly by C, and only goes to CPython for some cleanup stuff and the print calls.
Make sense?
As I mentioned in my comment: your third solution is as slow as the Python version because it lacks the static typing that allows Cython to speed up your code. When you declare a variable as long, for example, Cython does not need to construct an "expensive" Python object, but can rely on C code completely. I'm neither a Cython nor a Python expert, but I guess Python's object construction is the main bottleneck.