Cython implementation no faster than pure python
For an exercise I've written an XOR doubly-linked list:
%%cython
from cpython.object cimport PyObject
from cpython.ref cimport Py_XINCREF, Py_XDECREF
from libc.stdint cimport uintptr_t

cdef class Node:
    cdef uintptr_t _prev_xor_next
    cdef object val

    def __init__(self, object val, uintptr_t prev_xor_next=0):
        self._prev_xor_next=prev_xor_next
        self.val=val

    @property
    def prev_xor_next(self):
        return self._prev_xor_next

    @prev_xor_next.setter
    def prev_xor_next(self, uintptr_t p):
        self._prev_xor_next=p

    def __repr__(self):
        return str(self.val)

cdef class CurrentNode(Node):
    cdef uintptr_t _node, _prev_ptr

    def __init__(self, uintptr_t node, uintptr_t prev_ptr=0):
        self._node = node
        self._prev_ptr= prev_ptr

    @property
    def val(self):
        return self.node.val

    @property
    def node(self):
        ret=<PyObject *> self._node
        return <Node> ret

    @property
    def prev_ptr(self):
        return self._prev_ptr

    cdef CurrentNode forward(self):
        if self.node.prev_xor_next!=self._prev_ptr:
            return CurrentNode(self.node.prev_xor_next^self._prev_ptr, self._node)

    cdef CurrentNode backward(self):
        if self._prev_ptr:
            pp=<PyObject*>self._prev_ptr
            return CurrentNode(self._prev_ptr, self._node^(<Node> pp).prev_xor_next)

    def __repr__(self):
        return str(self.node)

cdef class XORList:
    cdef PyObject* first
    cdef PyObject* last
    cdef int length

    def __init__(self):
        self.length=0

    @property
    def head(self):
        return (<Node> self.first)

    @property
    def tail(self):
        return (<Node> self.last)

    cdef append(self, object val):
        self.length+=1
        #empty list
        if not self.first:
            t=Node(val)
            tp=(<PyObject*> t)
            self.first=tp
            Py_XINCREF(tp)
            self.last=tp
            Py_XINCREF(tp)
        #not empty
        else:
            new_node=Node(val, <uintptr_t> self.last)
            new_ptr=<PyObject*> new_node
            cur_last=<Node>self.last
            cur_last.prev_xor_next=cur_last.prev_xor_next^(<uintptr_t> new_ptr)
            Py_XINCREF(new_ptr)
            self.last=new_ptr
            Py_XINCREF(new_ptr)

    cpdef reverse(self):
        temp=self.last
        self.last=self.first
        self.first=temp

    def __repr__(self):
        return str(list(iter_XORList(self)))

    def __len__(self):
        return self.length

def iter_XORList(l):
    head=<PyObject*>l.head
    cur=CurrentNode(<uintptr_t> head)
    while cur:
        yield cur
        cur=cur.forward()

import time

start=time.time()
cdef XORList l=XORList()
for i in range(100000):
    l.append(i)
print('time xor ', time.time()-start)

start=time.time()
l1=[]
for i in range(100000):
    l1.append(i)
print('time regular ', time.time()-start)
Compared with the built-in list above, I consistently get ~10x worse performance from the Cython linked list.
time xor 0.10768294334411621
time regular 0.010972023010253906
When I profile the loop for the xorlist I get:
         700003 function calls in 1.184 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.184    1.184 <string>:1(<module>)
        1    0.039    0.039    1.184    1.184 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:108(list_check)
   100000    0.025    0.000    0.025    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:11(__init__)
    99999    0.019    0.000    0.019    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:16(__get__)
    99999    0.018    0.000    0.018    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:19(__set__)
        1    0.000    0.000    0.000    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:60(__init__)
   100000    0.937    0.000    0.999    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:70(append)
   100000    0.113    0.000    1.146    0.000 line_profiler.py:111(wrapper)
        1    0.000    0.000    1.184    1.184 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000    0.018    0.000    0.018    0.000 {method 'disable_by_count' of '_line_profiler.LineProfiler' objects}
   100000    0.015    0.000    0.015    0.000 {method 'enable_by_count' of '_line_profiler.LineProfiler' objects}
So, ignoring the calls to append, it seems most of the time is spent in the special methods.
This brings me to my questions:
I also tried another custom implementation of an ordinary doubly-linked list in pure Python, and its timings and those of the Cython xorlist are within 10% of each other on my machine.
The three culprits from your profiling look to be Node's __init__ (which is unavoidable here), and __get__ and __set__ for the prev_xor_next property.
My view is that you don't want the prev_xor_next property (or if you do, it should be read-only), since it makes what should be a Cython internal accessible from Python.
Whether you delete the property or not, you are working in Cython here, so you can write directly to the underlying C attribute _prev_xor_next. You may need to declare cdef Node cur_last at the start of append (and maybe in other functions) to ensure that Cython knows the type of cur_last - I think it should be able to work it out, but if you get AttributeErrors at runtime then this is what you need to do.
This change gives me a 30% speed increase (i.e. it's still slower than a regular list, but it's a noticeable improvement).
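As a rough pure-Python analogy (not Cython, and not the asker's code), the cost of going through a property can be made visible with cProfile: every read and write of a property is a function call, which is what the __get__ and __set__ rows in the profile above are counting. The class PropNode and the helper names below are hypothetical and exist only for this illustration.

```python
import cProfile
import pstats


class PropNode:
    """Hypothetical stand-in for Node: one integer link behind a property."""

    def __init__(self):
        self._link = 0

    @property
    def link(self):
        return self._link

    @link.setter
    def link(self, value):
        self._link = value


def update_via_property(node, n):
    # Each iteration calls the getter once and the setter once.
    for i in range(n):
        node.link = node.link ^ i


def update_direct(node, n):
    # Same arithmetic, but touching the underlying attribute directly.
    for i in range(n):
        node._link = node._link ^ i


def count_calls(func, *args):
    """Run func under cProfile and return the total number of recorded calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    return pstats.Stats(profiler).total_calls


if __name__ == "__main__":
    print("via property:", count_calls(update_via_property, PropNode(), 1000))
    print("direct:      ", count_calls(update_direct, PropNode(), 1000))
```

In Cython the counterpart of update_direct is assigning to the cdef attribute _prev_xor_next on a variable that Cython knows to be a Node, which compiles to a plain C struct member access.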
I'll outline a more drastic change that I possibly should have suggested on your first question about this problem. This really is a vague outline, so no effort has been made to get it to work...
Node is entirely internal to your XORList class: it should not be used from Python, and the lifetime of all the Nodes in an XORList is tied directly to the list. Therefore they should be destructed on the destruction of their owning XORList (or if the list shrinks, etc.) and so do not need to be reference counted. Therefore Node should be a C struct rather than a Python object:
# malloc and free come from libc.stdlib
from libc.stdlib cimport malloc, free

cdef struct Node:
    uintptr_t prev_xor_next
    PyObject* val

# with associated constructor- and destructor-like functions:
cdef Node* make_node(object val, uintptr_t prev_xor_next):
    cdef Node* n = <Node*>malloc(sizeof(Node))
    n.val = <PyObject*>val
    Py_XINCREF(n.val)  # the node holds its own reference to the value
    n.prev_xor_next = prev_xor_next
    return n

cdef void destroy_node(Node* n):
    Py_XDECREF(n.val)  # release the reference taken in make_node
    free(n)
XORList needs a __dealloc__ function that loops through the list calling destroy_node on each Node (it needs a __dealloc__ function in your version too!).
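The ownership idea (nodes are plain records that live and die with their list) can be tried out in plain Python before committing to malloc and free. The sketch below is an analogy only, under the assumption that array indices stand in for raw pointers and index 0 plays the role of NULL; the class name IndexXORList is hypothetical and it is not the struct-based Cython version outlined above.

```python
class IndexXORList:
    """XOR-linked list whose nodes live in two parallel arrays owned by the list.

    A node is just a slot index. Its link is
    (index of previous node) XOR (index of next node), with 0 meaning "none".
    """

    def __init__(self):
        self._values = [None]  # slot 0 is reserved as the null sentinel
        self._links = [0]
        self._first = 0
        self._last = 0

    def append(self, value):
        new_index = len(self._values)
        self._values.append(value)
        # The new node has a previous node (the old last) and no next: prev ^ 0.
        self._links.append(self._last)
        if self._last:
            # The old last node gains a next node: (prev ^ 0) ^ new_index.
            self._links[self._last] ^= new_index
        else:
            self._first = new_index
        self._last = new_index

    def reverse(self):
        # XOR links are symmetric, so swapping the ends reverses traversal.
        self._first, self._last = self._last, self._first

    def __len__(self):
        return len(self._values) - 1

    def __iter__(self):
        prev, cur = 0, self._first
        while cur:
            yield self._values[cur]
            prev, cur = cur, self._links[cur] ^ prev


if __name__ == "__main__":
    xs = IndexXORList()
    for i in range(5):
        xs.append(i)
    print(list(xs))
    xs.reverse()
    print(list(xs))
```

Here the two arrays are released when the list object goes away, which is the job that the loop over destroy_node in __dealloc__ would do in the Cython version.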
CurrentNode needs to remain a Cython class, since this is your "iterator" interface. It can obviously no longer inherit from Node. I'd change it to:
cdef class XORListIterator:
    cdef Node* current_node
    cdef XORList our_list
The point of the attribute our_list is to ensure that the XORList is kept alive at least as long as the iterator (the former CurrentNode) - if you end up with an iterator for an XORList that no longer exists, then the current_node attribute will be invalid. current_node is not owned by XORListIterator, so there is no need for a destructor.
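The keep-alive role of our_list can be seen in a small pure-Python sketch. This assumes CPython's reference counting; OwnedList, OwnedListIterator and the helper function are hypothetical names, and a list position stands in for the current_node pointer.

```python
import gc
import weakref


class OwnedList:
    """Stand-in for XORList: it owns its storage."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return OwnedListIterator(self)


class OwnedListIterator:
    """Stand-in for XORListIterator."""

    def __init__(self, our_list):
        self.our_list = our_list  # strong reference keeps the list alive
        self.position = 0         # plays the role of current_node

    def __iter__(self):
        return self

    def __next__(self):
        values = self.our_list._values
        if self.position >= len(values):
            raise StopIteration
        value = values[self.position]
        self.position += 1
        return value


def list_survives_while_iterated():
    """Return (alive while iterator exists, alive afterwards, items seen)."""
    owned = OwnedList([1, 2, 3])
    watcher = weakref.ref(owned)
    iterator = iter(owned)
    del owned                     # the iterator now holds the only reference
    gc.collect()
    alive_during = watcher() is not None
    remaining = list(iterator)    # still safe to walk the list
    del iterator
    gc.collect()
    alive_after = watcher() is not None
    return alive_during, alive_after, remaining


if __name__ == "__main__":
    print(list_survives_while_iterated())
```

Without the our_list attribute, the storage could be freed while an iterator still points into it, which in the C-struct version would mean a dangling current_node.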
The danger with this scheme, I think, lies in making sure that changes to the XORList don't invalidate any existing XORListIterators to the point where you get crashes. I suspect this would also be an issue with your current version.
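One common way to turn that kind of crash into a clean error, offered here as an assumption rather than something the outline above prescribes, is a modification counter: the list bumps a version number on every structural change and each iterator checks it before stepping. A minimal pure-Python sketch with hypothetical names:

```python
class VersionedList:
    """List wrapper that counts structural modifications."""

    def __init__(self):
        self._items = []
        self._version = 0  # bumped on every structural change

    def append(self, value):
        self._items.append(value)
        self._version += 1

    def __iter__(self):
        return VersionedIterator(self)


class VersionedIterator:
    """Iterator that refuses to continue once its list has been modified."""

    def __init__(self, our_list):
        self.our_list = our_list
        self._expected_version = our_list._version
        self._position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.our_list._version != self._expected_version:
            raise RuntimeError("list changed during iteration")
        if self._position >= len(self.our_list._items):
            raise StopIteration
        value = self.our_list._items[self._position]
        self._position += 1
        return value


if __name__ == "__main__":
    v = VersionedList()
    v.append(1)
    v.append(2)
    it = iter(v)
    print(next(it))
    v.append(3)
    try:
        next(it)
    except RuntimeError as exc:
        print("detected:", exc)
```

In the Cython version the same check would sit in the iterator's stepping method, before current_node is dereferenced.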
I suspect the built-in list will still remain competitive, since it is a well-written, efficient structure. Remember that list.append is usually a simple Py_INCREF, with an occasional array reallocation and copy. Yours always involves creation of a new Python object (the Node) as well as some associated reference counting.
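How occasional the reallocation is can be estimated from Python itself. The sketch below relies on a CPython implementation detail, namely that sys.getsizeof on a list reflects its allocated capacity, so a change in that size between appends is taken to mean the array was resized; count_reallocations is a hypothetical helper name.

```python
import sys


def count_reallocations(n):
    """Append n items to a list and count how often its allocated size changes."""
    items = []
    reallocations = 0
    last_size = sys.getsizeof(items)
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The reported size only moves when the backing array is resized.
            reallocations += 1
            last_size = size
    return reallocations


if __name__ == "__main__":
    n = 100000
    print(n, "appends,", count_reallocations(n), "size changes")
```

Every other append in the loop just stores a pointer and bumps a reference count, whereas a linked list pays for one allocation per element.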
My alternative scheme avoids a lot of reference counting (both in terms of computational time and "you having to think about it" time), so I'd expect it to be much closer. It retains the disadvantage of a small memory allocation on each append, which is unavoidable for a linked-list structure.
Addendum: to address the comment about "the convenience of a Cython class". In my view the two advantages of using a Cython class vs a struct are:
[...] an implementation detail of the XORList that shouldn't be exposed to Python users. Therefore I think the main reasons to use Cython classes specifically don't apply to your problem. (For a lot of code the advantages do apply, of course!)
It's probably also worth adding that constructing Cython classes is probably one of their slower features - to support possible inheritance the construction process is quite "indirect". You've managed to create a benchmark that turns out to be almost all construction - I'd guess it's a slightly skewed benchmark and the real case might not be that bad.