Cython implementation no faster than pure Python

Question

For an exercise I've written an XOR doubly-linked list:

%%cython

from cpython.object cimport PyObject
from cpython.ref cimport Py_XINCREF, Py_XDECREF
from libc.stdint cimport uintptr_t

cdef class Node:
    cdef uintptr_t _prev_xor_next
    cdef object val

    def __init__(self, object val, uintptr_t prev_xor_next=0):
        self._prev_xor_next=prev_xor_next
        self.val=val

    @property
    def prev_xor_next(self):
        return self._prev_xor_next
    @prev_xor_next.setter
    def prev_xor_next(self, uintptr_t p):
        self._prev_xor_next=p

    def __repr__(self):
        return str(self.val)


cdef class CurrentNode(Node):
    cdef uintptr_t _node, _prev_ptr
    def __init__(self, uintptr_t node, uintptr_t prev_ptr=0):
        self._node = node
        self._prev_ptr= prev_ptr

    @property
    def val(self):
        return self.node.val
    @property
    def node(self):
        ret=<PyObject *> self._node
        return <Node> ret
    @property
    def prev_ptr(self):
        return self._prev_ptr

    cdef CurrentNode forward(self):
        if self.node.prev_xor_next!=self._prev_ptr:
            return CurrentNode(self.node.prev_xor_next^self._prev_ptr, self._node)

    cdef CurrentNode backward(self):
        if self._prev_ptr:
            pp=<PyObject*>self._prev_ptr
            return CurrentNode(self._prev_ptr, self._node^(<Node> pp).prev_xor_next)

    def __repr__(self):
        return str(self.node)

cdef class XORList:
    cdef PyObject* first
    cdef PyObject* last
    cdef int length

    def __init__(self):
        self.length=0
    @property
    def head(self):
        return (<Node> self.first)

    @property
    def tail(self):
        return (<Node> self.last)

    cdef append(self, object val):
        self.length+=1
        #empty list
        if not self.first:
            t=Node(val)
            tp=(<PyObject*> t)
            self.first=tp
            Py_XINCREF(tp)
            self.last=tp
            Py_XINCREF(tp)

        #not empty
        else:
            new_node=Node(val, <uintptr_t> self.last)
            new_ptr=<PyObject*> new_node
            cur_last=<Node>self.last
            cur_last.prev_xor_next=cur_last.prev_xor_next^(<uintptr_t> new_ptr)
            Py_XINCREF(new_ptr)
            self.last=new_ptr
            Py_XINCREF(new_ptr)

    cpdef reverse(self):
        temp=self.last
        self.last=self.first
        self.first=temp

    def __repr__(self):
        return str(list(iter_XORList(self)))
    def __len__(self):
        return self.length

def iter_XORList(l):
    head=<PyObject*>l.head
    cur=CurrentNode(<uintptr_t> head)
    while cur:
        yield cur
        cur=cur.forward()

import time

start=time.time()
cdef XORList l=XORList()
for i in range(100000):
    l.append(i)
print('time xor ', time.time()-start)

start=time.time()
l1=[]
for i in range(100000):
    l1.append(i)
print('time regular ', time.time()-start)

Compared with the built-in list above, I consistently get ~10x worse performance from the Cython linked list.

time xor  0.10768294334411621
time regular  0.010972023010253906

When I profile the loop for the XORList I get:

         700003 function calls in 1.184 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.184    1.184 <string>:1(<module>)
        1    0.039    0.039    1.184    1.184 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:108(list_check)
   100000    0.025    0.000    0.025    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:11(__init__)
    99999    0.019    0.000    0.019    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:16(__get__)
    99999    0.018    0.000    0.018    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:19(__set__)
        1    0.000    0.000    0.000    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:60(__init__)
   100000    0.937    0.000    0.999    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:70(append)
   100000    0.113    0.000    1.146    0.000 line_profiler.py:111(wrapper)
        1    0.000    0.000    1.184    1.184 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000    0.018    0.000    0.018    0.000 {method 'disable_by_count' of '_line_profiler.LineProfiler' objects}
   100000    0.015    0.000    0.015    0.000 {method 'enable_by_count' of '_line_profiler.LineProfiler' objects}

So, ignoring the calls to append, it seems most of the time is spent in the special methods.

This brings me to my questions:

1. How can I speed this up?
2. I thought extension types in Cython were implemented as structs underneath, so what is causing their initialization to take so long?

I also tried another custom implementation of an ordinary doubly-linked list in pure Python; its timings and those of the Cython XORList are within 10% of each other on my machine.

Answer 1

The three culprits from your profiling look to be Node's __init__ (which is unavoidable here), and __get__ and __set__ for the prev_xor_next property. My view is that you don't want the prev_xor_next property (or, if you do, it should be read-only), since it exposes to Python what should be a Cython internal.

Whether you delete the property or not, you are working in Cython here, so you can write directly to the underlying C attribute _prev_xor_next. You may need to declare cdef Node cur_last at the start of append (and maybe in other functions) to ensure that Cython knows the type of cur_last. I think it should be able to work it out, but if you get AttributeErrors at runtime then this is what you need to do.

This change gives me a 30% speed increase (i.e. it's still slower than a regular list, but it's a noticeable improvement).

I'll outline a more drastic change that I possibly should have suggested on your first question about this problem. This really is a vague outline, so no effort has been made to get it to work...

Node is entirely internal to your XORList class: it should not be used from Python, and the lifetime of all the Nodes in an XORList is tied directly to the list. Therefore they should be destroyed when their owning XORList is destroyed (or when the list shrinks, etc.), and so they do not need to be reference counted. Therefore Node should be a C struct rather than a Python object:
```
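# malloc/free are needed for the manual allocation below
from libc.stdlib cimport malloc, free

cdef struct Node:
    uintptr_t prev_xor_next
    PyObject* val

# with associated constructor- and destructor-like functions:
cdef Node* make_node(object val, uintptr_t prev_xor_next):
    cdef Node* n = <Node*>malloc(sizeof(Node))
    n.val = <PyObject*>val
    Py_XINCREF(n.val)  # the node holds its own reference to the value
    n.prev_xor_next = prev_xor_next
    return n

cdef void destroy_node(Node* n):
    Py_XDECREF(n.val)  # release the reference taken in make_node
    free(n)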
```
XORList needs a __dealloc__ function that loops through the list calling destroy_node on each Node (it needs a __dealloc__ function anyway in your version too!).
CurrentNode needs to remain a Cython class, since this is your "iterator" interface. It can obviously no longer inherit from Node. I'd change it to:
```
cdef class XORListIterator:
    cdef Node* current_node
    cdef XORList our_list
```
The point of the attribute our_list is to ensure that the XORList is kept alive at least as long as the iterator (the former CurrentNode): if you end up with an iterator for an XORList that no longer exists, then its current_node attribute will be invalid. current_node is not owned by XORListIterator, so there is no need for a destructor.
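The keep-alive idea can be sketched in plain Python, assuming CPython's reference counting. OwnerList, OwnerListIterator and demo_keepalive are hypothetical stand-ins, not part of the code above; the weak reference is only there to observe when the list is actually freed.

```python
import gc
import weakref


class OwnerList:
    """Hypothetical stand-in for XORList: it owns the storage."""

    def __init__(self, values):
        self.values = list(values)


class OwnerListIterator:
    """Hypothetical stand-in for XORListIterator."""

    def __init__(self, owner):
        self.our_list = owner  # strong reference keeps the owner alive
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.our_list.values):
            raise StopIteration
        value = self.our_list.values[self.index]
        self.index += 1
        return value


def demo_keepalive():
    owner = OwnerList([1, 2, 3])
    probe = weakref.ref(owner)  # lets us observe the owner's lifetime
    it = OwnerListIterator(owner)

    del owner  # the iterator now holds the only strong reference
    gc.collect()
    alive_while_iterating = probe() is not None

    remaining = list(it)  # still safe: the storage has not been freed

    del it  # dropping the iterator releases the last reference
    gc.collect()
    alive_after_iterator = probe() is not None
    return alive_while_iterating, remaining, alive_after_iterator


print(demo_keepalive())
```

In the Cython version the same role is played by the typed our_list attribute, which Cython reference counts for you, while current_node stays a raw, unowned pointer.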

The danger with this scheme, I think, is making sure that changes to the XORList don't invalidate existing XORListIterators to the point where you get crashes. I suspect this would also be an issue with your current version.

I suspect the built-in list will still remain competitive, since it is a well-written, efficient structure. Remember that list.append is usually a simple Py_INCREF, with an occasional array reallocation and copy. Yours always involves creation of a new Python object (the Node) as well as some associated reference counting.
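To see how rarely the built-in list reallocates, here is a minimal sketch. It assumes CPython, where sys.getsizeof on a list reflects its allocated capacity, so a change in size signals a reallocation. The helper name count_reallocations is made up for this illustration.

```python
import sys


def count_reallocations(n_appends):
    """Count how often the list's allocated size changes while appending."""
    items = []
    last_size = sys.getsizeof(items)
    reallocations = 0
    for i in range(n_appends):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # capacity grew, so the array was reallocated
            reallocations += 1
            last_size = size
    return reallocations


n = 100_000
print(f"{count_reallocations(n)} reallocations for {n} appends")
```

The count should be a tiny fraction of the number of appends, whereas the linked list performs one allocation for every single append.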

My alternative scheme avoids a lot of reference counting (both in terms of computational time and "you having to think about it" time), so I'd expect it to be much closer. It retains the disadvantage of a small memory allocation on each append, which is unavoidable for a linked-list structure.

Addendum: to address the comment about "the convenience of a Cython class". In my view the two advantages of using a Cython class vs a struct are:

1. You get something fairly close to a struct, but don't have to worry about C pointers, and the reference counting is taken care of. It's pretty clear that for this problem you're doing odd things to pointers and having to handle reference counting explicitly, so I don't think this applies to you.
2. You can use it from Python: you aren't restricted to Cython. In this case I think it's entirely an implementation detail of the XORList that shouldn't be exposed to Python users.

Therefore I think the main reasons to use Cython classes specifically don't apply to your problem. (For a lot of code the advantages do apply, of course!)

It's probably also worth adding that constructing Cython classes is probably one of their slower features: to support possible inheritance, the construction process is quite "indirect". You've managed to create a benchmark that turns out to be almost all construction. I'd guess it's a slightly skewed benchmark and the real case might not be that bad.
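As a rough way to check how much of such a benchmark is object construction, here is a pure-Python sketch that times a plain append loop against one that also constructs a small object per element. PyNode, append_plain, append_with_node and compare are hypothetical names, and this measures Python classes rather than Cython ones, so treat the numbers only as an illustration of the effect.

```python
import timeit


class PyNode:
    """Hypothetical minimal node, used only to measure construction cost."""

    __slots__ = ("val", "link")

    def __init__(self, val, link=0):
        self.val = val
        self.link = link


def append_plain(n):
    out = []
    for i in range(n):
        out.append(i)
    return out


def append_with_node(n):
    out = []
    for i in range(n):
        out.append(PyNode(i))  # one object construction per append
    return out


def compare(n=100_000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each loop."""
    plain = min(timeit.repeat(lambda: append_plain(n), number=1, repeat=repeat))
    nodes = min(timeit.repeat(lambda: append_with_node(n), number=1, repeat=repeat))
    return plain, nodes


plain, nodes = compare()
print(f"plain append: {plain:.4f}s, append with node construction: {nodes:.4f}s")
```

If the second figure dominates, the loop is mostly measuring construction, which is the skew described above.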

Answer 1: accepted, 2019-10-30 08:40:12
