
Cython implementation no faster than pure Python

For an exercise I've written an XOR doubly-linked list:
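The Cython code below relies on one trick. Each node stores only `prev ^ next`, so knowing the address of the node you came from is enough to recover the node you are going to, in either direction. Here is a minimal pure-Python illustration. The integer "addresses" and the helper `other_neighbour` are made up for this sketch and are not part of the question's code:

```python
# Made-up "addresses" for three consecutive nodes A <-> B <-> C (illustration only).
ADDR_A, ADDR_B, ADDR_C = 0x1000, 0x2000, 0x3000
NULL = 0  # stands in for "no neighbour", like a null pointer

# Each node stores a single word: the XOR of its two neighbours' addresses.
link_a = NULL ^ ADDR_B    # head: no previous node
link_b = ADDR_A ^ ADDR_C
link_c = ADDR_B ^ NULL    # tail: no next node


def other_neighbour(link, known_neighbour):
    """Return the neighbour we don't know, given the one we arrived from."""
    return link ^ known_neighbour


# Forward from B (arrived from A) lands on C; backward from B (arrived from C) lands on A.
print(hex(other_neighbour(link_b, ADDR_A)))  # 0x3000
print(hex(other_neighbour(link_b, ADDR_C)))  # 0x1000
```

The same symmetry explains `reverse()` in the question. It only swaps `first` and `last`, because walking from either end with a starting "previous" of 0 follows the links correctly.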

%%cython

from cpython.object cimport PyObject
from cpython.ref cimport Py_XINCREF, Py_XDECREF
from libc.stdint cimport uintptr_t

cdef class Node:
    cdef uintptr_t _prev_xor_next
    cdef object val

    def __init__(self, object val, uintptr_t prev_xor_next=0):
        self._prev_xor_next=prev_xor_next
        self.val=val

    @property
    def prev_xor_next(self):
        return self._prev_xor_next
    @prev_xor_next.setter
    def prev_xor_next(self, uintptr_t p):
        self._prev_xor_next=p

    def __repr__(self):
        return str(self.val)


cdef class CurrentNode(Node):
    cdef uintptr_t _node, _prev_ptr
    def __init__(self, uintptr_t node, uintptr_t prev_ptr=0):
        self._node = node
        self._prev_ptr= prev_ptr

    @property
    def val(self):
        return self.node.val
    @property
    def node(self):
        ret=<PyObject *> self._node
        return <Node> ret
    @property
    def prev_ptr(self):
        return self._prev_ptr

    cdef CurrentNode forward(self):
        if self.node.prev_xor_next!=self._prev_ptr:
            return CurrentNode(self.node.prev_xor_next^self._prev_ptr, self._node)

    cdef CurrentNode backward(self):
        if self._prev_ptr:
            pp=<PyObject*>self._prev_ptr
            return CurrentNode(self._prev_ptr, self._node^(<Node> pp).prev_xor_next)

    def __repr__(self):
        return str(self.node)

cdef class XORList:
    cdef PyObject* first
    cdef PyObject* last
    cdef int length

    def __init__(self):
        self.length=0
    @property
    def head(self):
        return (<Node> self.first)

    @property
    def tail(self):
        return (<Node> self.last)

    cdef append(self, object val):
        self.length+=1
        #empty list
        if not self.first:
            t=Node(val)
            tp=(<PyObject*> t)
            self.first=tp
            Py_XINCREF(tp)
            self.last=tp
            Py_XINCREF(tp)

        #not empty
        else:
            new_node=Node(val, <uintptr_t> self.last)
            new_ptr=<PyObject*> new_node
            cur_last=<Node>self.last
            cur_last.prev_xor_next=cur_last.prev_xor_next^(<uintptr_t> new_ptr)
            Py_XINCREF(new_ptr)
            self.last=new_ptr
            Py_XINCREF(new_ptr)

    cpdef reverse(self):
        temp=self.last
        self.last=self.first
        self.first=temp

    def __repr__(self):
        return str(list(iter_XORList(self)))
    def __len__(self):
        return self.length

def iter_XORList(l):
    head=<PyObject*>l.head
    cur=CurrentNode(<uintptr_t> head)
    while cur:
        yield cur
        cur=cur.forward()

import time

start=time.time()
cdef XORList l=XORList()
for i in range(100000):
    l.append(i)
print('time xor ', time.time()-start)

start=time.time()
l1=[]
for i in range(100000):
    l1.append(i)
print('time regular ', time.time()-start)

Compared with the built-in list above, I consistently get ~10x worse performance from the Cython linked list:

time xor  0.10768294334411621
time regular  0.010972023010253906

When I profile the loop for the XORList I get:

         700003 function calls in 1.184 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.184    1.184 <string>:1(<module>)
        1    0.039    0.039    1.184    1.184 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:108(list_check)
   100000    0.025    0.000    0.025    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:11(__init__)
    99999    0.019    0.000    0.019    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:16(__get__)
    99999    0.018    0.000    0.018    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:19(__set__)
        1    0.000    0.000    0.000    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:60(__init__)
   100000    0.937    0.000    0.999    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:70(append)
   100000    0.113    0.000    1.146    0.000 line_profiler.py:111(wrapper)
        1    0.000    0.000    1.184    1.184 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000    0.018    0.000    0.018    0.000 {method 'disable_by_count' of '_line_profiler.LineProfiler' objects}
   100000    0.015    0.000    0.015    0.000 {method 'enable_by_count' of '_line_profiler.LineProfiler' objects}

So, ignoring the calls to append itself, it seems most of the time is spent in the special methods (__init__, __get__ and __set__).

This brings me to my questions:

  1. How can I speed this up?
  2. I thought extension types in Cython are implemented underneath as structs, so what is causing their initialization to take so long?

I also tried another custom implementation of an ordinary doubly-linked list in pure Python, and on my machine its timings and those of the Cython XORList are within 10% of each other.
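The pure-Python doubly-linked list mentioned above isn't shown. Below is a minimal sketch of what such an implementation might look like. `DNode` and `DoublyLinkedList` are hypothetical names, and this is not the asker's code. Like the XOR list, it creates one new Python object per `append`, which may be why the two end up with similar timings:

```python
class DNode:
    # __slots__ keeps each node small and attribute access fast.
    __slots__ = ("val", "prev", "next")

    def __init__(self, val, prev=None):
        self.val = val
        self.prev = prev
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def append(self, val):
        # One object construction per append, as in the XOR version.
        node = DNode(val, self.tail)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.length += 1

    def __len__(self):
        return self.length

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.val
            node = node.next

    def __reversed__(self):
        node = self.tail
        while node is not None:
            yield node.val
            node = node.prev
```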

The three culprits from your profiling look to be Node's __init__ (which is unavoidable here), and the __get__ and __set__ of the prev_xor_next property. My view is that you don't want the prev_xor_next property (or if you do, it should be read-only), since it makes what should be a Cython internal accessible from Python.

Whether you delete the property or not, you are working in Cython here, so you can write directly to the underlying C attribute _prev_xor_next. You may need to declare cdef Node cur_last at the start of append (and maybe in other functions) to ensure that Cython knows the type of cur_last. I think it should be able to work it out, but if you get AttributeErrors at runtime then this is what you need to do.

This change gives me a 30% speed increase (i.e. it's still slower than a regular list, but it's a noticeable improvement).
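The same kind of saving can be seen in plain Python: going through a property adds a function call on every read and every write. This is only an analogy and not Cython's actual code path. In Cython, the direct `_prev_xor_next` access compiles to a C struct field access and skips Python attribute lookup entirely. `WithProperty`, `via_property` and `direct` are hypothetical names used for this sketch:

```python
import timeit


class WithProperty:
    __slots__ = ("_link",)

    def __init__(self):
        self._link = 0

    @property
    def link(self):
        return self._link

    @link.setter
    def link(self, value):
        self._link = value


def via_property(node, n=100_000):
    for i in range(n):
        node.link = node.link ^ i    # getter call + setter call each iteration


def direct(node, n=100_000):
    for i in range(n):
        node._link = node._link ^ i  # plain slot access, no extra function calls


if __name__ == "__main__":
    for fn in (via_property, direct):
        # The lambda is called inside this iteration, so it sees the current fn.
        t = min(timeit.repeat(lambda: fn(WithProperty()), number=3, repeat=3))
        print(f"{fn.__name__:>12}: {t:.3f} s")
```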


I'll outline a more drastic change that I possibly should have suggested on your first question about this problem. This really is a vague outline, so no effort has been made to get it to work...

  • Node is entirely internal to your XORList class: it should not be used from Python, and the lifetime of all the Nodes in an XORList is tied directly to the list. Therefore they should be destroyed when their owning XORList is destroyed (or when the list shrinks, etc.), so they do not need to be reference counted. That means Node should be a C struct rather than a Python object:

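     from libc.stdint cimport uintptr_t
     from libc.stdlib cimport malloc, free
     from cpython.object cimport PyObject
     from cpython.ref cimport Py_XINCREF, Py_XDECREF

     cdef struct Node:
         uintptr_t prev_xor_next
         PyObject* val

     # with associated constructor- and destructor-like functions:
     cdef Node* make_node(object val, uintptr_t prev_xor_next) except NULL:
         cdef Node* n = <Node*>malloc(sizeof(Node))
         if n is NULL:
             raise MemoryError()
         n.val = <PyObject*>val
         Py_XINCREF(n.val)  # the node owns a reference to its value
         n.prev_xor_next = prev_xor_next
         return n

     cdef void destroy_node(Node* n):
         Py_XDECREF(n.val)  # release the reference taken in make_node
         free(n)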
  • XORList needs a __dealloc__ function that loops through the list calling destroy_node on each Node (your current version needs a __dealloc__ function anyway!)

  • CurrentNode needs to remain a Cython class, since it is your "iterator" interface. It can obviously no longer inherit from Node. I'd change it to:

     cdef class XORListIterator:
         cdef Node* current_node   # borrowed: owned by our_list, never freed here
         cdef XORList our_list     # keeps the owning list (and its nodes) alive

    The point of the attribute our_list is to ensure that the XORList is kept alive at least as long as the iterator. If you end up with an iterator for an XORList that no longer exists, the current_node attribute will be invalid. current_node is not owned by XORListIterator, so there is no need for a destructor.
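The ownership rules in the outline above can be imitated in pure Python with `ctypes`, which may make them easier to see. Each `Node` is a plain C struct holding real addresses. The XORList owns every node for its whole lifetime, which plays the role that `malloc`/`__dealloc__` would play in Cython. The iterator holds a reference to its list, like `our_list`. This is an illustration only, under stated assumptions: the payload is a C `long` rather than a `PyObject*`, so no manual reference counting is needed. All names are hypothetical, and it will not be faster than the Cython version:

```python
import ctypes


class Node(ctypes.Structure):
    # Plain C struct: the XOR link plus an integer payload (ints only, so the
    # sketch needs no manual reference counting).
    _fields_ = [("prev_xor_next", ctypes.c_size_t), ("val", ctypes.c_long)]


class XORList:
    def __init__(self):
        self._owned = []  # the list owns every node; they are freed with it
        self.first = 0    # address of the head node (0 means empty)
        self.last = 0     # address of the tail node

    def append(self, val):
        node = Node(self.last, val)      # link = prev ^ next, and next is 0
        self._owned.append(node)         # keeps the node's memory alive
        addr = ctypes.addressof(node)
        if self.last:
            tail = Node.from_address(self.last)
            tail.prev_xor_next ^= addr   # old tail gains a "next" neighbour
        else:
            self.first = addr
        self.last = addr

    def reverse(self):
        # O(1): XOR links are symmetric, so swapping the ends is enough.
        self.first, self.last = self.last, self.first

    def __len__(self):
        return len(self._owned)

    def __iter__(self):
        return XORListIterator(self)


class XORListIterator:
    def __init__(self, our_list):
        self.our_list = our_list  # keeps the XORList (and so its nodes) alive
        self.prev = 0
        self.current = our_list.first

    def __iter__(self):
        return self

    def __next__(self):
        if not self.current:
            raise StopIteration
        node = Node.from_address(self.current)  # borrowed view, owns nothing
        self.prev, self.current = self.current, node.prev_xor_next ^ self.prev
        return node.val
```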

The danger with this scheme, I think, is making sure that changes to the XORList don't invalidate existing XORListIterators to the point where you get crashes. I suspect this would also be an issue with your current version.
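One common way to turn a potential crash into a clean error is a modification counter. The container bumps a version number on every structural change, and each iterator checks that number before touching the container. This is similar in spirit to CPython raising `RuntimeError` when a dict changes size during iteration. The sketch below is not something the answer prescribes. The names are hypothetical, and the storage is a plain Python list only to keep it short. In the Cython XORList, the check would sit in front of dereferencing `current_node`:

```python
class CheckedList:
    """Minimal container whose iterators detect mutation (hypothetical sketch)."""

    def __init__(self):
        self._items = []
        self._version = 0  # bumped on every structural change

    def append(self, val):
        self._items.append(val)
        self._version += 1

    def pop(self):
        self._version += 1
        return self._items.pop()

    def __iter__(self):
        return CheckedIterator(self)


class CheckedIterator:
    def __init__(self, owner):
        self._owner = owner               # also keeps the container alive
        self._expected = owner._version   # version at the time iteration began
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Refuse to continue if the container changed since we started.
        if self._owner._version != self._expected:
            raise RuntimeError("container changed during iteration")
        if self._index >= len(self._owner._items):
            raise StopIteration
        val = self._owner._items[self._index]
        self._index += 1
        return val
```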


I suspect the built-in list will still remain competitive, since it is a well-written, efficient structure. Remember that list.append is usually a simple Py_INCREF, with an occasional array reallocation and copy. Yours always involves creating a new Python object (the Node) as well as some associated reference counting.
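The "occasional reallocation" point can be checked directly in CPython. `sys.getsizeof` on a list reports its allocated capacity, not just its length. Counting how often that figure changes during repeated appends therefore gives a rough count of how often the backing array was resized. A resize may or may not actually need a copy. `count_reallocations` is a hypothetical helper, and the exact count depends on the CPython version:

```python
import sys


def count_reallocations(n):
    """Append n items and count how often the list's allocated size changes."""
    items = []
    last_size = sys.getsizeof(items)
    changes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # the backing array was resized
            changes += 1
            last_size = size
    return changes


print(count_reallocations(100_000))
```

Because CPython over-allocates proportionally, the count comes out at a small fraction of the number of appends, whereas a linked list pays for one allocation on every append.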

My alternative scheme avoids a lot of reference counting (both in terms of computational time and "you having to think about it" time), so I'd expect it to be much closer. It retains the disadvantage of a small memory allocation on each append, which is unavoidable for a linked-list structure.


Addendum: to address the comment about "the convenience of a Cython class". In my view the two advantages of using a Cython class vs a struct are:

  1. You get something fairly close to a struct, but you don't have to worry about C pointers, and the reference counting is taken care of. It's pretty clear that for this problem you're doing odd things to pointers and having to handle reference counting explicitly, so I don't think this applies to you.
  2. You can use it from Python: you aren't restricted to Cython. In this case I think it's entirely an implementation detail of the XORList that shouldn't be exposed to Python users.

Therefore I think the main reasons to use Cython classes specifically don't apply to your problem. (For a lot of code the advantages do apply, of course!)

It's probably also worth adding that construction is probably one of the slower features of Cython classes: to support possible inheritance, the construction process is quite "indirect". You've managed to create a benchmark that turns out to be almost all construction, so I'd guess it's a slightly skewed benchmark and the real case might not be that bad.
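One way to see how much of such a benchmark is construction is to time `append` alone against `append` plus one object construction per call. Below is a pure-Python sketch of that comparison. `Node` here is a hypothetical stand-in (a `__slots__` Python class), not the Cython `Node`. The absolute numbers will differ from a Cython build; the point is only to separate construction cost from the cost of the append itself:

```python
import timeit


class Node:
    __slots__ = ("val", "prev_xor_next")

    def __init__(self, val, prev_xor_next=0):
        self.val = val
        self.prev_xor_next = prev_xor_next


def append_plain(n):
    out = []
    for i in range(n):
        out.append(i)        # only stores a reference
    return out


def append_wrapped(n):
    out = []
    for i in range(n):
        out.append(Node(i))  # one object construction per append
    return out


if __name__ == "__main__":
    n = 100_000
    for fn in (append_plain, append_wrapped):
        # The lambda is called inside this iteration, so it sees the current fn.
        t = min(timeit.repeat(lambda: fn(n), number=5, repeat=3))
        print(f"{fn.__name__:>15}: {t:.3f} s")
```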
