简体   繁体   English

了解使用Cython扩展类型与Python类的好处

[英]Understanding the benefit of using Cython Extension types vs Python Classes

I am very new to Cython and after skimming through the Cython documentation I came across Cython Extention types . 我对Cython并不陌生,浏览了Cython文档后,我遇到了Cython Extension 类型 I am wondering what benefit does it give over normal Python Classes? 我想知道它比普通的Python类有什么好处? I was trying to convert my previous Python class which had lists in its data members to a Cython extension type but it seems like I cannot declare lists as data members in extension type. 我试图将我的先前的Python类(其数据成员中具有列表)转换为Cython扩展类型,但似乎无法将列表声明为扩展类型中的数据成员。 Only thing that I can do is convert a Python class which uses data-members of primitive C datatypes to Cython extension type. 我唯一能做的就是将使用原始C数据类型的数据成员的Python类转换为Cython扩展类型。

I have a segment of my code in Python class which I need to optimize using Cython. 我在Python类中有一段代码,需要使用Cython进行优化。 How can I Cythonize functions related to that segment without declaring the class as a Cython extension type? 如何在不将类声明为Cython扩展类型的情况下Cythonize与该段相关的功能? (Basically I want to declare only certain functions as cdef, not all.) (基本上,我只想将某些函数声明为cdef,而不是全部。)

I have not used those cdef ed classes when writing code in Cython, so I'll leave that part on benefits of cdef ed types for someone else to answer. 在Cython中编写代码时,我没有使用过这些cdef ed类,因此我将把这部分放在cdef ed类型的好处上,以供其他人回答。

On the point of lists, while you can still use Python lists as normal, you will need to copy each member explicitly into a C array to fully take advantage of the speed-ups from writing in C. However, if the data is stored in a NumPy array, you can just store a pointer to the start of the array (making sure that the array is C-contiguous) instead. 在列表上,尽管您仍然可以照常使用Python列表,但是您需要将每个成员显式复制到C数组中,以充分利用编写C语言带来的提速。但是,如果数据存储在一个NumPy数组,您可以只存储一个指向数组开头的指针(确保该数组是C连续的)。 The full equivalence table of NumPy and C types can be found here: https://github.com/cython/cython/blob/master/Cython/Includes/numpy/__init__.pxd 可以在以下位置找到NumPy和C类型的完整等效表: https : //github.com/cython/cython/blob/master/Cython/Includes/numpy/__init__.pxd

As for cythonising class methods, since most Python code can work as is without modification in Cython, you can define your class in Cython using class SomeClass: as normal. 至于cythonising类方法,由于大多数Python代码可以在Cython中进行修改而无需修改,因此您可以像平常一样使用class SomeClass:类在Cython中定义您的类。 Since cdef functions can only be called from within Cython, you would likely want to define cythonised methods outside the class (preferably typed to improve performance). 由于只能在Cython中调用cdef函数,因此您可能想在类之外定义cythonized方法(最好键入以提高性能)。 Inside the class, you can use the regular def (which can be called from Python) to call their cythonised counterpart. 在类内部,您可以使用常规def (可以从Python调用)来调用其cythonized对应对象。

For larger mostly Python classes which you would not like to move into Cython, you can use a similar method but only have def ed functions in Cython calling the cdef ed version. 对于您不希望移入Cython的较大的大多数Python类,可以使用类似的方法,但在Cython中仅具有def函数调用cdef ed版本。 You then call the Cython functions from Python as you would normally do when importing modules. 然后,您可以像导入模块时通常那样从Python调用Cython函数。

For data structures that only needs to reside in C and not interact (much) with Python, you may also want to consider using PyCapsule to store them as your class attributes. 对于只需要驻留在C中并且不需要与Python交互(很多)的数据结构,您可能还需要考虑使用PyCapsule将它们存储为类属性。

Edit: 编辑:

From reading the comment under chrisb 's answer, I gather that you want to have a 2D array with variable row length. 通过阅读chrisb回答下的评论,我发现您希望拥有一个具有可变行长的2D数组。 Before you dive straight into implementing the exact same data structure in C, it is worth noting that C is not Python. 在直接研究用C实现完全相同的数据结构之前,值得注意的是C不是Python。 It will not automatically manage the length of lists for you, instead you will have to manage the memory yourself (see the example in the first link). 它不会自动为您管理列表的长度,而是您必须自己管理内存(请参阅第一个链接中的示例)。 While this is the standard way of dynamically allocating memory, programmers new to C (and Cython) tend not to want to touch " malloc and friends" since pointers will be flying around. 尽管这是动态分配内存的标准方法,但是C(和Cython)新手程序员往往不希望接触“ malloc和朋友”,因为指针会四处飞散。 On top of that, C arrays generally do not have mixed data type, eg you cannot have both numbers and strings in the same array (if you really need to, there is a way to do that). 最重要的是,C数组通常不具有混合数据类型,例如,不能在同一数组中同时包含数字和字符串(如果确实需要,可以采用一种方法)。

In light of this, you may wish to rethink your data structure. 鉴于此,您可能希望重新考虑您的数据结构。 For example, you could consider way to represent the data with arrays of constant length. 例如,您可以考虑用固定长度的数组表示数据的方法。 If your array has a maximum width, you may be able to exchange memory with ease of programming by using a NumPy array. 如果您的数组具有最大宽度,则可以使用NumPy数组轻松交换内存。

If you are happy to give manual memory management a go, here is a simple example of allocating memory for a 2D array of int s: 如果您乐于进行手动内存管理,下面是一个为int的二维数组分配内存的简单示例:

from cpython.mem cimport PyMem_Malloc, PyMem_Realloc, PyMem_Free

cdef int **generate_2D_array(int rows, int columns):
    cdef int row
    cdef int **parent = <int **>PyMem_Malloc(rows * sizeof(int*))
    if not parent:
        raise MemoryError()
    for row in range(rows):
        parent[row] = <int *>PyMem_Malloc(columns * sizeof(int))
        if not parent[row]:
            raise MemoryError()
    return parent

To change the length of a row, you can use: 要更改行的长度,可以使用:

cdef void resize_row(int *row_pointer, int new_size):
    PyMem_Realloc(row_pointer, new_size)

When you are done with the data, remember to deallocate the memory using PyMem_Free in a similar fashion to allocation with PyMem_Malloc . 当您使用的数据完成的,记得要解除分配使用的内存PyMem_Free以类似的方式来分配与PyMem_Malloc The rule of thumb is: for every PyMem_Malloc you use, free the memory using exactly one PyMem_Free , no more, no less. 经验法则是:对于您使用的每个PyMem_Malloc ,仅使用一个PyMem_Free释放内存,不要多也不少。 And finally, just a word of warning, failure to use these appropriately may cause segmentation faults, memory leaks or undefined behaviour. 最后,仅需警告,未能正确使用它们可能会导致分段错误,内存泄漏或不确定的行为。

For python-heavy operations, a cdef class will still have some performance benefits, due to optimized field access and python overhead removal. 对于python繁重的操作,由于优化的字段访问和python开销消除,cdef类仍将具有一些性能优势。 Simple example: 简单的例子:

class PyThing:
    def __init__(self, data):
        self.data = data
    def calc(self):
        ans = 0
        for v in self.data:
            ans += v
        return ans


py = PyThing(list(range(100000)))

%timeit py.calc()
4.07 ms ± 29.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


import cython
%load_ext cython

%%cython
cdef class CyThing:
    cdef list data
    def __init__(self, data):
        self.data = data
    def calc(self):
        ans = 0
        for v in self.data:
            ans += v
        return ans

cy = CyThing(list(range(100000)))

%timeit cy.calc()
2.8 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is at the cost of simplicity / debug-ability, so may or may not be worth it. 这是以简单性/调试能力为代价的,因此可能值得也可能不值得。 A better usecase is when you have some c-level values you want to store and use, there the performance benefits can be much more significant. 一个更好的用例是,当您有一些要存储和使用的C级值时,性能收益将更为显着。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM