Cython: Create memoryview without NumPy array?

Since I found memoryviews handy and fast, I try to avoid creating NumPy arrays in Cython and work with views of the given arrays instead. However, sometimes it cannot be avoided: an existing array must not be altered, so a new one has to be created. In top-level functions the cost is not noticeable, but in frequently called subroutines it is. Consider the following function:

cimport cython
import numpy as np

#@cython.profile(False)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cdef double [:] vec_eq(double [:] v1, int [:] v2, int cond):
    ''' Function output corresponds to v1[v2 == cond]'''
    cdef unsigned int n = v1.shape[0]
    cdef unsigned int n_ = 0
    # Size of array to create
    cdef size_t i
    for i in range(n):
        if v2[i] == cond:
            n_ += 1
    # Create array for selection
    cdef double [:] s = np.empty(n_, dtype=np.float64) # Slow line
    # Copy selection to new array
    n_ = 0
    for i in range(n):
        if v2[i] == cond:
            s[n_] = v1[i]
            n_ += 1
    return s
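For reference, the count-then-fill logic above can be sketched in plain Python using only the standard library. This is an illustration of the semantics, not the Cython function itself: the name vec_eq_reference is hypothetical, and array.array stands in for the NumPy allocation because it also exposes the buffer protocol that a memoryview binds to.

```python
from array import array


def vec_eq_reference(v1, v2, cond):
    """Return the elements of v1 where v2 == cond, as a memoryview of doubles."""
    # Pass 1: count the matches so the output is allocated exactly once.
    n_sel = sum(1 for flag in v2 if flag == cond)
    # Allocate n_sel zero-initialised doubles without NumPy.
    out = array("d", [0.0]) * n_sel
    view = memoryview(out)  # the view keeps `out` alive
    # Pass 2: copy the selected values into the new buffer.
    k = 0
    for value, flag in zip(v1, v2):
        if flag == cond:
            view[k] = value
            k += 1
    return view
```

Calling vec_eq_reference([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1], 1) selects the second and fourth values.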

Profiling tells me there is some speed to gain here.

One option is to adapt the function, because sometimes only the mean of this vector is needed and sometimes only the sum, so I could rewrite it to sum or average directly. But isn't there a way to create a memoryview directly with very little overhead, with its size defined dynamically? Something like first creating a C buffer using malloc etc. and, at the end of the function, converting the buffer to a view by passing the pointer and strides or so.

Edit 1: For simple cases, adapting the function like this may be an acceptable approach. I only added an option argument and the summing/averaging. This way I don't have to create an array and can use a malloc'd buffer that is easy to handle inside the function. This won't get any faster, will it?

from libc.stdlib cimport malloc, free, abort
from cython.parallel import prange
# ...
cdef double vec_eq(double [:] v1, int [:] v2, int cond, int opt=0):
    # additional option argument
    ''' Function output corresponds to v1[v2 == cond].sum() / .mean()'''
    cdef unsigned int n = v1.shape[0]
    cdef int n_ = 0
    # Size of array to create
    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        if v2[i] == cond:
            n_ += 1
    # Create array for selection
    cdef double s = 0
    cdef double * v3 = <double *> malloc(sizeof(double) * n_)
    if v3 == NULL:
        abort()
    # Copy selection to new array
    n_ = 0
    for i in range(n):
        if v2[i] == cond:
            v3[n_] = v1[i]
            n_ += 1
    # Do further computation here, according to option
    # Option 0 for the sum
    if opt == 0:
        for i in prange(n_, nogil=True):
            s += v3[i]
        free(v3)
        return s
    # Option 1 for the mean
    else:
        for i in prange(n_, nogil=True):
            s += v3[i]
        free(v3)
        return s / n_
    # Since in the end there is always only a single double value, 
    # the memory can be freed right here
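On the question of whether this can get faster: for a sum or a mean, the intermediate buffer v3 is not strictly needed, since both can be accumulated in a single pass over the inputs. A minimal plain-Python sketch of that idea follows; the name masked_sum_or_mean is hypothetical, and returning NaN for an empty selection is an assumption (the original does not say what should happen in that case).

```python
import math


def masked_sum_or_mean(v1, v2, cond, opt=0):
    """Sum (opt == 0) or mean (otherwise) of v1 where v2 == cond, without a temporary buffer."""
    total = 0.0
    count = 0
    # Single pass: accumulate the running sum and the number of matches together.
    for value, flag in zip(v1, v2):
        if flag == cond:
            total += value
            count += 1
    if opt == 0:
        return total
    # Assumption: an empty selection yields NaN rather than dividing by zero.
    return total / count if count else math.nan
```

The same loop carries over to a typed Cython loop with a double accumulator and an integer counter, which would remove the malloc/free pair altogether.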

I didn't know how to deal with CPython arrays, so I finally solved this with a self-made 'memory view', as proposed by fabrizioM. I wouldn't have thought that this would work. Creating a new np.array in a tight loop is pretty expensive, so this gave me a significant speed-up. Since I only need a one-dimensional array, I didn't even have to bother with strides. But I think this could work well for higher-dimensional arrays too.

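from libc.stdlib cimport malloc, free

cdef class Vector:
    cdef double *data
    cdef public int n_ax0

    def __cinit__(Vector self, int n_ax0):
        # __cinit__ is guaranteed to run exactly once, so the buffer is never allocated twice
        self.data = <double*> malloc (sizeof(double) * n_ax0)
        if self.data == NULL:
            raise MemoryError()
        self.n_ax0 = n_ax0

    def __dealloc__(Vector self):
        # free(NULL) is a no-op, so this is safe even if allocation failed
        free(self.data)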

...
#@cython.profile(False)
@cython.boundscheck(False)
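cdef Vector my_vec_func(double [:, ::1] a, int [:] v, int cond, int opt):
    # function returning a Vector; its buffer is freed in __dealloc__
    # once the last reference to the Vector is dropped
    cdef int vecsize
    cdef size_t i
    # defs..
    # more stuff...
    vecsize = n
    # named `out` because `v` is already the int[:] argument
    cdef Vector out = Vector(vecsize)

    for i in range(vecsize):
        # computation
        out.data[i] = ...

    return out

...
# vec must be typed as Vector: the C pointer is not visible from Python code
cdef Vector vec = my_vec_func(...
cdef double *ptr_to_data = vec.data
cdef int length_of_vec = vec.n_ax0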

The following thread on the Cython mailing list would probably be of interest to you:

https://groups.google.com/forum/#!topic/cython-users/CwtU_jYADgM

It looks like some decent options are presented there, provided you are fine with returning a memoryview from your function and having it coerced at some other level where performance isn't as much of an issue.

From http://docs.cython.org/src/userguide/memoryviews.html it follows that memory for Cython memoryviews can be allocated via:

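from cython cimport view
# `type` and "type" are placeholders for a C type and its struct-style format code, e.g. double and "d"
cdef type [:] cview = view.array(shape = (size,),
              itemsize = sizeof(type), format = "type", allocate_buffer = True)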

or by

from libc.stdlib cimport malloc, free
cdef type [:] cview = <type[:size]> malloc(sizeof(type)*size)

Both cases work, but in the first I have an issue if I introduce my own type (ctypedef some mytype), because there is no suitable format for it. In the second case there is a problem with deallocating the memory.

According to the manual, it should work as follows:

cview.callback_memory_free = free

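which binds a function that frees the memory to the memoryview; however, this code does not compile. (The user guide spells the attribute callback_free_data and sets it on a cython.view.array object rather than on the typed memoryview, which may be why the line above is rejected.)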
