Passing/Returning Cython Memoryviews vs. NumPy Arrays

Question

I am writing Python code to accelerate a region properties function for labeled objects in a binary image. The following code calculates the number of border pixels of a labeled object in a binary image, given the indices of the object. The main() function cycles through all labeled objects in a binary image 'mask' and calculates the number of border pixels for each one.

I am wondering what the best way is to pass or return my variables in this Cython code. The variables are either NumPy arrays or typed Memoryviews. I've experimented with passing/returning the variables in the different formats, but cannot deduce which way is best or most efficient. I am new to Cython, so Memoryviews are still fairly abstract to me, and whether there is a difference between the two methods remains a mystery. The images I am working with contain 100,000+ labeled objects, so operations such as these need to be fairly efficient.

To summarize:

When, if ever, should I pass/return my variables as typed Memoryviews rather than NumPy arrays for very repetitive computations? Is one way best, or are they exactly the same?

%%cython --annotate

import numpy as np
import cython
cimport numpy as np

DTYPE = np.intp
ctypedef np.intp_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
def erode(DTYPE_t [:,:] img):

    # Image dimensions
    cdef int height, width, local_min
    height = img.shape[0]
    width = img.shape[1]

    # Padded Array
    padded_np = np.zeros((height+2, width+2), dtype = DTYPE)
    cdef DTYPE_t[:,:] padded = padded_np
    padded[1:height+1,1:width+1] = img

    # Eroded image
    eroded_np = np.zeros((height,width),dtype=DTYPE)
    cdef DTYPE_t[:,:] eroded = eroded_np

    cdef DTYPE_t i,j
    for i in range(height):
        for j in range(width):
            local_min = min(padded[i+1,j+1], padded[i,j+1], padded[i+1,j],padded[i+1,j+2],padded[i+2,j+1])
            eroded[i,j] = local_min
    return eroded_np


@cython.boundscheck(False)
@cython.wraparound(False)
def border_image(slice_np):

    # Memoryview of slice_np
    cdef DTYPE_t [:,:] slice = slice_np

    # Image dimensions
    cdef Py_ssize_t ymax, xmax, y, x
    ymax = slice.shape[0]
    xmax = slice.shape[1]

    # Erode image
    eroded_image_np = erode(slice_np)
    cdef DTYPE_t[:,:] eroded_image = eroded_image_np

    # Border image
    border_image_np = np.zeros((ymax,xmax),dtype=DTYPE)
    cdef DTYPE_t[:,:] border_image = border_image_np
    for y in range(ymax):
        for x in range(xmax):
            border_image[y,x] = slice[y,x]-eroded_image[y,x]
    return border_image_np.sum()


@cython.boundscheck(False)
@cython.wraparound(False)
def main(DTYPE_t[:,:] mask, int numobjects, Py_ssize_t[:,:] indices):

    # Memoryview of boundary pixels
    boundary_pixels_np = np.zeros(numobjects,dtype=DTYPE)
    cdef DTYPE_t[:] boundary_pixels = boundary_pixels_np

    # Loop through each object
    cdef Py_ssize_t y_from, y_to, x_from, x_to, i
    cdef DTYPE_t[:,:] slice
    for i in range(numobjects):
        y_from = indices[i,0]
        y_to = indices[i,1]
        x_from = indices[i,2]
        x_to = indices[i,3]
        slice = mask[y_from:y_to, x_from:x_to]
        boundary_pixels[i] = border_image(slice)

    return boundary_pixels_np

Answer 1

Memoryviews are a more recent addition to Cython, designed as an improvement over the original np.ndarray syntax. For this reason they're slightly preferred. It usually doesn't make much difference which you use, though. Here are a few notes:

Speed

For speed it makes very little difference. My experience is that memoryviews as function parameters are marginally slower, but it's hardly worth worrying about.

Generality

Memoryviews are designed to work with any type that has Python's buffer interface (for example the standard library array module). Typing as np.ndarray only works with numpy arrays. In principle memoryviews can support an even wider range of memory layouts, which can make interfacing with C code easier (in practice I've never actually seen this be useful).
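The buffer-interface point can be illustrated without Cython at all. The sketch below uses Python's built-in memoryview as a stand-in for a Cython typed memoryview; that analogy is an assumption made for illustration (the two are different objects, but both consume the same buffer protocol). The helper name total is hypothetical.

```python
from array import array


def total(buf):
    """Sum the items of any 1-D object that exposes the buffer protocol."""
    view = memoryview(buf)  # no copy; raises TypeError for non-buffer objects
    return sum(view)


doubles = array("d", [1.0, 2.5, 4.0])  # standard library array module
raw = bytearray(b"\x01\x02\x03")       # another buffer-protocol type

print(total(doubles))
print(total(raw))

# A view shares memory with the object it was created from,
# so writing through the view changes the original array.
view = memoryview(doubles)
view[0] = 10.0
print(doubles[0])
```

The same function accepts both containers because it only relies on the buffer protocol, which is the property that lets a typed memoryview parameter accept more than just numpy arrays.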

As a return value

When returning an array from Cython to Python code, the user will probably be happier with a numpy array than with a memoryview. If you're working with memoryviews you can do either:

return np.asarray(mview)
return mview.base

Ease of compiling

If you're using np.ndarray you have to set the include directory with np.get_include() in your setup.py file. You don't have to do this with memoryviews, which often means you can skip setup.py and just use the cythonize command line command or pyximport for simpler projects.
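For reference, a minimal setup.py for the np.ndarray case might look like the sketch below. The module and file name regionprops are hypothetical, and the script assumes setuptools, Cython, and NumPy are installed.

```python
# setup.py: a minimal sketch; "regionprops.pyx" is a hypothetical file name
import numpy as np
from setuptools import Extension, setup
from Cython.Build import cythonize

extensions = [
    Extension(
        "regionprops",
        sources=["regionprops.pyx"],
        # Needed only when the .pyx file does `cimport numpy`
        include_dirs=[np.get_include()],
    )
]

setup(ext_modules=cythonize(extensions))
```

It would typically be built with python setup.py build_ext --inplace. Note that the code in the question still does cimport numpy to get the np.intp_t typedef, so it would most likely need the include directory even though its arrays are typed as memoryviews.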

Parallelization

This is the big advantage of memoryviews compared to numpy arrays (if you want to use it). Taking a slice of a memoryview does not require the global interpreter lock, but taking a slice of a numpy array does. This means that the following code outline can work in parallel with a memoryview:

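import numpy as np
from cython.parallel import prange

cdef void somefunc(double[:] x) nogil:
    # implementation goes here
    pass

# identifiers cannot start with a digit, so the array is named array_2d
cdef double[:,:] array_2d = np.array(...)
cdef Py_ssize_t i
for i in prange(array_2d.shape[0], nogil=True):
    somefunc(array_2d[i,:])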

If you aren't using Cython's parallel functionality this doesn't apply.

`cdef` classes

You can use memoryviews as attributes of cdef classes but not np.ndarrays. You can (of course) use numpy arrays as untyped object attributes instead.

Answer 1: 6, accepted, 2018-04-12 19:15:49
