How to pass a list of numpy arrays to C++ via Cython

Question

I want to pass a list of 2d numpy arrays to a C++ function. My first idea is to use a std::vector<float *> to receive the list of arrays, but I can't find a way to pass the list.

The C++ function looks like this:

double cpp_func(const std::vector<const float*>& vec) {
    return 0.0;
}

The Cython function looks like this:

cpdef py_func(list list_of_array):
    cdef vector[float*] vec
    cdef size_t i
    cdef size_t n = len(list_of_array)
    for i in range(n):
        vec.push_back(&list_of_array[i][0][0])  # error: Cannot take address of Python object
    return cpp_func(vec)

I have tried declaring list_of_array as list[float[:,:]], but that doesn't work either.

Answer 1

I will slightly change the signature of your function:

For every numpy array, the function also needs to know the number of elements in that array.
The data is double * rather than float *, because double corresponds to numpy's default floating-point dtype (np.float64). This can be adjusted to your needs.

That leads to the following C++ interface/code (for convenience I use the verbatim C code feature, available in Cython >= 0.28):

%%cython --cplus -c=-std=c++11
from libcpp.vector cimport vector
cdef extern from *:
    """
    struct Numpy1DArray{
        double *ptr;
        int   size;
    };

    static double cpp_func(const std::vector<Numpy1DArray> &vec){
          // Fill with life to see, that it really works:
          double res = 0.0;
          for(const auto &a : vec){
              if(a.size>0)
                res+=a.ptr[0];
          }
          return res;
    }   
    """
    cdef struct Numpy1DArray:
        double *ptr
        int size          
    double cpp_func(const vector[Numpy1DArray] &vec)
    ...

The struct Numpy1DArray just bundles the information needed for a numpy array, because an array is more than just a pointer to contiguous data.
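
To illustrate why the pointer alone is not enough, here is a minimal pure-Python sketch that mirrors the descriptor with ctypes. It is not part of the Cython solution: the helper describe is a hypothetical name, and array.array stands in for a numpy array so that only the standard library is needed.

```python
import array
import ctypes

DoublePtr = ctypes.POINTER(ctypes.c_double)

class Numpy1DArray(ctypes.Structure):
    # Same layout as the C struct above: a data pointer plus an element count.
    _fields_ = [("ptr", DoublePtr), ("size", ctypes.c_int)]

def describe(buf):
    """Hypothetical helper: build a descriptor for an array.array('d')."""
    address, length = buf.buffer_info()  # (memory address, number of elements)
    if length > 0:
        ptr = ctypes.cast(address, DoublePtr)
    else:
        ptr = DoublePtr()  # NULL pointer for an empty array
    return Numpy1DArray(ptr, length)

data = array.array('d', [2.0, 3.0])  # must stay alive while the pointer is used
d = describe(data)
# Without d.size there would be no way to know where the data ends.
values = [d.ptr[i] for i in range(d.size)]
```

The descriptor does not own the memory, so data has to outlive d, which is the same lifetime issue the thread-safe version addresses.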

Naive version

Now, writing the wrapper function is pretty straightforward:

%%cython --cplus -c=-std=c++11
....
def call_cpp_func(list_of_arrays):
    cdef Numpy1DArray ar_descr
    cdef vector[Numpy1DArray] vec
    cdef double[::1] ar
    for ar in list_of_arrays:  # coerce elements to double[::1]
        ar_descr.size = ar.size
        if ar.size > 0:
            ar_descr.ptr = &ar[0]
        else:
            ar_descr.ptr = NULL  # set to nullptr
        vec.push_back(ar_descr)

    return cpp_func(vec)

There are some things worth noting:

You need to coerce the elements of the list to something that implements the buffer protocol; otherwise &ar[0] will obviously not work, because Cython would expect ar[0] to be a Python object. Btw, this is what you have missed.
I have chosen Cython's memory views (i.e. double[::1]) as the target for coercion. The advantages over np.ndarray are that it also works with array.array and that it automatically checks that the data is contiguous (that is the meaning of ::1).
A common pitfall is to access ar[0] for an empty ndarray; this access must be guarded.
This code is not thread-safe. Another thread could invalidate the pointers, for example by resizing the numpy arrays in place or by deleting the numpy arrays altogether.
IIRC, for Python 2 you will have to cimport array for the code to work with array.array.
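
The requirements behind the coercion to double[::1] can be approximated at the Python level with the built-in memoryview. The following is only a rough sketch: the function name accepts_as_double_1d is hypothetical, and Cython performs its own checks internally rather than calling anything like this.

```python
import array

def accepts_as_double_1d(obj):
    """Rough mirror of what double[::1] needs: buffer protocol,
    item format 'd' (C double), one dimension, contiguous data."""
    try:
        view = memoryview(obj)
    except TypeError:
        return False  # the object does not implement the buffer protocol
    with view:
        return view.format == 'd' and view.ndim == 1 and view.c_contiguous

doubles = array.array('d', [1.0, 2.0, 3.0, 4.0])
ok_contiguous = accepts_as_double_1d(doubles)                    # whole array
ok_strided = accepts_as_double_1d(memoryview(doubles)[::2])      # every 2nd item
ok_floats = accepts_as_double_1d(array.array('f', [1.0]))        # C float items
ok_list = accepts_as_double_1d([1.0, 2.0])                       # plain list
```

Only the first call satisfies all conditions; the strided view fails the contiguity check, the float array has the wrong item type, and a plain list offers no buffer at all.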

Finally, here is a test that the code works (there is also an array.array in the list to make the point):

import array
import numpy as np
lst = (np.full(3, 1.0), np.full(0, 2.0), array.array('d', [2.0]))
call_cpp_func(lst)  # 3.0 as expected!
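
For cross-checking the result without compiling anything, the same computation can be sketched in pure Python. The name reference_result is hypothetical, and array.array objects replace the numpy arrays so the snippet depends only on the standard library.

```python
import array

def reference_result(list_of_arrays):
    """Sum the first element of every non-empty buffer, as cpp_func does."""
    total = 0.0
    for obj in list_of_arrays:
        with memoryview(obj) as view:
            if len(view) > 0:  # guard the access for empty buffers
                total += view[0]
    return total

lst = (array.array('d', [1.0, 1.0, 1.0]), array.array('d'), array.array('d', [2.0]))
expected = reference_result(lst)
```

Here the empty array contributes nothing, so expected is 1.0 + 2.0.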

Thread-safe version

The code above can also be written in a thread-safe manner. The possible problems are:

Another thread could trigger the deletion of the numpy arrays, for example by calling list_of_arrays.clear(). After that there might be no more references to the arrays, and they would get deleted. That means we need to keep a reference to every input array for as long as we use the pointers.
Another thread could resize the arrays, thus invalidating the pointers. That means we have to use the buffer protocol: its __getbuffer__ locks the buffer so that it cannot be invalidated, and we release the buffer via __releasebuffer__ once we are done with the calculations.
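
Both effects can be observed from plain Python with the built-in memoryview, which acquires and releases buffers in the same way. This is a small sketch for CPython using array.array; the function make_view is a hypothetical helper, and numpy arrays are not exercised here.

```python
import array

buf = array.array('d', [1.0, 2.0])
view = memoryview(buf)      # acquires the buffer, which locks it

try:
    buf.append(3.0)         # a resize would move the data
    resize_blocked = False
except BufferError:
    resize_blocked = True   # refused while the buffer is exported

view.release()              # unlocks the buffer
buf.append(3.0)             # now the resize succeeds
length_after_release = len(buf)

def make_view():
    local = array.array('d', [4.0])
    return memoryview(local)  # the view holds a reference to 'local'

kept = make_view()          # the array is still alive through the view
first = kept[0]
```

As long as a view exists, the underlying object can neither be resized nor garbage-collected, which is what the stay_alive list relies on.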

Cython's memory views can be used to lock the buffers and to keep references to the input arrays around:

%%cython --cplus -c=-std=c++11
....
def call_cpp_func_safe(list_of_arrays):
    cdef Numpy1DArray ar_descr
    cdef vector[Numpy1DArray] vec
    cdef double[::1] ar
    cdef list stay_alive = []
    for ar in list_of_arrays:  # coerce elements to double[::1]
        stay_alive.append(ar)  # keep arrays alive and locked
        ar_descr.size = ar.size
        if ar.size > 0:
            ar_descr.ptr = &ar[0]
        else:
            ar_descr.ptr = NULL  # set to nullptr
        vec.push_back(ar_descr)
    return cpp_func(vec)

There is a small overhead: adding the memory views to a list. That is the price of safety.

Releasing the GIL

One last improvement: the GIL can be released while cpp_func is running. That means we have to declare cpp_func as nogil and release the GIL while calling the function:

%%cython --cplus -c=-std=c++11
from libcpp.vector cimport vector
cdef extern from *:
    ....          
    double cpp_func(const vector[Numpy1DArray] &vec) nogil
...

def call_cpp_func(list_of_arrays):
...
    with nogil:
        result = cpp_func(vec)       
    return result

Cython will figure out that result is of type double and thus will be able to release the GIL while calling cpp_func.

Answer 1 (2, accepted, 2018-09-12 08:39:28)