How to pass a list of numpy arrays to C++ via Cython

I want to pass a list of 2D numpy arrays to a C++ function. My first idea was to use a std::vector<float *> to receive the list of arrays, but I can't find a way to pass the list.

The c++ function looks like this:

double cpp_func(const std::vector<const float*>& vec) {
    return 0.0;
}

The Cython function looks like this:

cpdef py_func(list list_of_array):
    cdef vector[float*] vec
    cdef size_t i
    cdef size_t n = len(list_of_array)
    for i in range(n):
        vec.push_back(&list_of_array[i][0][0])  # error: Cannot take address of Python object
    return cpp_func(vec)

I have tried declaring list_of_array as list[float[:,:]], but that doesn't work either.

I will slightly change the signature of your function:

  • for every numpy array, the function also needs to know the number of elements in that array
  • the data is double * rather than float *, because this is what corresponds to NumPy's default floating-point type (np.float64). This can be adjusted to your needs.

That leads to the following C++ interface/code (for convenience I use the verbatim C code feature, available in Cython>=0.28):

%%cython --cplus -c=-std=c++11
from libcpp.vector cimport vector
cdef extern from *:
    """
    struct Numpy1DArray{
        double *ptr;
        int   size;
    };

    static double cpp_func(const std::vector<Numpy1DArray> &vec){
        // Sum the first element of every non-empty array, just to show that the data arrives:
        double res = 0.0;
        for(const auto &a : vec){
            if(a.size > 0)
                res += a.ptr[0];
        }
        return res;
    }
    """
    cdef struct Numpy1DArray:
        double *ptr
        int size          
    double cpp_func(const vector[Numpy1DArray] &vec)
    ...

The struct Numpy1DArray just bundles the information needed for a numpy array, because an array is more than just a pointer to contiguous data.


Naive version

Now, writing the wrapper function is pretty straightforward:

%%cython --cplus -c=-std=c++11
....
def call_cpp_func(list_of_arrays):
    cdef Numpy1DArray ar_descr
    cdef vector[Numpy1DArray] vec
    cdef double[::1] ar
    for ar in list_of_arrays:  # coerce elements to double[::1]
        ar_descr.size = ar.size
        if ar.size > 0:
            ar_descr.ptr = &ar[0]
        else:
            ar_descr.ptr = NULL  # set to nullptr
        vec.push_back(ar_descr)

    return cpp_func(vec)

There are some things worth noting:

  • you need to coerce the elements of the list to something that implements the buffer protocol; otherwise &ar[0] will not work, because Cython would expect ar[0] to be a Python object. This is what your attempt was missing.
  • I have chosen Cython's memory views (i.e. double[::1]) as the target for coercion. The advantages over np.ndarray are that they also work with array.array and that Cython automatically checks that the data is contiguous (that is the meaning of ::1).
  • a common pitfall is to access ar[0] for an empty ndarray; this access must be guarded.
  • this code is not thread-safe. Another thread could invalidate the pointers, for example by resizing the numpy arrays in place or by deleting the numpy arrays altogether.
  • IIRC, for Python 2 you will have to cimport array for the code to work with array.array.
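The contiguity check and the empty-array guard from the notes above can be illustrated at the Python level with the built-in memoryview, which goes through the same buffer protocol as Cython's typed memory views. This is only a sketch: the helper name describe is hypothetical, and it mimics rather than reproduces what coercion to double[::1] does.

```python
import array

def describe(obj):
    """Return (format, length) for a 1-D C-contiguous buffer of doubles.

    Hypothetical helper: it raises roughly where coercion to double[::1]
    would refuse the input.
    """
    with memoryview(obj) as view:
        if view.ndim != 1 or view.format != 'd' or not view.c_contiguous:
            raise ValueError("expected a 1-D C-contiguous buffer of doubles")
        return view.format, len(view)

data = array.array('d', [1.0, 2.0, 3.0, 4.0])
full = describe(data)                # whole array: contiguous
empty = describe(array.array('d'))   # empty array: length 0, so &ar[0] must not be taken

strided = memoryview(data)[::2]      # every second element: not contiguous
try:
    describe(strided)
    rejected = False
except ValueError:
    rejected = True
```

The empty array passes the check but has length 0, which is exactly the case where the wrapper stores NULL instead of taking &ar[0].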

Finally, here is a test that the code works (the input also contains an array.array to make the point):

import array
import numpy as np
lst = (np.full(3, 1.0), np.full(0, 2.0), array.array('d', [2.0]))
call_cpp_func(lst)  # 3.0 as expected!

Thread-safe version

The code above can also be written in a thread-safe manner. The possible problems are:

  1. Another thread could trigger the deletion of the numpy arrays, for example by calling list_of_arrays.clear(). After that, there might be no more references to the arrays around and they would get deleted. That means we need to keep a reference to every input array for as long as we use the pointers.
  2. Another thread could resize the arrays, thus invalidating the pointers. That means we have to use the buffer protocol: its __getbuffer__ locks the buffer so that it cannot be invalidated, and we release the buffer via __releasebuffer__ once we are done with the calculations.
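Both effects can be observed in plain Python, since the built-in memoryview acquires and releases buffers through the same protocol. The following is a minimal sketch using array.array from the standard library; how other exporters react to a resize attempt may differ.

```python
import array

data = array.array('d', [1.0, 2.0])

view = memoryview(data)              # acquires the buffer, like __getbuffer__
keeps_reference = view.obj is data   # the view holds a reference to its exporter

try:
    data.append(3.0)                 # a resize would invalidate pointers into the buffer
    resize_blocked = False
except BufferError:
    resize_blocked = True

view.release()                       # releases the buffer, like __releasebuffer__
data.append(3.0)                     # resizing is allowed again
```

While the view is alive, the array can neither be resized nor garbage-collected, which is the guarantee needed for the raw pointers handed to C++.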

Cython's memory views can be used both to lock the buffers and to keep references to the input arrays around:

%%cython --cplus -c=-std=c++11
....
def call_cpp_func_safe(list_of_arrays):
    cdef Numpy1DArray ar_descr
    cdef vector[Numpy1DArray] vec
    cdef double[::1] ar
    cdef list stay_alive = []
    for ar in list_of_arrays:  # coerce elements to double[::1]
        stay_alive.append(ar)  # keep arrays alive and locked
        ar_descr.size = ar.size
        if ar.size > 0:
            ar_descr.ptr = &ar[0]
        else:
            ar_descr.ptr = NULL  # set to nullptr
        vec.push_back(ar_descr)
    return cpp_func(vec)

There is a small overhead, namely adding the memory views to a list. That is the price of safety.


Releasing the GIL

One last improvement: the GIL can be released while cpp_func is running. That means we have to declare cpp_func as nogil and release the GIL while calling the function:

%%cython --cplus -c=-std=c++11
from libcpp.vector cimport vector
cdef extern from *:
    ....          
    double cpp_func(const vector[Numpy1DArray] &vec) nogil
...

def call_cpp_func(list_of_arrays):
...
    with nogil:
        result = cpp_func(vec)       
    return result

Cython will figure out that result is of type double and thus will be able to release the GIL while calling cpp_func.
