Cython: Passing multiple numpy arrays in one argument with fused types

Question

I have rewritten an algorithm from C to Cython so I could take advantage of fused types and make it easier to call from Python. The algorithm can take multiple arrays to work on along with some other parameters. The arrays are accepted as a pointer to pointers (ex. ). I figured I would call the Cython code from Python by providing the multiple arrays as a tuple of numpy arrays, but doing so gets kind of messy with fused types. Here's a simple example of how I have it working now:

import numpy
cimport numpy
from libc.stdlib cimport malloc, free

ctypedef fused test_dtype:
    numpy.float32_t
    numpy.float64_t

cdef int do_stuff(test_dtype **some_arrays):
    if test_dtype is numpy.float32_t:
        return 1
    elif test_dtype is numpy.float64_t:
        return 2
    else:
        return -1

def call_do_stuff(tuple some_arrays):
    cdef unsigned int num_items = len(some_arrays)
    cdef unsigned int i
    cdef numpy.ndarray[numpy.float32_t, ndim=2] tmp_arr32
    cdef numpy.ndarray[numpy.float64_t, ndim=2] tmp_arr64
    # Table of raw data pointers, one entry per array
    cdef void **the_pointer = <void **>malloc(num_items * sizeof(void *))
    if not the_pointer:
        raise MemoryError("Could not allocate memory")
    try:
        # Dispatch on the dtype of the first array
        if some_arrays[0].dtype == numpy.float32:
            for i in range(num_items):
                tmp_arr32 = some_arrays[i]
                the_pointer[i] = &tmp_arr32[0, 0]
            return do_stuff(<numpy.float32_t **>the_pointer)
        elif some_arrays[0].dtype == numpy.float64:
            for i in range(num_items):
                tmp_arr64 = some_arrays[i]
                the_pointer[i] = &tmp_arr64[0, 0]
            return do_stuff(<numpy.float64_t **>the_pointer)
        else:
            raise ValueError("Array data type is unknown")
    finally:
        # Release the pointer table on every exit path
        free(the_pointer)

I realize that I can specify the type in the tuple, but nothing more complex than "object" if I understand it correctly. Does anyone know of a cleaner way of doing what I'm trying to do? Any other Cython tips are appreciated.

There are other arguments passed including a fill_value argument of the same type as the array. The code would get simpler if the test_dtype could be determined at call time via the arrays or the fill argument, but I can't find a good way to guarantee that C will receive the value in the correct type. For example, passing numpy.nan or numpy.float64(numpy.nan) does not guarantee the data type.

Answer 1

After programming Python and NumPy for 10 years (and C, C++, Matlab and Fortran 10 years before that), this is my general impression:

It is often easier to write numerical code in C, C++ or Fortran than Cython. The only exception I can think of is the smallest of code snippets. In C++ you have the luxury of using templates and the STL (and Boost if you like).

Learn to use the NumPy C API. The PyArrayObject (which is what a NumPy array is called in C) has a type number you can use for dispatch. You obtain it using the macro PyArray_TYPE() on your PyArrayObject*. numpy.float64 maps to type number NPY_FLOAT64, numpy.float32 maps to type number NPY_FLOAT32, etc. Then you have corresponding C and C++ typedefs which you can use in your C or C++ code: If PyArray_TYPE(x) == NPY_FLOAT64, the data type to use in C or C++ is npy_float64. This way you can write C or C++ code which is totally defined by the NumPy arrays you pass in.

I usually use a switch statement on PyArray_TYPE(x), and case with NPY_FLOAT64, NPY_FLOAT32, etc. For each case I call a templated C++ function with the correct template type. This keeps the amount of code I need to write down to a minimum.
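The switch-plus-template pattern described above might look roughly like the sketch below. It is not taken from a real extension: so that it compiles without the NumPy headers, a stand-in enum `TypeNum` replaces the real type numbers (NPY_FLOAT32, NPY_FLOAT64), and plain `float`/`double` replace `npy_float32`/`npy_float64`. The names `sum_first` and `dispatch_sum_first` are hypothetical, and the "algorithm" only sums the first element of each array.

```cpp
#include <cstddef>
#include <stdexcept>

// Stand-in for the NumPy type numbers (NPY_FLOAT32, NPY_FLOAT64).
// In a real extension the value would come from PyArray_TYPE().
enum class TypeNum { Float32, Float64 };

// Hypothetical typed kernel: sums the first element of each array.
// The arrays arrive as untyped pointers and are cast one at a time.
template <typename T>
double sum_first(void *const *arrays, std::size_t num_arrays) {
    double total = 0.0;
    for (std::size_t i = 0; i < num_arrays; ++i) {
        const T *data = static_cast<const T *>(arrays[i]);
        total += static_cast<double>(data[0]);
    }
    return total;
}

// Hand-coded switch: pick the template instantiation from the
// runtime type number, one case per supported element type.
double dispatch_sum_first(TypeNum type_num, void *const *arrays,
                          std::size_t num_arrays) {
    switch (type_num) {
    case TypeNum::Float32:
        return sum_first<float>(arrays, num_arrays);
    case TypeNum::Float64:
        return sum_first<double>(arrays, num_arrays);
    }
    throw std::invalid_argument("unsupported type number");
}
```

Only the switch has to know about every supported type; the numerical code is written once as a template.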

http://docs.scipy.org/doc/numpy/reference/c-api.html

Cython is good for wrapping C and C++ and avoiding tedious Python C API coding, but there is a limit to how much you can statically type arguments. For "down-to-the-iron" numerical code I think it is better to use plain C++, but Cython is an excellent tool for exposing it to Python. So write your numerical stuff in C++ and use Cython to call your C++. That would be the best advice I can give. Cython is an excellent tool for writing C extensions to Python, but it is not a replacement for C++ when C++ is what you really want.

As for your question: the thing you want to do is not really possible, because in C or C++, which is what Cython emits, numpy.ndarray is PyArrayObject* regardless of dtype. So you need to hand-code the switch statement.

solution1
5 ACCEPTED 2015-01-03 15:45:40