
Function in cython changes numpy array type

I am working with Cython and numpy, and have run into a strange issue where a Cython function appears to change the dtype of the elements of a numpy array. Strangely, the dtype only changes when the input type of the array is actually specified.

I am using Cython==0.29.11, numpy==1.15.4, python 3.6, on Ubuntu 18.04.

# cyth.pyx
cimport numpy as np

def test(x):
    print(type(x[0]))

def test_np(np.ndarray[np.uint32_t, ndim=1] x):
    print(type(x[0]))
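
For reference, one way to build the extension in place is Cython's cythonize command-line tool (one possible build step, not the only one):

cythonize -i cyth.pyx   # builds cyth.pyx into an importable extension module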

Now cythonising this file and using the functions:

>>> from cyth import test, test_np
>>> import numpy as np
>>> a = np.array([1, 2], dtype=np.uint32)
>>> test(a)
<class 'numpy.uint32'>
>>> test_np(a)
<class 'int'>

So test works as expected, printing the type of the first element of the input array - a uint32. But test_np, which actually ensures that the incoming array has dtype uint32, now shows a regular Python int as the type of the first element.

Even trying to force the element to be of the right type does not work, i.e. using:

def test_np(np.ndarray[np.uint32_t, ndim=1] x):
    cdef np.uint32_t el
    el = x[0]
    print(type(el))

still results in

>>> test_np(a)
<class 'int'>

Any help in understanding this discrepancy would be greatly appreciated.

Cython doesn't change the type of the array, but returns an element of a slightly different type.

The data in the numpy array is stored as a contiguous block of 32-bit unsigned integers. Accessing x[0] means creating a Python object (because the Python interpreter cannot handle raw C ints) - numpy has a dedicated wrapper class for every numpy dtype and returns an np.uint32 object.
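
This can be seen in plain Python, without Cython involved:

>>> import numpy as np
>>> a = np.array([1, 2], dtype=np.uint32)
>>> el = a[0]              # goes through ndarray.__getitem__
>>> type(el)
<class 'numpy.uint32'>
>>> el.dtype, el.itemsize  # the scalar wrapper still carries the 32-bit dtype
(dtype('uint32'), 4)
>>> isinstance(el, int)    # numpy's wrapper is not a Python int on Python 3
False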

Cython, on the other hand, maps all C integer types (e.g. long, int and so on) simply onto Python integers (which makes sense).
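
A minimal sketch of this mapping (roundtrip is just an illustrative name, not part of the code above):

%%cython
cimport numpy as np

def roundtrip(np.uint32_t value):
    # inside the function, value is a raw C unsigned 32-bit integer;
    # returning it from a def function converts it to a plain Python int
    return value

>>> type(roundtrip(5))
<class 'int'>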

Now, because x is declared as np.ndarray[np.uint32_t, ndim=1], x[0] no longer means calling __getitem__() of the numpy array (which would return an np.uint32 object) but reading a raw C integer (in this case an unsigned 4-byte one), which is then converted to a Python integer, because anything handed to a Python call such as type() inside a def function must be a Python object.

This doesn't mean that the array itself has a different type - the element types are just mapped differently when Cython converts them to Python objects.
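
You can check that the array is untouched after the call:

>>> test_np(a)
<class 'int'>
>>> a.dtype
dtype('uint32')
>>> type(a[0])    # indexing from Python still goes through __getitem__
<class 'numpy.uint32'>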


If you want to access the data as np.uint32 objects, you can call __getitem__ explicitly instead of using [..] ( [..] is translated by Cython into access to the raw C data):

%%cython
cimport numpy as np

def test_np(np.ndarray[np.uint32_t, ndim=1] x):
    print(type(x[0]))                     # int
    print(type(x.__getitem__(0)))         # numpy.uint32

When you use typed memory views rather than ndarray, calling __getitem__ directly will still return a Python integer: __getitem__ of the memory view doesn't call __getitem__ of the underlying ndarray but accesses the data at the C level. To call __getitem__ of the underlying object of a memory view:

def test_np(np.uint32_t[:] x):
    print(type(x[0]))
    print(type(x.base.__getitem__(0))) # instead of x.__getitem__(0)
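
Called with the same array a as above, this prints (x.base is the original ndarray that backs the memory view):

>>> test_np(a)
<class 'int'>
<class 'numpy.uint32'>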
