
PyArray_Check gives Segmentation Fault with Cython/C++

Thank you all in advance.

I am wondering what the right way is to #include all the numpy headers, and what the right way is to use Cython and C++ to parse numpy arrays. Below is my attempt:

// cpp_parser.h 
#ifndef _FUNC_H_
#define _FUNC_H_

#include <Python.h>
#include <numpy/arrayobject.h>

void parse_ndarray(PyObject *);

#endif

I know this might be wrong; I also tried other options, but none of them worked.

// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>

using namespace std;

void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) { // this throws seg fault
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

The PyArray_Check call triggers a segmentation fault. PyArray_CheckExact does not, but it is not exactly what I want.

# parser.pxd
cdef extern from "cpp_parser.h": 
    cdef void parse_ndarray(object)

and the implementation file is:

# parser.pyx
import numpy as np
cimport numpy as np

def py_parse_array(object x):
    assert isinstance(x, np.ndarray)
    parse_ndarray(x)

The setup.py script is

# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize

import numpy as np

ext = Extension(
    name='parser',
    sources=['parser.pyx', 'cpp_parser.cpp'],
    language='c++',
    include_dirs=[np.get_include()],
    extra_compile_args=['-fPIC'],
)

setup(
    name='parser',
    ext_modules=cythonize([ext]),
)

And finally the test script:

# run_test.py
import numpy as np
from parser import py_parse_array

x = np.arange(10)
py_parse_array(x)

I have created a git repo with all the scripts above: https://github.com/giantwhale/study_cython_numpy/

Quick Fix (read on for more details and a more sophisticated approach):

You need to initialize the variable PyArray_API in every cpp file that uses the numpy C-API, by calling import_array() :

//a trick to ensure import_array() is called when the *.so is loaded,
//and called only once
int init_numpy(){
     import_array(); // raises a Python error if unsuccessful
     return 0;
}

const static int numpy_initialized = init_numpy();

void parse_ndarray(PyObject *obj) { // would be called every time
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

One could also use _import_array , which returns a negative number on failure, to implement custom error handling. See here for the definition of import_array .

Warning: As pointed out by @isra60, _import_array()/import_array() can only be called once Python is initialized, i.e. after Py_Initialize() has been called. This is always the case for an extension, but not always when the Python interpreter is embedded, because numpy_initialized is initialized before main starts. In that case, the "initialization trick" should not be used; instead, init_numpy() should be called after Py_Initialize() .


Sophisticated solution:

The proposed solution is quick, but if more than one cpp file uses numpy, there will be multiple instances of PyArray_API initialized.

This can be avoided if PyArray_API is defined not as static but as extern in all but one translation unit. For those translation units, the NO_IMPORT_ARRAY macro must be defined before numpy/arrayobject.h is included.

We need, however, a translation unit in which this symbol is defined. For that translation unit, the macro NO_IMPORT_ARRAY must not be defined.

However, without defining the macro PY_ARRAY_UNIQUE_SYMBOL we would get only a static symbol, i.e. one not visible to other translation units, so the linker would fail. The reason it is static by default: if there were two libraries and each defined a PyArray_API , we would have multiple definitions of the same symbol and the linker would fail, i.e. we could not use both libraries together.

Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as, say, MY_FANCY_LIB_PyArray_API prior to every include of numpy/arrayobject.h , we get our own PyArray_API name, which will not clash with other libraries.

Putting it all together:

A: use_numpy.h - your header for including the numpy functionality, i.e. numpy/arrayobject.h :

//use_numpy.h

//your fancy name for the dedicated PyArray_API-symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API 

//this macro must be defined for the translation unit              
#ifndef INIT_NUMPY_ARRAY_CPP 
    #define NO_IMPORT_ARRAY //for usual translation units
#endif

//now, everything is setup, just include the numpy-arrays:
#include <numpy/arrayobject.h>

B: init_numpy_api.cpp - a translation unit for initializing the global MY_PyArray_API :

//init_numpy_api.cpp

//first make clear: here we initialize MY_PyArray_API
#define INIT_NUMPY_ARRAY_CPP

//now include arrayobject.h, which defines
//void **MY_PyArray_API
#include "use_numpy.h"

//now the old trick with initialization:
int init_numpy(){
     import_array(); // raises a Python error if unsuccessful
     return 0;
}
const static int numpy_initialized = init_numpy();

C: just include use_numpy.h whenever you need numpy; it will declare extern void **MY_PyArray_API :

//example
#include "use_numpy.h"

...
PyArray_Check(obj); // works, no segmentation error

Warning: It should not be forgotten that, for the initialization trick to work, Py_Initialize() must already have been called.


Why do you need it (kept for historical reasons):

When I build your extension with debug symbols:

extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],

and run it with gdb:

 gdb --args python run_test.py
 (gdb) run
  --- Segmentation fault
 (gdb) disass

I can see the following:

   0x00007ffff1d2a6d9 <+20>:    mov    0x203260(%rip),%rax       
       # 0x7ffff1f2d940 <_ZL11PyArray_API>
   0x00007ffff1d2a6e0 <+27>:    add    $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>:    mov    (%rax),%rax
   ...
   (gdb) print $rax
   $1 = 16

We should keep in mind that PyArray_Check is only a #define for:

#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)

It seems that &PyArray_Type somehow uses a part of PyArray_API that is not initialized (has value 0 ).

Let's take a look at cpp_parser.cpp after the preprocessor (compiled with flag -E ):

 static void **PyArray_API= __null
 ...
 static int
_import_array(void)
{
  PyArray_API = (void **)PyCapsule_GetPointer(c_api,...

So PyArray_API is static and is initialized via _import_array(void) . That actually explains the warning I got during the build - that _import_array() was defined but not used: we didn't initialize PyArray_API .

Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. every cpp file.

So we just need to do it - import_array() seems to be the official way.

Since you use Cython, the numpy C-API is already available through Cython's includes. It's straightforward in a Jupyter notebook:

cimport numpy as np
from numpy cimport PyArray_Check

np.import_array()  # Attention!

def parse_ndarray(object ndarr):
    if PyArray_Check(ndarr):
        print("PyArray_Check Passed")
    else:
        print("PyArray_Check Failed")

I believe np.import_array() is the key here, since you call into the numpy C-API. Comment it out and try - a crash appears as well.

import numpy as np
from array import array
ndarr = np.arange(3)
pyarr = array('i', range(3))
parse_ndarray(ndarr)
parse_ndarray(pyarr)
parse_ndarray("Trick or treat!")

Output:

PyArray_Check Passed
PyArray_Check Failed
PyArray_Check Failed
