Cython interfaced with C++: segmentation fault for large arrays

Question

I am transferring my code from Python/C interfaced using ctypes to Python/C++ interfaced using Cython. The new interface will give me easier-to-maintain code, because I can exploit all the C++ features and need relatively few lines of interface code.

The interfaced code works perfectly with small arrays. However, it encounters a segmentation fault when using large arrays. I have been wrapping my head around this problem, but have not gotten any closer to a solution. I have included a minimal example in which the segmentation fault occurs. Please note that it consistently occurs on Linux and Mac, and that valgrind did not give any insights. Also note that the exact same example in pure C++ works without problems.

The example contains (part of) a sparse matrix class in C++. An interface is created in Cython. As a result, the class can be used from Python.

C++ side

sparse.h

#ifndef SPARSE_H
#define SPARSE_H

#include <iostream>
#include <cstdio>

using namespace std;

class Sparse {

  public:
    int* data;
    int  nnz;

    Sparse();
    ~Sparse();
    Sparse(int* data, int nnz);
    void view(void);

};

#endif

sparse.cpp

#include "sparse.h"

Sparse::Sparse()
{
  data = NULL;
  nnz  = 0   ;
}

Sparse::~Sparse() {}

Sparse::Sparse(int* Data, int NNZ)
{
  nnz  = NNZ ;
  data = Data;
}

void Sparse::view(void)
{

  int i;

  for ( i=0 ; i<nnz ; i++ )
    printf("(%3d) %d\n",i,data[i]);

}

Cython interface

csparse.pyx

import  numpy as np
cimport numpy as np

# UNCOMMENT TO FIX
#from cpython cimport Py_INCREF

cdef extern from "sparse.h":
  cdef cppclass Sparse:
    Sparse(int*, int) except +
    int* data
    int  nnz
    void view()


cdef class PySparse:

  cdef Sparse *ptr

  def __cinit__(self,**kwargs):

    cdef np.ndarray[np.int32_t, ndim=1, mode="c"] data

    data = kwargs['data'].astype(np.int32)

    # UNCOMMENT TO FIX
    #Py_INCREF(data)

    self.ptr = new Sparse(
      <int*> data.data if data is not None else NULL,
      data.shape[0],
    )

  def __dealloc__(self):
    del self.ptr

  def view(self):
    self.ptr.view()

setup.py

from distutils.core import setup, Extension
from Cython.Build   import cythonize

setup(ext_modules = cythonize(Extension(
  "csparse",
  sources=["csparse.pyx", "sparse.cpp"],
  language="c++",
)))

Python side

import numpy as np
import csparse

data = np.arange(100000,dtype='int32')

matrix = csparse.PySparse(
  data = data
)

matrix.view() # --> segmentation fault

To run:

$ python setup.py build_ext --inplace
$ python example.py

Note that data = np.arange(100,dtype='int32') does work.

Answer 1

The memory is managed by your numpy arrays. As soon as they go out of scope (most likely at the end of the PySparse constructor) the arrays cease to exist, and all your pointers are invalid. This applies to both large and small arrays, but presumably you just get lucky with small arrays.

You need to hold a reference to all the numpy arrays you use for the lifetime of your PySparse object:

cdef class PySparse:

  # ----------------------------------------------------------------------------

  cdef Sparse *ptr
  cdef object _held_reference # added

  # ----------------------------------------------------------------------------

  def __cinit__(self,**kwargs):
      # ....
      # your constructor code goes here, unchanged...
      # ....

      self._held_reference = [data] # add any other numpy arrays you use to this list

As a rule, you need to think quite hard about who owns what whenever you're dealing with C/C++ pointers, which is a big change from the normal Python approach. Getting a pointer from a numpy array does not copy the data, and it does not give numpy any indication that you're still using the data.
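The ownership point can be observed from pure Python, without dereferencing any dangling pointer. The following is only a minimal sketch, assuming CPython's reference counting; the classes AddressOnly and AddressAndReference and the helper raw_address are hypothetical stand-ins for the Cython wrapper, not part of the original code. A weak reference is used to watch whether the array created by astype() survives the constructor.

```python
import weakref
import numpy as np


def raw_address(arr):
    """Return the address of the array's buffer as a plain integer."""
    return arr.__array_interface__["data"][0]


class AddressOnly:
    """Hypothetical stand-in for the buggy wrapper: keeps only the address."""

    def __init__(self, data):
        copy = data.astype(np.int32)      # new array, local to __init__
        self.alive = weakref.ref(copy)    # observes the copy without owning it
        self.address = raw_address(copy)  # an int; numpy does not know we kept it


class AddressAndReference:
    """Hypothetical stand-in for the fixed wrapper: also holds a reference."""

    def __init__(self, data):
        copy = data.astype(np.int32)
        self.alive = weakref.ref(copy)
        self.address = raw_address(copy)
        self._held_reference = [copy]     # keeps the buffer alive with the object


data = np.arange(100000, dtype="int32")
buggy = AddressOnly(data)
fixed = AddressAndReference(data)

print(buggy.alive() is None)      # True: the copy was freed when __init__ returned
print(fixed.alive() is not None)  # True: the held reference keeps it alive
```

In the first class the stored address refers to memory that has already been released, which is the situation the C++ object ends up in; whether reading it crashes depends on what the allocator did with that memory, which would be consistent with small arrays appearing to work.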

Edit note: In my original version I tried to use locals() as a quick way of gathering all the arrays I wanted to keep. Unfortunately, that doesn't seem to include cdef-ed arrays, so it didn't manage to keep the ones you were actually using (note that astype() makes a copy unless you tell it otherwise, so you need to hold the reference to the copy, rather than the original passed in as an argument).
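The remark about astype() can be checked directly. This is a small sketch using only documented numpy calls (ndarray.astype with its copy parameter, and np.shares_memory); the variable names are illustrative.

```python
import numpy as np

original = np.arange(10, dtype=np.int32)

# Default behaviour: astype() allocates a new array even if the dtype matches.
copied = original.astype(np.int32)

# With copy=False, the input array itself is returned when no conversion is needed.
shared = original.astype(np.int32, copy=False)

# A real dtype conversion always needs new memory, regardless of copy=False.
converted = np.arange(10, dtype=np.int64).astype(np.int32, copy=False)

print(np.shares_memory(original, copied))  # False
print(shared is original)                  # True
```

So in the question's constructor, the pointer handed to C++ belongs to the array returned by astype(), and that is the object whose reference has to be kept.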

1 answer

Answer 1
2 Accepted 2016-04-23 18:39:57
