
Can I force a numpy ndarray to take ownership of its memory?

I have a C function that malloc()s and populates a 2D array of floats. It "returns" the address and the size of the array through its output arguments. The signature is

int get_array_c(float** addr, int* nrows, int* ncols);

I want to call it from Python, so I use ctypes.

import ctypes
mylib = ctypes.cdll.LoadLibrary('mylib.so')
get_array_c = mylib.get_array_c

I never figured out how to specify argument types with ctypes. I tend to just write a Python wrapper for each C function I'm using, and make sure I get the types right in the wrapper. The array of floats is a matrix in column-major order, and I'd like to get it as a numpy.ndarray. But it's pretty big, so I want to use the memory allocated by the C function, not copy it. (I just found this PyBuffer_FromMemory stuff in this StackOverflow answer: https://stackoverflow.com/a/4355701/3691)

buffer_from_memory = ctypes.pythonapi.PyBuffer_FromMemory
buffer_from_memory.restype = ctypes.py_object

import numpy
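def get_array_py():
    nrows = ctypes.c_int()
    ncols = ctypes.c_int()
    addr_ptr = ctypes.POINTER(ctypes.c_float)()
    # The C function fills in the pointer and both dimensions.
    get_array_c(ctypes.byref(addr_ptr), ctypes.byref(nrows), ctypes.byref(ncols))
    # c_int objects do not support arithmetic; use .value to get Python ints.
    nbytes = ctypes.sizeof(ctypes.c_float) * nrows.value * ncols.value
    # Note: PyBuffer_FromMemory exists only in Python 2.
    buf = buffer_from_memory(addr_ptr, nbytes)
    return numpy.ndarray((nrows.value, ncols.value), dtype=numpy.float32,
                         order='F', buffer=buf)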

This seems to give me an array with the right values. But I'm pretty sure it's a memory leak.

>>> a = get_array_py()
>>> a.flags.owndata
False

The array doesn't own the memory. Fair enough; by default, when the array is created from a buffer, it shouldn't. But in this case it should. When the numpy array is deleted, I'd really like Python to free the buffer memory for me. It seems that if I could force owndata to True, that would do it, but owndata isn't settable.

Unsatisfactory solutions:

  1. Make the caller of get_array_py() responsible for freeing the memory. That's super annoying; the caller should be able to treat this numpy array just like any other numpy array.

  2. Copy the original array into a new numpy array (with its own, separate memory) in get_array_py(), delete the first array, and free the memory inside get_array_py(). Return the copy instead of the original array. This is annoying because it's a memory copy that ought to be unnecessary.

Is there a way to do what I want? I can't modify the C function itself, although I could add another C function to the library if that's helpful.

I just stumbled upon this question, which is still an issue in August 2013. Numpy is really picky about the OWNDATA flag: there is no way it can be modified on the Python level, so ctypes will most likely not be able to do this. On the numpy C-API level (and now we are talking about a completely different way of making Python extension modules) one has to explicitly set the flag with:

PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);

On numpy < 1.7, one had to be even more explicit:

((PyArrayObject*)arr)->flags |= NPY_OWNDATA;
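
This is not something the answer above proposes, but since OWNDATA cannot be set from Python, one pure-Python workaround is to leave the flag False and tie the free() call to the lifetime of the object the array is built from. The sketch below assumes a POSIX-like platform where the C runtime can be loaded through ctypes. MallocOwner, wrap_malloced and fake_get_array_c are hypothetical names, and fake_get_array_c only stands in for the real get_array_c.

```python
import ctypes
import ctypes.util

import numpy as np

# Assumption: POSIX-like platform; CDLL(None) also works there as a fallback.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


class MallocOwner:
    """Hypothetical helper: describes a malloc()ed float matrix to numpy and
    releases the memory when the owner object is garbage collected."""

    def __init__(self, addr, nrows, ncols, release):
        self._addr = addr
        self._shape = (nrows, ncols)
        self._release = release

    @property
    def __array_interface__(self):
        dtype = np.dtype(np.float32)
        nrows, _ = self._shape
        return {
            "version": 3,
            "shape": self._shape,
            "typestr": dtype.str,
            "data": (self._addr, False),  # (address, read-only flag)
            # Column-major layout: consecutive rows are adjacent in memory.
            "strides": (dtype.itemsize, dtype.itemsize * nrows),
        }

    def __del__(self):
        self._release(self._addr)


def wrap_malloced(addr, nrows, ncols, release=libc.free):
    """Return a zero-copy array; numpy keeps the owner alive while the
    array (or any view of it) exists."""
    return np.asarray(MallocOwner(addr, nrows, ncols, release))


def fake_get_array_c(nrows, ncols):
    """Stand-in for get_array_c: malloc a buffer and fill it with 0, 1, 2, ..."""
    n = nrows * ncols
    addr = libc.malloc(n * ctypes.sizeof(ctypes.c_float))
    if not addr:
        raise MemoryError("malloc failed")
    data = (ctypes.c_float * n).from_address(addr)
    for i in range(n):
        data[i] = float(i)
    return addr


a = wrap_malloced(fake_get_array_c(2, 3), 2, 3)
total = float(a.sum())  # use it like any other array
del a                   # in CPython the buffer is freed here
```

The array still reports owndata as False, but the caller no longer has to free anything. The timing of the free relies on CPython's reference counting, so on other interpreters it may happen later.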

If one has any control over the underlying C function/library, the best solution is to pass it an empty numpy array of the appropriate size from Python to store the result in. The basic principle is that memory allocation should always be done on the highest level possible, in this case on the level of the Python interpreter.


As kynan commented below, if you use Cython, you have to expose the function PyArray_ENABLEFLAGS manually; see the post "Force NumPy ndarray to take ownership of its memory in Cython".

The relevant documentation is here and here.

I would tend to have two functions exported from my C library:

int get_array_c_nomalloc(float* addr, int nrows, int ncols); /* Pass addr as argument */
int get_array_c(float **addr, int nrows, int ncols); /* Calls function above */

I would then write my Python wrapper[1] of get_array_c to allocate the array, then call get_array_c_nomalloc. Then Python does own the memory. You could integrate this wrapper into your library so your user never has to be aware of get_array_c_nomalloc's existence.

[1] This isn't really a wrapper anymore, but instead is an adapter.
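
A minimal sketch of such an adapter, assuming the dimensions are known before the call. Since mylib.so is not available here, get_array_c_nomalloc is replaced by a ctypes callback that stands in for the C function. With the real library one would instead load it and set argtypes and restype on mylib.get_array_c_nomalloc to the same types used in the prototype below.

```python
import ctypes

import numpy as np

# Prototype for int get_array_c_nomalloc(float* addr, int nrows, int ncols).
FILL_PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_float),
    ctypes.c_int,
    ctypes.c_int,
)


@FILL_PROTO
def get_array_c_nomalloc(addr, nrows, ncols):
    """Stand-in for the C function: fills caller-provided memory with
    0, 1, 2, ... and returns 0 on success."""
    for i in range(nrows * ncols):
        addr[i] = float(i)
    return 0


def get_array_py(nrows, ncols):
    """Adapter: numpy allocates the column-major matrix, C only fills it."""
    out = np.empty((nrows, ncols), dtype=np.float32, order="F")
    status = get_array_c_nomalloc(
        out.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), nrows, ncols
    )
    if status != 0:
        raise RuntimeError("get_array_c_nomalloc failed with code %d" % status)
    return out


a = get_array_py(2, 3)
print(a.flags.owndata)  # True: numpy allocated the memory itself
```

Because numpy allocated the buffer, the array owns its data and is freed like any other array, with no copy and no manual free().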
