Return a 2D array created by a C function to Python using Cython

I want to use a 2D array created by a C function in Python. I asked how to do this earlier today, and one approach, suggested by @Abhijit Pritam, was to use structs. I implemented it and it works.

C code:

typedef struct {
  int arr[3][5];
} Array;

Array make_array_struct() {
  Array my_array;
  int count = 0;
  for (int i = 0; i < 3; i++)
    for (int j = 0; j  < 5; j++)
      my_array.arr[i][j] = ++count;
  return my_array;
}

In Cython I have this:

cdef extern from "numpy_fun.h":
    ctypedef struct Array:
        int[3][5] arr
    cdef Array make_array_struct()

def make_array():
    cdef Array arr = make_array_struct()
    return arr

my_arr = make_array()
my_arr['arr']
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]

However, it was suggested that this is not the best approach to the problem, because it is possible to give Python control over the data instead. I'm trying to implement that, but I haven't been able to do so yet. This is what I have.

C code:

int **make_array_ptr() {
  int **my_array = (int **)malloc(3 * sizeof(int *));
  my_array[0] = calloc(3 * 5, sizeof(int));
  for (int i = 1; i < 3; i++)
    my_array[i] = my_array[0] + i * 5;
  int count = 0;
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 5; j++)
      my_array[i][j] = ++count;
  return my_array;
}

Cython:

import numpy as np
cimport numpy as np

np.import_array()

ctypedef np.int32_t DTYPE_t

cdef extern from "numpy/arrayobject.h":
    void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)

cdef extern from "numpy_fun.h":
    cdef int **make_array_ptr()

def make_array():
    cdef int[::1] dims = np.array([3, 5], dtype=np.int32)
    cdef DTYPE_t **data = <DTYPE_t **>make_array_ptr()
    cdef np.ndarray[DTYPE_t, ndim=2] my_array = np.PyArray_SimpleNewFromData(2, &dims[0], np.NPY_INT32, data)
    PyArray_ENABLEFLAGS(my_array, np.NPY_OWNDATA)
    return my_array

I was following "Force NumPy ndarray to take ownership of its memory in Cython", which seems to be what I need to do. My case is different because I need a 2D array, so I'll likely have to do things a bit differently: for example, the function expects data to be a pointer to int, and I gave it a pointer to pointer to int. What do I have to do to use this approach?

My issues with the struct approach are:

  1. It breaks as soon as you want anything but a fixed-size array, with no real way of fixing it.

  2. It relies on Cython's implicit conversion from structs to dicts. Cython copies the data to a Python list, which isn't terribly efficient. This isn't an issue with the small arrays you have here, but it's silly for larger arrays.
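To make the copy-versus-share distinction in point 2 concrete, here is a small pure-Python analogy. It is not what Cython does internally; the array module buffer just stands in for the C-side storage, and the names (c_storage, copied, shared) are made up for illustration.

```python
from array import array

# Stand-in for the C-side storage: 15 C ints holding 1..15.
c_storage = array("i", range(1, 16))

# A list conversion produces an independent copy made of Python int objects.
copied = c_storage.tolist()

# A memoryview shares the underlying memory instead of copying it.
shared = memoryview(c_storage)

# Change the original storage after both were created.
c_storage[0] = 100

print(copied[0])   # still the old value, because the list is a copy
print(shared[0])   # reflects the change, because the memory is shared
```

The list costs one Python object per element and has to be rebuilt on every call, which is why it only makes sense for small arrays.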


I also don't really recommend 2D arrays as pointers-to-pointers. The way numpy (and most other sensible array libraries) implements 2D arrays is to store a 1D array and the shape of the 2D array, and just use the shape to work out what index to access. This tends to be more efficient (faster lookups, faster allocation) and also easier to use (less allocation/deallocation to keep track of).
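As a rough illustration of the "1D array plus shape" idea, here is a sketch using only the Python standard library (no numpy, so this is not numpy's actual implementation). The helper flat_index and the constants ROWS and COLS are names I've made up for the example.

```python
from array import array

ROWS, COLS = 3, 5

# One contiguous 1D block of C ints holding 1..15.
flat = array("i", range(1, ROWS * COLS + 1))

def flat_index(i, j, ncols=COLS):
    """Row-major offset of element (i, j) inside the 1D block."""
    return i * ncols + j

# Reinterpret the same memory as a 3x5 grid without copying.
# memoryview.cast needs a byte format on one side, hence the detour via "B".
grid = memoryview(flat).cast("B").cast("i", shape=[ROWS, COLS])

print(flat[flat_index(1, 2)])  # element at row 1, column 2 via index arithmetic
print(grid[1, 2])              # the same element through the 2D view
print(grid.tolist())           # nested lists, one per row
```

Both lookups hit the same memory location; the shape is only bookkeeping on top of the flat block, which is why a single allocation and a single free are enough.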

To do this, change the C code to:

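#include <stdint.h>  /* int32_t */
#include <stdlib.h>  /* calloc */

int32_t *make_array_ptr() {
  /* one contiguous block; element (i, j) lives at index j + i*5 */
  int32_t *my_array = calloc(3 * 5, sizeof(int32_t));
  int count = 0;
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 5; j++)
      my_array[j+i*5] = ++count;
  return my_array;
}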

I've deleted the first loop that you immediately overwrite. I've also changed the type to int32_t since you seem to rely on this in your Cython code later.

The Cython code is then very close to what you were using:

def make_array():
    cdef np.intp_t dims[2]
    dims[0] = 3
    dims[1] = 5
    cdef np.int32_t *data = make_array_ptr()
    cdef np.ndarray[np.int32_t, ndim=2] my_array = np.PyArray_SimpleNewFromData(2, &dims[0], np.NPY_INT32, data)
    PyArray_ENABLEFLAGS(my_array, np.NPY_OWNDATA)
    return my_array

The main changes are that I've removed some casts and also just allocated dims as a static array (which seemed simpler than memoryviews).


I don't think it's particularly easy to allow numpy to handle a pointer-to-pointer array. It might be possible by implementing the Python buffer interface, but that seems like a lot of work and may not be easy.
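If the C side can't be changed and you are stuck with a pointer-to-pointer array, one fallback (my suggestion, not something from the question) is to copy the rows into a single contiguous block and hand that to numpy instead. The sketch below only shows the copy step, using ctypes from the standard library to fake an int** structure; the names rows, table and flatten are hypothetical.

```python
import ctypes

ROWS, COLS = 3, 5
IntPtr = ctypes.POINTER(ctypes.c_int)

# Fake an int**-style result: separately allocated rows plus a table of
# pointers to them. The rows list keeps the row memory alive.
rows = [(ctypes.c_int * COLS)(*range(r * COLS + 1, (r + 1) * COLS + 1))
        for r in range(ROWS)]
table = (IntPtr * ROWS)(*(ctypes.cast(row, IntPtr) for row in rows))

def flatten(ptrs, nrows, ncols):
    """Copy a pointer-to-pointer array into one contiguous row-major block."""
    flat = (ctypes.c_int * (nrows * ncols))()
    for i in range(nrows):
        for j in range(ncols):
            flat[i * ncols + j] = ptrs[i][j]
    return flat

flat = flatten(table, ROWS, COLS)
print(list(flat))
```

This costs one extra copy, and the original rows and pointer table still have to be freed by whoever allocated them, so changing the C function to return a flat block remains the simpler option.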
