
How to return or save large malloc'd arrays in Cython as Python objects?

I want to create a large number of simulated samples from a model using Cython that I need to analyze later using Python. The result of one run of my simulation script should be a 10000 x 10000 array.

I defined a function using def and tried to declare my array as cdef int my_array[10000][10000] . The my_script.pyx file compiles correctly, but when I run the script I get a "segmentation fault" error (I am on Linux).

Looking for a solution, I learned that this issue is caused by allocating memory on the stack instead of the heap, so I decided to use PyMem_Malloc to allocate the memory. Here is a minimal version of what I'm trying to do:
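
The size mismatch behind the crash can be checked with a little arithmetic. The sketch below is plain Python rather than Cython; it assumes a Unix-like system (the `resource` module is not available on Windows) and uses `ctypes` to read the platform's `int` size.

```python
import ctypes
import resource

# Size of the array declared as: cdef int my_array[10000][10000]
n_rows, n_cols = 10000, 10000
int_size = ctypes.sizeof(ctypes.c_int)  # 4 bytes on common platforms
array_bytes = n_rows * n_cols * int_size

# Soft stack limit of the current process, in bytes
# (the same limit that `ulimit -s` reports in kilobytes).
soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_STACK)

print(f"array needs {array_bytes / 1e6:.0f} MB")
if soft_limit == resource.RLIM_INFINITY:
    print("stack limit: unlimited")
else:
    print(f"stack limit: {soft_limit / 1e6:.1f} MB")
```

With a 4-byte int the array needs about 400 MB, while the default stack limit on many Linux systems is 8 MB, so a local C array of this size cannot fit on the stack.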

import cython
from cpython.mem cimport PyMem_Malloc
from libc.stdlib cimport rand, srand, RAND_MAX
from libc.time cimport time

srand(time(NULL))

def my_array_func(int a_param):
    cdef int i
    # Array of row pointers, then one malloc'd block per row
    cdef int **my_array = <int **>PyMem_Malloc(sizeof(int *) * 10000)
    for i in range(10000):
        my_array[i] = <int *>PyMem_Malloc(sizeof(int) * 10000)

    cdef int j
    cdef int k
    for j in range(10000):
        for k in range(10000):
            my_array[j][k] = <float>rand()/RAND_MAX * a_param

    # This is the line the compiler rejects
    return my_array

When I try to compile this file, I get the error Cannot convert 'int **' to Python object , which makes sense because my_array is not really an array, so I guess it cannot be returned as a Python object (sorry, my knowledge of C is really rusty).

Is there a way to let the function return my 2D array such that it can be used as input to other Python functions? Another more than welcome solution might be to directly save the array in a file that can be imported later by a Python script.

Thanks.

In line with @DavidW's comment: when matrix computations are involved in Cython, it is advisable to let numpy arrays own the memory and live in Python land.

In your case, it would look like this:

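import cython
cimport numpy as np
import numpy as np
from libc.stdlib cimport rand, srand, RAND_MAX
from libc.time cimport time

# Seed the C random number generator once, at import time
srand(time(NULL))

def my_array_func(int a_param):
    cdef int n_rows=10000, ncols=10000
    # Mem alloc + Python object owning memory.
    # np.intc is the numpy dtype that matches C int; plain dtype=int is a
    # C long on 64-bit Linux and would not match the int buffer below.
    cdef np.ndarray[dtype=int, ndim=2] my_array = np.empty((n_rows,ncols), dtype=np.intc)

    # Memoryview: iterate over my_array at C speed
    cdef int[:,::1] my_array_view = my_array

    # Fill array
    cdef int i, j
    for i in range(n_rows):
        for j in range(ncols):
            # Cast to double before dividing so the ratio is not
            # truncated to 0 by integer division
            my_array_view[i,j] = <int> (<double> rand() / RAND_MAX * a_param)

    return my_array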

The single cdef np.ndarray[...] line allocates an uninitialized chunk of memory of the requested size, makes sure it is owned by a Python object, and gives you all the nice array properties (like .shape ). Looping over this array can then be done with no Python interaction by using a memoryview.
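
Once the function returns a numpy array, the other request in the question (saving the result to a file for a later Python session) needs nothing Cython-specific. A minimal sketch, assuming the compiled module is not at hand: `fake_my_array_func` is a hypothetical pure-numpy stand-in for `my_array_func` (it uses numpy's random generator instead of `rand()`), the shape is reduced so the example runs quickly, and the file name is a placeholder.

```python
import os
import tempfile

import numpy as np


def fake_my_array_func(a_param, n_rows=100, n_cols=100, seed=0):
    """Hypothetical stand-in for the compiled my_array_func.

    Draws uniform values in [0, a_param) and truncates them to C ints.
    """
    rng = np.random.default_rng(seed)
    return (rng.random((n_rows, n_cols)) * a_param).astype(np.intc)


samples = fake_my_array_func(5)

# Save in numpy's binary .npy format, then load it back as a later script would.
path = os.path.join(tempfile.mkdtemp(), "samples.npy")
np.save(path, samples)
restored = np.load(path)

print(restored.shape, restored.dtype)
print(np.array_equal(samples, restored))
```

The `.npy` format keeps the shape and dtype, so the loaded array can be passed straight to other Python functions. At the full 10000 x 10000 size with 4-byte ints, the file would be roughly 400 MB.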
