I want to create a large number of simulated samples from a model using Cython that I need to analyze later using Python. The result of one run of my simulation script should be a 10000 x 10000 array.
I have defined a function using def
and tried to declare my arrays as cdef int my_array[10000][10000]
. The my_script.pyx
file compile correctly but when I run the script I got a "segmentation fault" error (I am on Linux).
Looking for a solution, I have learned that this issue is caused by allocating memory on the stack instead of the heap so I decided to use PyMem_Malloc
to allocate the memory. Here's kind of a minimum version of what I'm trying to do:
import cython
from cpython.mem cimport PyMem_Malloc
from libc.stdlib cimport rand, srand, RAND_MAX
srand(time(NULL))
def my_array_func(int a_param)
cdef int i
cdef int **my_array = <int **>PyMem_Malloc(sizeof(int *) * 10000)
for i in range(10000):
my_array[i] = <int *>PyMem_Malloc(sizeof(int) * 10000)
cdef int j
cdef int k
for j in range(10000):
for k in range(10000):
my_array[j][k] = <float>rand()/RAND_MAX * a_param
return my_array
When I try to compile this file, I got an error Cannot convert 'int **' to Python object
which makes sense because my_array is not properly an array so I guess it cannot be returned as a Python object (sorry, my knowledge of C is really really rusty).
Is there a way to let the function return my 2D array such that it can be used as input to other Python functions? Another more than welcome solution might be to directly save the array in a file that can be imported later by a Python script.
Thanks.
In line with @DavidW 's comment, when matrix computations are involved in Cython it is advisable to use numpy arrays to own the memory and to live in pythonland.
In your case, it would look like this:
import cython
cimport numpy as np
import numpy as np
from libc.stdlib cimport rand, srand, RAND_MAX
from libc.time cimport time
srand(time(NULL))
def my_array_func(int a_param):
cdef int n_rows=10000, ncols=10000
# Mem alloc + Python object owning memory
cdef np.ndarray[dtype=int, ndim=2] my_array = np.empty((n_rows,ncols), dtype=int)
# Memoryview: iterate over my_array at C speed
cdef int[:,::1] my_array_view = my_array
# Fill array
cdef int i, j
for i in range(n_rows):
for j in range(ncols):
my_array_view[i,j] = <int> (rand()/RAND_MAX * a_param)
return my_array
Allocating an empty chunk of memory with defined size, making sure it is owned by a Python object and has all the nice array properties (like .shape
) is what you get in a single line with the cdef np.ndarray[...
. Looping over this array can be done with no Python interaction by using a memoryview.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.