
Trimming a numpy array in cython

Currently I have the following Cython function, which modifies entries of a numpy array filled with zeros in order to sum non-zero values. Before I return the array, I would like to trim it and remove all the rows that are entirely zero. At the moment I use the numpy expression myarray = myarray[~np.all(myarray == 0, axis=1)] to do so. I was wondering if there is (in general) a faster way to do this with a Cython/C function instead of relying on Python/Numpy. This is one of the last bits of Pythonic interaction in my script (checked using %%cython -a ). But I don't really know how to proceed with this problem. In general, I don't know a priori the number of non-zero elements in the final array.

import numpy as np
cimport numpy as np

cdef func():
    cdef np.ndarray[np.float64_t, ndim=2] myarray = np.zeros((lenpropen, 6))
    """
    computations
    """

    myarray = myarray[~np.all(myarray == 0, axis=1)]
    return myarray

If the last dimension always contains a small number of elements like 6, then your code is not optimal.

First of all, myarray == 0 , np.all and ~ each create a temporary array, which introduces additional overhead since those temporaries need to be written and then read back. The overhead depends on the size of the temporary array, and the biggest one is myarray == 0 .

Moreover, Numpy calls perform some unwanted checks that Cython is not able to remove. These checks introduce a constant-time overhead, which can therefore be quite significant for small input arrays but negligible for big ones.

Additionally, the code of np.all could be faster if it knew the exact size of the last dimension, which is not the case here. Indeed, the loop of np.all could theoretically be unrolled since the last dimension is small. Unfortunately, Cython does not optimize Numpy calls, and Numpy is compiled for a variable input size that is not known at compile time.

Finally, the computation can be parallelized if lenpropen is huge (otherwise this will not be faster and could actually be slower). However, note that a parallel implementation requires the computation to be done in two steps: np.all(myarray == 0, axis=1) needs to be computed in parallel, and then you can create the resulting array and fill it by computing myarray[~result] in parallel. In a sequential implementation, you can directly overwrite myarray by filtering rows in-place and then produce a view of the filtered rows. This pattern is known as the erase-remove idiom . Note that this assumes the array is contiguous .
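As an illustration, here is a minimal sequential sketch of this erase-remove filtering (assuming a C-contiguous float64 array with exactly 6 columns; the helper name trim_zero_rows and the boundscheck/wraparound directives are illustrative, not part of the original question):

# cython: boundscheck=False, wraparound=False
import numpy as np
cimport numpy as np

cdef trim_zero_rows(np.ndarray[np.float64_t, ndim=2] arr):
    # Compact the non-zero rows towards the front of the array in-place,
    # then return a view of the kept rows (erase-remove idiom).
    cdef Py_ssize_t n = arr.shape[0]
    cdef Py_ssize_t i, j, kept = 0
    cdef bint nonzero
    for i in range(n):
        nonzero = False
        for j in range(6):  # constant trip count: the compiler can unroll this
            if arr[i, j] != 0.0:
                nonzero = True
                break
        if nonzero:
            if kept != i:
                for j in range(6):
                    arr[kept, j] = arr[i, j]
            kept += 1
    return arr[:kept]  # view of the filtered rows, no extra copy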

To conclude, a faster implementation consists of writing 2 nested loops iterating over myarray , with a constant number of iterations for the innermost one. Depending on the size of lenpropen , you can either use a sequential in-place implementation based on the erase-remove idiom, or a parallel out-of-place implementation with two steps (and a temporary array).
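In the question's func , the Numpy one-liner would then be replaced by a call to such a helper, for example (a sketch reusing the hypothetical trim_zero_rows from above):

cdef func():
    cdef np.ndarray[np.float64_t, ndim=2] myarray = np.zeros((lenpropen, 6))
    # ... computations that fill some rows of myarray ...
    return trim_zero_rows(myarray)  # in-place filtering + view, no Numpy temporaries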
