How to iterate over a list of lists in Cython (or Numba)?

I want a function that takes a list of lists as its argument, where the sub-lists contain integers and have different lengths. The function should iterate over the sub-lists and use each one as an index array into a NumPy array, so that I can perform different operations on the selected elements (such as taking the mean).

Here is a simple example of the expected behavior in plain Python, without Cython:

import numpy as np

mask = [[0, 1, 2, 4, 6, 7, 8, 9],
        [0, 1, 2, 4, 6, 7, 8, 9],
        [0, 1, 2, 4, 6, 9],
        [3, 5, 8],
        [0, 1, 2, 4, 6, 7, 8, 9],
        [3, 5, 7],
        [0, 1, 2, 4, 6, 9],
        [0, 1, 4, 5, 7, 8, 9],
        [0, 1, 3, 4, 7, 8, 9],
        [0, 1, 2, 4, 6, 7, 8, 9]] # This is the list of lists

x = np.array([2.0660689 , 2.08599832, 0.45032649, 1.05435649, 2.06010132,
              1.07633407, 0.43014785, 1.54286467, 1.644388  , 2.15417444])

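def nocython(mask, x):
    # One output value per sub-list, so size the result by len(mask)
    out = np.empty(len(mask), dtype=np.float64)
    for i, v in enumerate(mask):
        out[i] = x[v].mean()  # index x with the sub-list, then average
    return out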

>>> nocython(mask, x)
array([1.55425875, 1.55425875, 1.54113622, 1.25835952, 1.55425875,
       1.22451841, 1.54113622, 1.80427567, 1.80113602, 1.55425875])

The main problem is that I have to handle much larger NumPy arrays and mask lists, and the Python loops become very slow. So I want to know how I could Cythonize (or "Numba-ize") this function. Something like this:

%%cython

import numpy as np
cimport numpy as np

cdef np.ndarray[np.float64_t] cythonloop(int[:,:] mask, np.ndarray[np.float64_t] x):
    cdef Py_ssize_t i
    cdef Py_ssize_t N = len(x)
    cdef np.ndarray[np.float64_t] out = np.empty(N, dtype=np.float64)
    for i in range(N):
        out[i] = x[mask[i]]

cythonloop(mask, x)

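But this doesn't work: it fails with Cannot coerce list to type 'int[:, :]'. A memoryview such as int[:, :] needs a rectangular buffer, and a Python list of ragged lists cannot provide one.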
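One way to get data that a rectangular memoryview can accept is to pad the ragged rows into a 2-D integer array and keep a separate array of row lengths. The sketch below is an assumption on my part, not from the original post: to_padded and padded_mean are hypothetical helpers, the padding value -1 is arbitrary, and padded_mean is written in plain Python only to show the loop shape that a Cython function typed with something like np.int64_t[:, :] and np.int64_t[:] arguments could compile.

```python
import numpy as np


def to_padded(mask, fill=-1):
    """Hypothetical helper: pack a ragged list of int lists into a 2-D int64 array.

    Rows shorter than the longest sub-list are padded with `fill`;
    lengths[i] records how many entries of row i are real indices.
    """
    lengths = np.array([len(m) for m in mask], dtype=np.int64)
    padded = np.full((len(mask), lengths.max()), fill, dtype=np.int64)
    for i, m in enumerate(mask):
        padded[i, :len(m)] = m
    return padded, lengths


def padded_mean(padded, lengths, x):
    """Hypothetical helper: mean of x over the real (unpadded) indices of each row."""
    out = np.empty(padded.shape[0], dtype=np.float64)
    for i in range(padded.shape[0]):
        s = 0.0
        for j in range(lengths[i]):  # stop before the padding
            s += x[padded[i, j]]
        out[i] = s / lengths[i]
    return out


# Small demo with made-up data
padded, lengths = to_padded([[0, 1, 2], [3], [1, 3]])
demo = padded_mean(padded, lengths, np.array([1.0, 2.0, 3.0, 4.0]))
print(demo)  # prints [2. 4. 3.]
```

Padding wastes memory when the sub-list lengths vary a lot; in that case a flat index array plus per-row offsets is usually the more compact layout.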

It doesn't work in Numba either:

import numba as nb

@nb.njit
def nocython(mask, x):
    out = np.empty(len(x), dtype=np.float64)
    for i, v in enumerate(mask):
        out[i] = x[v].mean()
    return out

Which gives the following error:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function getitem>) with argument(s) of type(s): (array(float64, 1d, A), reflected list(int64))
 * parameterized

In Numba you can use a Typed List to iterate over a list of lists. Numba doesn't support indexing a NumPy array with a list, so the function also needs a small change: compute the mean by iterating over the elements of each inner list and indexing into x one element at a time.

You also need to convert the list of lists into a typed list of typed lists prior to calling the jitted function.

Putting this together gives (in addition to the code from your question):

from numba import njit
from numba.typed import List

@njit
def jitted(mask, x):
    out = np.empty(len(mask), dtype=np.float64)  # one result per sub-list
    for i in range(len(mask)):
        m_i = mask[i]
        s = 0.0  # float accumulator
        for j in range(len(m_i)):
            s += x[m_i[j]]  # index x one element at a time
        out[i] = s / len(m_i)
    return out

typed_mask = List()
for m in mask:
    typed_mask.append(List(m))

# Sanity check - Numba and nocython implementations produce the same result
np.testing.assert_allclose(nocython(mask, x),  jitted(typed_mask, x))

Note that you can also skip the conversion to a Typed List: when a builtin list is passed, Numba uses a Reflected List instead. However, reflected lists are deprecated and will be removed in a future version of Numba, so the Typed List is the recommended approach.
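If the per-list operation is just a mean (or another reduction NumPy provides as a ufunc), a different technique from the answer above avoids compiled loops entirely: flatten the ragged list once into a single index array plus start offsets, gather x with one fancy-indexing call, and sum each segment with np.add.reduceat. This is a minimal sketch; ragged_means is a hypothetical name, and it assumes mask is non-empty and that no sub-list is empty (reduceat does not return 0 for an empty segment).

```python
import numpy as np


def ragged_means(mask, x):
    """Hypothetical helper: mean of x[m] for every sub-list m, without a Python-level reduction loop."""
    lengths = np.fromiter((len(m) for m in mask), dtype=np.int64, count=len(mask))
    if len(mask) == 0 or (lengths == 0).any():
        raise ValueError("mask and every sub-list must be non-empty")
    # All indices back to back, plus the position where each sub-list starts
    flat = np.fromiter((j for m in mask for j in m), dtype=np.int64,
                       count=int(lengths.sum()))
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    # One gather, one segmented sum, one division
    sums = np.add.reduceat(x[flat], starts)
    return sums / lengths


demo = ragged_means([[0, 1, 2], [3], [1, 3]], np.array([1.0, 2.0, 3.0, 4.0]))
print(demo)  # prints [2. 4. 3.]
```

The flattening still walks the Python lists once, but if the same mask is reused with many different x arrays, flat, starts and lengths can be computed once and reused.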