How to iterate over a list of lists in Cython (or Numba)?
I want a function that takes a list of lists as its argument, where the sub-lists have different lengths and contain integers. The function should iterate over the sub-lists, use each one to index a NumPy array, and perform some operation on the selected elements (such as taking the average).
Here is a simple example of the expected behavior without Cython:
import numpy as np
mask = [[0, 1, 2, 4, 6, 7, 8, 9],
[0, 1, 2, 4, 6, 7, 8, 9],
[0, 1, 2, 4, 6, 9],
[3, 5, 8],
[0, 1, 2, 4, 6, 7, 8, 9],
[3, 5, 7],
[0, 1, 2, 4, 6, 9],
[0, 1, 4, 5, 7, 8, 9],
[0, 1, 3, 4, 7, 8, 9],
[0, 1, 2, 4, 6, 7, 8, 9]] # This is the list of lists
x = np.array([2.0660689 , 2.08599832, 0.45032649, 1.05435649, 2.06010132,
1.07633407, 0.43014785, 1.54286467, 1.644388 , 2.15417444])
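def nocython(mask, x):
    # One output value per sub-list, so size the result by len(mask)
    out = np.empty(len(mask), dtype=np.float64)
    for i, v in enumerate(mask):
        # Fancy indexing with the sub-list, then the mean of the selection
        out[i] = x[v].mean()
    return out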
>>> nocython(mask, x)
array([1.55425875, 1.55425875, 1.54113622, 1.25835952, 1.55425875,
1.22451841, 1.54113622, 1.80427567, 1.80113602, 1.55425875])
The main problem is that I have to handle much larger NumPy arrays and mask lists, and the Python loop becomes very slow. So I wanted to know how I could Cythonize this function (or compile it with Numba). Something like this:
%%cython
import numpy as np
cimport numpy as np
cdef np.ndarray[np.float64_t] cythonloop(int[:,:] mask, np.ndarray[np.float64_t] x):
    cdef Py_ssize_t i
    cdef Py_ssize_t N = len(x)
    cdef np.ndarray[np.float64_t] out = np.empty(N, dtype=np.float64)
    for i in range(N):
        out[i] = x[mask[i]]
cythonloop(mask, x)
But this doesn't work (Cannot coerce list to type 'int[:, :]').
It doesn't work in Numba either:
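The cast fails because a 2-D memoryview such as int[:, :] needs a rectangular buffer, and a list of sub-lists with different lengths is not one. A common workaround (an assumption here, not something from the original post) is to flatten the ragged list once into a single array of indices plus an array of offsets marking where each sub-list starts, a CSR-like layout that plain 1-D memoryviews can accept. The sketch below builds that layout with NumPy and runs the same nested loop a typed Cython kernel would run; the helper names flatten_ragged and ragged_mean are hypothetical, and it assumes no sub-list is empty.

```python
import numpy as np


def flatten_ragged(mask):
    """Flatten a ragged list of integer lists into (indices, offsets).

    Sub-list i occupies indices[offsets[i]:offsets[i + 1]]. Both results
    are contiguous 1-D int64 arrays, so a Cython function could accept them
    as 1-D typed memoryviews (that Cython signature is an assumption).
    """
    lengths = np.array([len(m) for m in mask], dtype=np.int64)
    offsets = np.zeros(len(mask) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])  # offsets[0] stays 0
    indices = np.fromiter((i for m in mask for i in m),
                          dtype=np.int64, count=int(offsets[-1]))
    return indices, offsets


def ragged_mean(indices, offsets, x):
    """Mean of x over each sub-list, using only flat arrays and scalar indexing.

    This is the loop one would type-annotate in Cython; run as plain Python
    it is only a correctness reference, not a fast implementation.
    Assumes every sub-list is non-empty (otherwise it divides by zero).
    """
    n = len(offsets) - 1
    out = np.empty(n, dtype=np.float64)
    for i in range(n):
        start, stop = offsets[i], offsets[i + 1]
        s = 0.0
        for k in range(start, stop):
            s += x[indices[k]]
        out[i] = s / (stop - start)
    return out


# Small usage example: the means are 2.0, 4.0 and 3.0
small_mask = [[0, 1, 2], [3], [1, 3]]
xs = np.array([1.0, 2.0, 3.0, 4.0])
idx, off = flatten_ragged(small_mask)
print(ragged_mean(idx, off, xs))
```

The flattening is done once in Python; after that every access is a scalar index into a flat array, which is the kind of loop Cython and Numba compile well.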
import numba as nb
@nb.njit
def nocython(mask, x):
    out = np.empty(len(x), dtype=np.float64)
    for i, v in enumerate(mask):
        out[i] = x[v].mean()
    return out
This fails with the following error:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function getitem>) with argument(s) of type(s): (array(float64, 1d, A), reflected list(int64))
* parameterized
In Numba you can use a Typed List to iterate over a list of lists. Numba doesn't support indexing a NumPy array with a list, so the function also needs some modification: it computes the mean by iterating over the elements of each inner list and indexing into x one element at a time.
You also need to convert the list of lists into a typed list of typed lists before calling the jitted function.
Putting this together gives the following (to be run in addition to the code from your question):
import numpy as np
from numba import njit
from numba.typed import List

@njit
def jitted(mask, x):
    # One output value per sub-list
    out = np.empty(len(mask), dtype=np.float64)
    for i in range(len(mask)):
        m_i = mask[i]
        # Sum the selected elements one scalar index at a time,
        # since Numba can't index an array with a list
        s = 0.0
        for j in range(len(m_i)):
            s += x[m_i[j]]
        out[i] = s / len(m_i)
    return out

# Convert the list of lists into a typed list of typed lists
typed_mask = List()
for m in mask:
    typed_mask.append(List(m))

# Sanity check - Numba and nocython implementations produce the same result
np.testing.assert_allclose(nocython(mask, x), jitted(typed_mask, x))
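When the per-sub-list operation is a sum or a mean, a different technique (not part of the original answer) avoids compiled code entirely: flatten the ragged list once, gather x with a single fancy-indexing call, and reduce each segment with np.add.reduceat. This is a minimal sketch under the assumption that every sub-list is non-empty, because np.add.reduceat returns the element at a repeated offset rather than 0 for an empty segment. The function name ragged_mean_numpy is hypothetical.

```python
import numpy as np


def ragged_mean_numpy(mask, x):
    """Mean of x[m] for each sub-list m in mask, without a per-element Python loop.

    Assumes mask is non-empty and every sub-list in it is non-empty.
    """
    lengths = np.array([len(m) for m in mask], dtype=np.int64)
    # Start offset of each sub-list inside the flattened index array
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    # One flat array of all indices, then a single gather from x
    flat = np.concatenate([np.asarray(m, dtype=np.int64) for m in mask])
    gathered = x[flat]
    # Sum each segment gathered[starts[i]:starts[i + 1]] and divide by its length
    return np.add.reduceat(gathered, starts) / lengths


small_mask = [[0, 1, 2], [3], [1, 3]]
xs = np.array([1.0, 2.0, 3.0, 4.0])
print(ragged_mean_numpy(small_mask, xs))  # means 2.0, 4.0 and 3.0
```

The only remaining Python-level loop runs over the sub-lists while flattening; if the same mask is reused with many different x arrays, flat, starts and lengths can be computed once and only the gather and reduction repeated.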
Note that it is also possible to skip the conversion to a Typed List, because Numba uses a Reflected List when a built-in list is passed. However, this feature is deprecated and will be removed in a future version of Numba, so the Typed List is recommended instead.