Numpy apply function to every item in array
Let's say I have a 2D array. How can I apply a function to every single item in the array and replace that item with the function's return value? The function returns a tuple, so the array will become 3D.
Here is the code I have in mind:
import numpy as np

def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)

myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
# Apply the function to an array
print(myarray)
# Should be array([[[5, 1, 4],
#                   [2, 1, 1]],
#                  [[1, 0, 1],
#                   [4, 4, 4]]])
Any ideas how I could do it? One way is
np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3))
but that's quite slow, especially when I need to do it on an array of shape (1024, 1024).
I've also seen people use np.vectorize, but it somehow ends up as a tuple of three arrays,
(array([[5, 2], [1, 4]]), array([[1, 1], [0, 4]]), array([[4, 1], [1, 4]]))
which has shape (3, 2, 2) once converted to an array, not the (2, 2, 3) I want.
You could use this function, which has a vectorised implementation:
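def func(arr):
    # Row i holds the tuple for floor(item) == i; the last row is the fallback.
    elements = np.array([
        [1, 0, 1],
        [2, 1, 1],
        [5, 1, 4],
        [4, 4, 4],
    ])
    # Floor rather than truncate, so values in (-1, 0) do not land on index 0.
    arr = np.floor(arr).astype(int)
    mask = (arr != 0) & (arr != 1) & (arr != 2)
    arr[mask] = -1  # everything outside [0, 3) selects the last row
    return elements[arr]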
You won't be able to write the result back into the original array because of the shape mismatch, but you can rebind the variable myarray:
myarray = func(myarray)
myarray
>>> [[[5, 1, 4],
      [2, 1, 1]],
     [[1, 0, 1],
      [4, 4, 4]]]
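The same lookup-table idea can also be written with np.digitize, which bins each value by the half-open intervals from the question. This is only a sketch of an alternative, not the code from this answer; the names edges, table, and lookup_func are made up for illustration.

```python
import numpy as np

# Hypothetical names (edges, table, lookup_func) chosen for this sketch.
edges = np.array([0, 1, 2, 3])
# Row 0: item < 0; rows 1-3: [0, 1), [1, 2), [2, 3); row 4: item >= 3.
table = np.array([
    [4, 4, 4],
    [1, 0, 1],
    [2, 1, 1],
    [5, 1, 4],
    [4, 4, 4],
])

def lookup_func(arr):
    # np.digitize returns i such that edges[i-1] <= x < edges[i],
    # 0 for x < edges[0], and len(edges) for x >= edges[-1].
    return table[np.digitize(arr, edges)]

myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
result = lookup_func(myarray)
print(result.shape)  # (2, 2, 3)
```

Because the bin edges are explicit, changing the intervals only means editing edges and table, with no integer casting involved.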
No need to change anything in your function.
Just apply the vectorized version of your function to your array and stack the result:
np.stack(np.vectorize(filter_func)(myarray), axis=2)
The result is:
array([[[5, 1, 4],
        [2, 1, 1]],
       [[1, 0, 1],
        [4, 4, 4]]])
Your list-map:
In [4]: np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3))
Out[4]:
array([[[5, 1, 4],
        [2, 1, 1]],
       [[1, 0, 1],
        [4, 4, 4]]])
A variation using a nested list comprehension:
In [5]: np.array([[filter_func(j) for j in row] for row in myarray])
Out[5]:
array([[[5, 1, 4],
        [2, 1, 1]],
       [[1, 0, 1],
        [4, 4, 4]]])
Using vectorize, the result is one array for each element returned by the function:
In [6]: np.vectorize(filter_func)(myarray)
Out[6]:
(array([[5, 2],
        [1, 4]]),
 array([[1, 1],
        [0, 4]]),
 array([[4, 1],
        [1, 4]]))
As @Vladi shows, these can be combined with stack (or np.array followed by a transpose):
In [7]: np.stack(np.vectorize(filter_func)(myarray),2)
Out[7]:
array([[[5, 1, 4],
        [2, 1, 1]],
       [[1, 0, 1],
        [4, 4, 4]]])
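For reference, the np.array-plus-transpose variant might look like the sketch below. It assumes filter_func and myarray as defined in the question, and the variable names parts, stacked, and transposed are only illustrative.

```python
import numpy as np

def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)

myarray = np.array([[2.5, 1.3], [0.4, -1.0]])

# vectorize returns a tuple of three (2, 2) arrays, one per tuple element.
parts = np.vectorize(filter_func)(myarray)

stacked = np.stack(parts, axis=2)                 # shape (2, 2, 3)
transposed = np.array(parts).transpose(1, 2, 0)   # (3, 2, 2) -> (2, 2, 3)

print(np.array_equal(stacked, transposed))  # True
```

Both forms move the "tuple element" axis from the front to the end; np.stack just does it in one step.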
Your list-map is fastest. I've never found vectorize to be faster:
In [8]: timeit np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3))
17.2 µs ± 47.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: timeit np.array([[filter_func(j) for j in row] for row in myarray])
20.5 µs ± 78.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [10]: timeit np.stack(np.vectorize(filter_func)(myarray),2)
75.2 µs ± 297 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Taking the np.vectorize(filter_func) out of the timing loop helps just a bit.
frompyfunc is similar to vectorize, but returns object dtype. It usually is faster:
In [29]: timeit np.stack(np.frompyfunc(filter_func, 1,3)(myarray),2).astype(int)
28.7 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
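To make the "object dtype" remark concrete, here is a small sketch, again assuming filter_func and myarray from the question. frompyfunc with one input and three outputs gives a tuple of three object arrays, which is why the timed expression ends with astype(int).

```python
import numpy as np

def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)

myarray = np.array([[2.5, 1.3], [0.4, -1.0]])

ufunc = np.frompyfunc(filter_func, 1, 3)  # 1 input, 3 outputs
parts = ufunc(myarray)
print(len(parts), parts[0].dtype)  # 3 object

# Stack along a new last axis, then convert from object to an integer dtype.
result = np.stack(parts, axis=2).astype(int)
print(result.shape)  # (2, 2, 3)
```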
Generally, if you have a function that only takes scalar inputs, it's hard to do better than simple iteration. vectorize/frompyfunc don't improve on that. Optimal use of numpy requires rewriting the function to work directly with arrays, as @Hammad demonstrates.
Though with this small example, even this proper numpy solution isn't faster. I expect it will scale better:
In [32]: timeit func(myarray)
25 µs ± 60.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
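One rough way to check the scaling claim on a larger array is sketched below. It assumes filter_func from the question and uses a floor-based lookup in the spirit of @Hammad's answer; list_map, table_func, shape, and big are hypothetical names for this sketch. Timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)

def list_map(arr):
    # Pure-Python iteration: one filter_func call per element.
    flat = list(map(filter_func, arr.reshape(-1)))
    return np.array(flat).reshape(arr.shape + (3,))

def table_func(arr):
    # Array-level lookup: floor to an index, send out-of-range values to the last row.
    table = np.array([[1, 0, 1], [2, 1, 1], [5, 1, 4], [4, 4, 4]])
    idx = np.floor(arr).astype(int)
    idx[(idx < 0) | (idx > 2)] = -1
    return table[idx]

shape = (256, 256)  # raise to (1024, 1024) to match the question
rng = np.random.default_rng(0)
big = rng.uniform(-1.0, 4.0, size=shape)

# Both approaches should agree before their timings are compared.
assert np.array_equal(list_map(big), table_func(big))

print("list-map:", timeit.timeit(lambda: list_map(big), number=1))
print("lookup:  ", timeit.timeit(lambda: table_func(big), number=3) / 3)
```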