Numpy: Vectorization in the context of a 1:many operation
Suppose I have the following functions, defined only to create the function topology the question is about:
import numpy as np
def foo(x, y):
    return np.asarray([x for i in range(y)])
bar = lambda x: foo(x, 10)
barv = np.vectorize(bar)
z = np.asarray([1, 2, 3])
And the following routine:
for i in range(z.shape[0]):
    rng = np.arange(z[i], 100)
    # res = barv(rng)
    res = np.asarray(list(map(bar, rng)))
The above routine works. However, if I uncomment and run the vectorized version, i.e.:
for i in range(z.shape[0]):
    rng = np.arange(z[i], 100)
    res = barv(rng)
The code fails with the following error:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-14-660195661e55>", line 3, in <module>
    res = barv(rng)
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2091, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2170, in _vectorize_call
    res = array(outputs, copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.
The error makes sense. However, there must be some way to do a vectorized 1:many operation in numpy?
vectorize is meant to be used with scalar functions, ones that take scalar inputs and return a scalar output.
In [729]: foo(z,10)
Out[729]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
In [730]: bar(z)
Out[730]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
Your bar returns a 2d array if given a 1d input, or a 1d array if given a scalar input:
In [734]: bar(4)
Out[734]: array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
We can tell vectorize to expect an object return:
In [735]: barv = np.vectorize(bar, otypes=[object])
In [736]: barv(4)
Out[736]: array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4], dtype=object)
In [737]: barv(z)
Out[737]:
array([array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
       array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]),
       array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])], dtype=object)
which can be turned into a 2d array with:
In [738]: np.stack(_)
Out[738]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
vectorize also has a signature parameter, which might help in this case. But in my experience it's even slower.
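As a rough sketch of what the signature route could look like here: the string '()->(n)' is my reading of bar (scalar in, 1d array out), and barv_sig is just an illustrative name. With a signature, vectorize assembles the 2d result itself, so no np.stack is needed. This shows the shape handling only, not speed.

```python
import numpy as np

def foo(x, y):
    return np.asarray([x for i in range(y)])

bar = lambda x: foo(x, 10)
z = np.asarray([1, 2, 3])

# '()->(n)': each call takes a scalar and returns a 1d array of some length n.
# barv_sig is an illustrative name, not something from the question.
barv_sig = np.vectorize(bar, signature='()->(n)')

res = barv_sig(z)
print(res.shape)   # (3, 10): one row of 10 values per element of z
```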
But we don't need vectorize here; a simple list comprehension is just as good, probably better:
In [739]: np.stack([bar(i) for i in z])
Out[739]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
vectorize makes 'broadcasting' with several input arrays easier, but it does not improve speed. See if you can figure out what vectorize is doing here with foo:
In [743]: f = np.vectorize(foo, otypes=[object])
In [744]: f(np.array([1,2,3]), np.array([2,3,4]))
Out[744]: array([array([1, 1]), array([2, 2, 2]), array([3, 3, 3, 3])], dtype=object)
In [745]: f(np.array([1,2,3]), np.array([[2],[3]]))
Out[745]:
array([[array([1, 1]), array([2, 2]), array([3, 3])],
       [array([1, 1, 1]), array([2, 2, 2]), array([3, 3, 3])]],
      dtype=object)
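One way to read the two outputs above, offered as a sketch of the behaviour rather than a description of numpy's internals: the inputs are broadcast against each other, and foo is called once per element pair of the broadcast shape. The explicit loop below (manual, xb and yb are illustrative names) reproduces the (2, 3) object result from the second call.

```python
import numpy as np

def foo(x, y):
    return np.asarray([x for i in range(y)])

x = np.array([1, 2, 3])
y = np.array([[2], [3]])

# Broadcast both inputs to the common shape (2, 3).
xb, yb = np.broadcast_arrays(x, y)

# Call foo once per broadcast position and store each result as an object.
manual = np.empty(xb.shape, dtype=object)
for idx in np.ndindex(xb.shape):
    manual[idx] = foo(xb[idx], yb[idx])

# The vectorize version from the session above, for comparison.
f = np.vectorize(foo, otypes=[object])
vec = f(x, y)
```

Row 0 uses y=2 for every x, and row 1 uses y=3, which is why the inner arrays have length 2 in the first row and length 3 in the second.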
Correct numpy vectorization:
In [762]: np.repeat(z[:,None],10,1)
Out[762]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
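If a read-only result is acceptable, np.broadcast_to is another option I would consider. It is not part of the session above, so treat this as a sketch: it produces the same values as np.repeat but as a view, without copying the data.

```python
import numpy as np

z = np.asarray([1, 2, 3])

# Copying version, as in the session above.
repeated = np.repeat(z[:, None], 10, axis=1)

# View-based version: same values, no copy, but the result is read-only.
viewed = np.broadcast_to(z[:, None], (z.shape[0], 10))

print(np.array_equal(repeated, viewed))   # True
print(viewed.flags.writeable)             # False
```

Call viewed.copy() if the result needs to be modified afterwards.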
Some time comparisons:
In [766]: timeit np.stack(barv(z))
60 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [767]: timeit np.stack([bar(i) for i in z])
39.6 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [768]: timeit np.repeat(z[:,None],10,1)
4.12 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Your foo already works with an array input. There was no need for a np.vectorize wrapper.
In [783]: foo(z,10).T
Out[783]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
In [784]: timeit foo(z,10).T
10.8 µs ± 364 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
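Applied to the routine from the question, this might look like the sketch below. It assumes the goal is the same (len(rng), 10) array that the map version produces; res_map, res_foo and res_rep are illustrative names.

```python
import numpy as np

def foo(x, y):
    return np.asarray([x for i in range(y)])

bar = lambda x: foo(x, 10)
z = np.asarray([1, 2, 3])

results = []
for i in range(z.shape[0]):
    rng = np.arange(z[i], 100)
    res_map = np.asarray(list(map(bar, rng)))       # working version from the question
    res_foo = foo(rng, 10).T                        # foo called once on the whole array
    res_rep = np.repeat(rng[:, None], 10, axis=1)   # numpy only, no Python loop over rng
    results.append((res_map, res_foo, res_rep))
```

All three give the same array for each rng; only the amount of Python-level looping differs.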