NumPy broadcasting to improve dot-product performance
This is a rather simple operation, but it is repeated millions of times in my actual code, so I'd like to improve its performance if possible.
import numpy as np
# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)
# Operation on 'xx' to obtain the final array 'yy'
yy = xx[0] * a * b + xx[1] * b + xx[2] * c
The last line is the one I'd like to improve. Each slice of xx along the first axis is multiplied by a factor (built from the coefficients a, b, c), and the scaled slices are then added to give a final array yy of shape (14, 1), compared with the shape (3, 14, 1) of the initial array xx.
Is it possible to do this via NumPy broadcasting?
As a first alternative, we could use broadcasted multiplication and then sum along the first axis. As a second, we could bring in matrix multiplication with np.dot. That gives us two more approaches. Here are the timings for the sample provided in the question:
# Original one
In [81]: %timeit xx[0] * a * b + xx[1] * b + xx[2] * c
100000 loops, best of 3: 5.04 µs per loop
# Proposed alternative #1
In [82]: %timeit (xx *np.array([a*b,b,c])[:,None,None]).sum(0)
100000 loops, best of 3: 4.44 µs per loop
# Proposed alternative #2
In [83]: %timeit np.array([a*b,b,c]).dot(xx[...,0])[:,None]
1000000 loops, best of 3: 1.51 µs per loop
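To make the shape handling in the two alternatives easier to follow, here is a minimal sketch that builds the weight vector once and checks both results against the original expression. The names w, yy_ref, yy_bcast and yy_dot, as well as the fixed random seed, are assumptions introduced for illustration and are not part of the original answer.

```python
import numpy as np

# Fixed seed is an assumption, used only to make the check reproducible
rng = np.random.default_rng(0)
xx = rng.uniform(0., 1., (3, 14, 1))
a, b, c = rng.uniform(0., 1., 3)

# Reference result from the question, shape (14, 1)
yy_ref = xx[0] * a * b + xx[1] * b + xx[2] * c

# Collect the three factors into one weight vector of shape (3,)
w = np.array([a * b, b, c])

# Alternative #1: view w as (3, 1, 1) so it broadcasts against (3, 14, 1),
# then sum over the first axis
yy_bcast = (xx * w[:, None, None]).sum(0)

# Alternative #2: drop the trailing axis to get (3, 14), contract with w
# to get (14,), then restore the trailing axis to get (14, 1)
yy_dot = w.dot(xx[..., 0])[:, None]

print(yy_ref.shape, yy_bcast.shape, yy_dot.shape)
print(np.allclose(yy_ref, yy_bcast), np.allclose(yy_ref, yy_dot))
```

Both alternatives should agree with the original expression up to floating-point rounding, which is why the comparison uses np.allclose rather than exact equality.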
This is similar to Divakar's answer: swap the first and third axes of xx and take the dot product.
import numpy as np
# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)
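def op():
    # Original expression from the question
    yy = xx[0] * a * b + xx[1] * b + xx[2] * c
    return yy
def tai():
    # (3, 14, 1) -> (1, 14, 3), dot with (3,) -> (1, 14), swap back -> (14, 1)
    d = np.array([a*b, b, c])
    return np.swapaxes(np.swapaxes(xx, 0, 2).dot(d), 0, 1)
def Divakar():
    # improvement given by Divakar
    return np.array([a*b,b,c]).dot(xx.swapaxes(0,1))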
%timeit op()
7.21 µs ± 222 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit tai()
4.06 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit Divakar()
3 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
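Since the timings only matter if the variants compute the same thing, a quick equivalence check may be useful. The sketch below is self-contained and assumes a fixed seed; the names d, expected, via_tai and via_divakar are hypothetical and chosen for illustration.

```python
import numpy as np

# Fixed seed is an assumption, used only to make the check reproducible
rng = np.random.default_rng(1)
xx = rng.uniform(0., 1., (3, 14, 1))
a, b, c = rng.uniform(0., 1., 3)
d = np.array([a * b, b, c])

# Expression timed as op()
expected = xx[0] * a * b + xx[1] * b + xx[2] * c

# Expression timed as tai():
# (3, 14, 1) -> (1, 14, 3), dot with (3,) -> (1, 14), swap axes -> (14, 1)
via_tai = np.swapaxes(np.swapaxes(xx, 0, 2).dot(d), 0, 1)

# Expression timed as Divakar():
# (3, 14, 1) -> (14, 3, 1); np.dot contracts d with the second-to-last
# axis, giving (14, 1)
via_divakar = d.dot(xx.swapaxes(0, 1))

print(expected.shape, via_tai.shape, via_divakar.shape)
print(np.allclose(expected, via_tai), np.allclose(expected, via_divakar))
```

All three results have shape (14, 1), so any of the variants can replace the original line without changes to downstream code.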