
NumPy broadcasting to improve dot-product performance

This is a rather simple operation, but it is repeated millions of times in my actual code, and I'd like to improve its performance if possible.

import numpy as np

# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)

# Operation on 'xx' to obtain the final array 'yy'
yy = xx[0] * a * b + xx[1] * b + xx[2] * c

The last line is the one I'd like to improve. Basically, each slice of xx along the first axis is multiplied by a factor (built from the a, b, c coefficients), and the products are then added to give a final yy array of shape (14, 1), compared with the shape (3, 14, 1) of the initial xx array.

Is it possible to do this via NumPy broadcasting?

As a first alternative, we could use broadcasted multiplication and then sum along the first axis.
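A minimal sketch of this first alternative, assuming the same xx, a, b, c as in the question. The names `weights` and `yy_broadcast` are introduced here for illustration only. The coefficients are packed into a length-3 array, reshaped to (3, 1, 1) so that they broadcast against the (3, 14, 1) array, and the product is summed over axis 0:

```python
import numpy as np

# Sample data, as in the question
xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)

# Original expression from the question
yy = xx[0] * a * b + xx[1] * b + xx[2] * c

# Per-slice factors packed into one array ('weights' is an illustrative name)
weights = np.array([a * b, b, c])

# (3,) -> (3, 1, 1) so it broadcasts against (3, 14, 1); then sum over axis 0
yy_broadcast = (xx * weights[:, None, None]).sum(axis=0)

print(yy_broadcast.shape)              # (14, 1)
print(np.allclose(yy, yy_broadcast))   # True
```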

As a second alternative, we could bring in matrix multiplication with np.dot. That gives us two more approaches. Here are the timings for the sample provided in the question:

# Original one
In [81]: %timeit xx[0] * a * b + xx[1] * b + xx[2] * c
100000 loops, best of 3: 5.04 µs per loop

# Proposed alternative #1
In [82]: %timeit (xx *np.array([a*b,b,c])[:,None,None]).sum(0)
100000 loops, best of 3: 4.44 µs per loop

# Proposed alternative #2
In [83]: %timeit np.array([a*b,b,c]).dot(xx[...,0])[:,None]
1000000 loops, best of 3: 1.51 µs per loop
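To make the shape handling in alternative #2 explicit, here is a sketch under the same assumptions as the question's sample data. The names `weights`, `yy_flat` and `yy_dot` are illustrative and not part of the original answer:

```python
import numpy as np

# Sample data, as in the question
xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)

# Original expression from the question, used as the reference result
yy = xx[0] * a * b + xx[1] * b + xx[2] * c

weights = np.array([a * b, b, c])

# xx[..., 0] drops the trailing length-1 axis: (3, 14, 1) -> (3, 14)
# (3,) dot (3, 14) contracts the shared length-3 axis -> (14,)
yy_flat = weights.dot(xx[..., 0])

# Restore the trailing axis: (14,) -> (14, 1)
yy_dot = yy_flat[:, None]

print(yy_dot.shape)              # (14, 1)
print(np.allclose(yy, yy_dot))   # True
```

Note that `xx[..., 0]` keeps only the first element along the last axis, so this form relies on that axis having length 1, as it does in the sample.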

This is similar to Divakar's answer: swap the first and third axes of xx and take the dot product.

import numpy as np

# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)

def op():
    yy = xx[0] * a * b + xx[1] * b + xx[2] * c
    return yy

def tai():
    d = np.array([a*b, b, c])
    return np.swapaxes(np.swapaxes(xx, 0, 2).dot(d), 0, 1)

def Divakar():
    # improvement given by Divakar
    return np.array([a*b,b,c]).dot(xx.swapaxes(0,1))

%timeit op()
7.21 µs ± 222 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit tai()
4.06 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit Divakar()
3 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
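As a sanity check, the following sketch compares the two swapaxes-based expressions against the original one on freshly generated sample data. The names `expected`, `via_tai` and `via_divakar` are introduced here for illustration; the comments trace the intermediate shapes:

```python
import numpy as np

# Sample data, as in the question
xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)
d = np.array([a * b, b, c])

# Reference result from the original expression
expected = xx[0] * a * b + xx[1] * b + xx[2] * c

# (3, 14, 1) -> swap axes 0 and 2 -> (1, 14, 3); dot with (3,) -> (1, 14);
# swap axes 0 and 1 -> (14, 1)
via_tai = np.swapaxes(np.swapaxes(xx, 0, 2).dot(d), 0, 1)

# (3, 14, 1) -> swap axes 0 and 1 -> (14, 3, 1); a 1-D left operand is
# contracted with the second-to-last axis of the right operand -> (14, 1)
via_divakar = d.dot(xx.swapaxes(0, 1))

print(via_tai.shape, via_divakar.shape)     # (14, 1) (14, 1)
print(np.allclose(expected, via_tai))       # True
print(np.allclose(expected, via_divakar))   # True
```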
