[英]Convolution of 3D numpy arrays
I've got a code from my supervisor to implement MDCT polyphase analysis and synthesis. 我从主管那里得到了执行MDCT多相分析和合成的代码。 Unfortunately, this code includes one very slow function with 2 loops.
不幸的是,此代码包含一个非常慢的函数,带有2个循环。 If somebody can help me to simplify this function and make it faster I will appreciate your help.
如果有人可以帮助我简化此功能并使它变得更快,我将感谢您的帮助。 This is part of the code:
这是代码的一部分:
def polmatmult(A, B):
"""polmatmult(A,B)
multiplies two polynomial matrices (arrays) A and B, where each matrix entry is a polynomial.
Those polynomial entries are in the 3rd dimension
The third dimension can also be interpreted as containing the (2D) coefficient matrices of exponent of z^-1.
Result is C=A*B;"""
print("np.shape(A)", np.shape(A))
print("np.shape(B)", np.shape(B))
[NAx, NAy, NAz] = np.shape(A);
[NBx, NBy, NBz] = np.shape(B);
"Degree +1 of resulting polynomial, with NAz-1 and NBz-1 being the degree of the input polynomials:"
Deg = NAz + NBz - 1;
print("Deg", Deg)
C = np.zeros((NAx, NBy, Deg));
"Convolution of matrices:"
for n in range(0, (Deg)):
for m in range(0, n + 1):
if ((n - m) < NAz and m < NBz):
C[:, :, n] = C[:, :, n] + np.dot(A[:, :, (n - m)], B[:, :, m]);
return C
EDIT : I realize now that poly1d
is far more inefficient than the original solution, mainly due to poly1d
being implemented in Python instead of C. A comparison of their prun
s is not pretty: 编辑 :我现在认识到
poly1d
比原来的解决方案更低效的,这主要是由于poly1d
在Python,而不是C.他们比较正在实施prun
s是不漂亮:
%prun np.dot(mod_A, mod_B)
888804 function calls (872804 primitive calls) in 0.436 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
47200 0.069 0.000 0.168 0.000 polynomial.py:1076(__init__)
78800/62800 0.053 0.000 0.085 0.000 {built-in method numpy.core.multiarray.array}
46800 0.036 0.000 0.063 0.000 shape_base.py:11(atleast_1d)
31600 0.030 0.000 0.036 0.000 function_base.py:2209(trim_zeros)
47200 0.024 0.000 0.024 0.000 {method 'copy' of 'numpy.ndarray' objects}
8000 0.023 0.000 0.023 0.000 {built-in method numpy.core.multiarray.correlate}
47200 0.020 0.000 0.050 0.000 polynomial.py:1041(coeffs)
8000 0.020 0.000 0.304 0.000 polynomial.py:1183(__mul__)
7600 0.019 0.000 0.041 0.000 polynomial.py:683(polyadd)
8000 0.016 0.000 0.123 0.000 numeric.py:978(convolve)
8000 0.014 0.000 0.204 0.000 polynomial.py:790(polymul)
7600 0.014 0.000 0.119 0.000 polynomial.py:1197(__add__)
1 0.013 0.013 0.436 0.436 {built-in method numpy.core.multiarray.dot}
94400 0.012 0.000 0.012 0.000 {built-in method builtins.isinstance}
157200 0.011 0.000 0.011 0.000 {built-in method builtins.len}
16000 0.011 0.000 0.034 0.000 polynomial.py:1103(__array__)
46800 0.010 0.000 0.019 0.000 numeric.py:534(asanyarray)
62800 0.009 0.000 0.009 0.000 polynomial.py:1064(_coeffs)
47200 0.009 0.000 0.009 0.000 polynomial.py:1067(_coeffs)
8000 0.008 0.000 0.009 0.000 numeric.py:2135(isscalar)
8000 0.005 0.000 0.007 0.000 numeric.py:904(_mode_from_name)
46800 0.004 0.000 0.004 0.000 {method 'append' of 'list' objects}
16000 0.004 0.000 0.007 0.000 numeric.py:463(asarray)
31600 0.003 0.000 0.003 0.000 {method 'upper' of 'str' objects}
8000 0.001 0.000 0.001 0.000 {method 'lower' of 'str' objects}
1 0.000 0.000 0.436 0.436 <string>:1(<module>)
1 0.000 0.000 0.436 0.436 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
%prun polymatmult(A, B)
7 function calls in 0.004 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 0.004 0.004 <ipython-input-1053-a9f13893aa45>:1(original_convolution)
1 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.zeros}
1 0.000 0.000 0.004 0.004 {built-in method builtins.exec}
1 0.000 0.000 0.004 0.004 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 fromnumeric.py:1565(shape)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
However, I will stand be it at least being easier . 但是,我至少会更轻松 。
If you used the poly1d
type, this would be so much easier: 如果您使用
poly1d
类型,这将非常容易:
random_poly = np.frompyfunc(lambda i, j: np.poly1d(np.random.randint(1, 4, 3)), 2, 1)
def random_poly_array(shape):
return np.fromfunction(random_poly, shape)
a1 = random_poly_array((3,3))
a2 = random_poly_array((3,3))
mult_a = np.dot(a1, a2)
First of all I'm surprized that there is np.dot there and not np.multiply. 首先,我很惊讶那里没有np.dot而不是np.multiply。 Convolution already happens in for-loops and it should be broadcast to first two dimensions, right?
卷积已经在for循环中发生,应该将其广播到前两个维度,对吗? Anyway, I will further work with np.multiply instead of np.dot and you can change it back accordingly if I'm wrong.
无论如何,我将进一步使用np.multiply而不是np.dot进行操作,如果我错了,则可以相应地将其更改回去。
If this function is a real bottleneck I would use Cython to improve the speed. 如果此功能是真正的瓶颈,我将使用Cython来提高速度。 This is an example of the code:
这是代码示例:
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def myconvolve(np.ndarray[np.float64_t, ndim=3] A,
np.ndarray[np.float64_t, ndim=3] B):
cdef:
int n, m, i, j
int NAx = A.shape[0], NAy = A.shape[1], NAz = A.shape[2]
int NBx = A.shape[0], NBy = A.shape[1], NBz = A.shape[2]
int Deg = NAz + NBz - 1;
np.ndarray[np.float64_t, ndim=3] C = np.zeros((NAx, NBy, Deg));
assert((NAx == NBx) and (NAy == NBy))
for n in range(0, (Deg)):
for m in range(0, n + 1):
if ((n - m) < NAz and m < NBz):
for i in range(0, NAx):
for j in range(0, NAy):
C[i, j, n] = C[i, j, n] + A[i, j, (n - m)] * B[i, j, m]
return C
This has to be compiled, I did it with 这必须进行编译,我做到了
cython myconvolve.pyx -v -2
gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.7 -o myconvolve.so myconvolve.c
Then with the following comparison script 然后用下面的比较脚本
import timeit
import numpy as np
from myconvolve import myconvolve
def original_convolution(A, B):
[NAx, NAy, NAz] = np.shape(A);
[NBx, NBy, NBz] = np.shape(B);
Deg = NAz + NBz - 1;
C = np.zeros((NAx, NBy, Deg));
for n in range(0, (Deg)):
for m in range(0, n + 1):
if ((n - m) < NAz and m < NBz):
C[:, :, n] = C[:, :, n] + np.multiply(A[:, :, (n - m)], B[:, :, m])
return C
print "Checking that implementations produce identical results."
A = np.random.rand(20, 20, 20)
B = np.random.rand(20, 20, 20)
C1 = original_convolution(A, B)
C2 = myconvolve(A, B)
assert(np.abs((C1 - C2).sum()) < 1.e-6)
mysetup = '''
import numpy as np
np.random.seed(0)
from myconvolve import myconvolve
from __main__ import A, B
from __main__ import original_convolution
'''
print 'Numpy implementation time [s]: ', min(timeit.Timer('original_convolution(A, B)', setup=mysetup).repeat(7, 100))
print 'Cython implementation time [s]: ', min(timeit.Timer('myconvolve(A, B)', setup=mysetup).repeat(7, 100))
I get: 我得到:
Numpy implementation time [s]: 0.494730949402
Cython implementation time [s]: 0.0905570983887
A little manipulation to allow easier dot products and ufuc_at
manipulation to remove the for
loops: 进行一些操作以使点乘积更容易,并通过
ufuc_at
操作删除for
循环:
def polmatmult_(A, B):
print("np.shape(A)", np.shape(A))
print("np.shape(B)", np.shape(B))
[NAx, NAy, NAz] = np.shape(A)
[NBx, NBy, NBz] = np.shape(B)
Deg = NAz + NBz - 1
print("Deg", Deg)
C = np.zeros((Deg, NAx, NBy))
m, n = np.triu_indices(NBz, 0, Deg)
m, n = m[n - m < NAz], n[n - m < NAz]
np.add.at(C, n, np.moveaxis(A[:, :, (n - m)], -1, 0) @ np.moveaxis(B[:, :, m], -1, 0))
return np.moveaxis(C, 0, -1)
In general, you want your indexing axis ( z
in ths case) to be the first dimension and not the last. 通常,您希望分度轴(在这种情况下为
z
)是第一个尺寸而不是最后一个尺寸。 This lets you use ufunc
tricks (like add.at
), @
instead of np.dot
, and broadcasting. 这允许您使用
ufunc
技巧(如add.at
), @
代替np.dot
,和广播。 Thus all the np.moveaxis
. 因此所有
np.moveaxis
。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.