I have a piece of code whose performance I want to improve. My code is:
lis = np.zeros((6, 6))  # must be pre-sized; an empty list cannot be indexed with [i][j]
for i in range(6):
    for j in range(6):
        for k in range(6):
            for l in range(6):
                lis[i][j] += matrix1[k][l] * (2 * matrix2[i][j][k][l] - matrix2[i][k][j][l])
print(lis)
matrix2 is a 4-dimensional NumPy array, and matrix1 is a 2-dimensional array.
I want to speed up this code by using np.tensordot(matrix1, matrix2), but then I'm lost.
You can just use a JIT compiler
Your solution isn't bad at all. The only things I have changed are the indexing and the loop ranges, which now come from the array shapes. If you have NumPy arrays and heavy looping, you can use a compiler (Numba), which is really simple to do.
import numba as nb
import numpy as np

# The function is compiled only at the first call (when called with the same datatypes)
@nb.njit(cache=True)  # set cache to False if copying the function to a command window
def almost_your_solution(matrix1, matrix2):
    lis = np.zeros(matrix1.shape, np.float64)
    for i in range(matrix2.shape[0]):
        for j in range(matrix2.shape[1]):
            for k in range(matrix2.shape[2]):
                for l in range(matrix2.shape[3]):
                    lis[i, j] += matrix1[k, l] * (2 * matrix2[i, j, k, l] - matrix2[i, k, j, l])
    return lis
Regarding code simplicity, I would prefer the einsum solution from hpaulj over the solution shown above. The tensordot solution isn't that easy to understand, in my opinion, but that's a matter of taste.
Comparing performance
The functions from hpaulj I used for comparison:
def hpaulj_1(matrix1, matrix2):
    matrix3 = 2*matrix2 - matrix2.transpose(0, 2, 1, 3)
    return np.einsum('kl,ijkl->ij', matrix1, matrix3)

def hpaulj_2(matrix1, matrix2):
    matrix3 = 2*matrix2 - matrix2.transpose(0, 2, 1, 3)
    (matrix1*matrix3).sum(axis=(2, 3))  # result is discarded; only the tensordot value is returned
    return np.tensordot(matrix1, matrix3, [[0, 1], [2, 3]])
Very small arrays give:
matrix1=np.random.rand(6,6)
matrix2=np.random.rand(6,6,6,6)
Original solution: 2.6 ms
Compiled solution: 2.1 µs
Einsum solution: 8.3 µs
Tensordot solution: 36.7 µs
Larger arrays give:
matrix1=np.random.rand(60,60)
matrix2=np.random.rand(60,60,60,60)
Original solution: 13.3 s
Compiled solution: 18.2 ms
Einsum solution: 115 ms
Tensordot solution: 180 ms
Conclusion
Compilation speeds up the computation by about 3 orders of magnitude compared with the original loops, and it outperforms the other solutions by a clear margin (roughly 4x to 6x faster than einsum in the runs above).
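For anyone who wants to reproduce the comparison of the NumPy-only variants, here is a minimal timing sketch using the standard timeit module. The function names einsum_version and tensordot_version are my own labels, not from the answers, and the numbers it prints will differ from machine to machine.

```python
import timeit

import numpy as np


def einsum_version(matrix1, matrix2):
    # Fold the difference into one array, then contract over k and l.
    matrix3 = 2 * matrix2 - matrix2.transpose(0, 2, 1, 3)
    return np.einsum('kl,ijkl->ij', matrix1, matrix3)


def tensordot_version(matrix1, matrix2):
    # Same contraction, expressed as axes (0, 1) of matrix1 against (2, 3) of matrix3.
    matrix3 = 2 * matrix2 - matrix2.transpose(0, 2, 1, 3)
    return np.tensordot(matrix1, matrix3, axes=[[0, 1], [2, 3]])


rng = np.random.default_rng(0)
matrix1 = rng.random((6, 6))
matrix2 = rng.random((6, 6, 6, 6))

n_calls = 1000
for func in (einsum_version, tensordot_version):
    seconds = timeit.timeit(lambda: func(matrix1, matrix2), number=n_calls)
    print(f"{func.__name__}: {seconds / n_calls * 1e6:.1f} microseconds per call")
```

Increasing the array sizes to 60 (as in the larger test above) makes the run noticeably longer, so n_calls should be reduced accordingly.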
Test setup:
In [274]: lis = np.zeros((6,6),int)
In [275]: matrix1 = np.arange(36).reshape(6,6)
In [276]: matrix2 = np.arange(36*36).reshape(6,6,6,6)
In [277]: for i in range(6):
     ...:     for j in range(6):
     ...:         for k in range(6):
     ...:             for l in range(6):
     ...:                 lis[i,j] += matrix1[k,l] * (2 * matrix2[i,j,k,l] - matrix2[i,k,j,l])
     ...:
In [278]: lis
Out[278]:
array([[-51240, -9660, 31920, 73500, 115080, 156660],
[ 84840, 126420, 168000, 209580, 251160, 292740],
[220920, 262500, 304080, 345660, 387240, 428820],
[357000, 398580, 440160, 481740, 523320, 564900],
[493080, 534660, 576240, 617820, 659400, 700980],
[629160, 670740, 712320, 753900, 795480, 837060]])
right?
I'm not sure that tensordot is the right tool; at least it may not be the simplest. It certainly can't handle the matrix2 difference by itself.
Let's start with an obvious substitution:
In [279]: matrix3 = 2*matrix2-matrix2.transpose(0,2,1,3)
In [280]: lis = np.zeros((6,6),int)
In [281]: for i in range(6):
     ...:     for j in range(6):
     ...:         for k in range(6):
     ...:             for l in range(6):
     ...:                 lis[i,j] += matrix1[k,l] * matrix3[i,j,k,l]
This tests OK: it produces the same lis.
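The substitution works because transpose(0,2,1,3) exchanges axes 1 and 2, so element [i,j,k,l] of the transposed array is matrix2[i,k,j,l]. A small check of that identity, as a sketch (the names swapped and expected are mine):

```python
import numpy as np

matrix2 = np.arange(36 * 36).reshape(6, 6, 6, 6)
swapped = matrix2.transpose(0, 2, 1, 3)  # a view with axes 1 and 2 exchanged

# Build the same array the slow way, straight from the index pattern in the loop.
expected = np.empty_like(matrix2)
for i in range(6):
    for j in range(6):
        for k in range(6):
            for l in range(6):
                expected[i, j, k, l] = matrix2[i, k, j, l]

same_everywhere = np.array_equal(swapped, expected)
print(same_everywhere)
```

Since transpose returns a view, no data is copied until the subtraction creates matrix3.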
Now it is easy to express this with einsum: just replicate the indices.
In [284]: np.einsum('kl,ijkl->ij', matrix1, matrix3)
Out[284]:
array([[-51240, -9660, 31920, 73500, 115080, 156660],
[ 84840, 126420, 168000, 209580, 251160, 292740],
[220920, 262500, 304080, 345660, 387240, 428820],
[357000, 398580, 440160, 481740, 523320, 564900],
[493080, 534660, 576240, 617820, 659400, 700980],
[629160, 670740, 712320, 753900, 795480, 837060]])
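Because the contraction is linear, the intermediate matrix3 is not strictly required: each term of the difference can be given its own index string. This is a variation I am adding, not something from the answer above, so treat it as a sketch (the names reference and direct are mine).

```python
import numpy as np

matrix1 = np.arange(36).reshape(6, 6)
matrix2 = np.arange(36 * 36).reshape(6, 6, 6, 6)

# Reference: the two-step version, with the difference folded into matrix3.
matrix3 = 2 * matrix2 - matrix2.transpose(0, 2, 1, 3)
reference = np.einsum('kl,ijkl->ij', matrix1, matrix3)

# Contract each term separately. The label 'ikjl' makes the second
# term read matrix2[i, k, j, l], exactly as in the original loop.
direct = (2 * np.einsum('kl,ijkl->ij', matrix1, matrix2)
          - np.einsum('kl,ikjl->ij', matrix1, matrix2))

print(np.array_equal(direct, reference))
```

Whether this is faster than building matrix3 first depends on the array sizes; I have not timed it here.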
An elementwise product plus summation on two axes also works, as does an equivalent tensordot (specifying which axes to sum over):
(matrix1*matrix3).sum(axis=(2,3))
np.tensordot(matrix1, matrix3, [[0,1],[2,3]])
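To confirm that these two expressions agree with the einsum result, a short check along these lines can be run with the test setup from above (the via_* names are mine):

```python
import numpy as np

matrix1 = np.arange(36).reshape(6, 6)
matrix2 = np.arange(36 * 36).reshape(6, 6, 6, 6)
matrix3 = 2 * matrix2 - matrix2.transpose(0, 2, 1, 3)

via_einsum = np.einsum('kl,ijkl->ij', matrix1, matrix3)

# matrix1 (6, 6) broadcasts against the last two axes of matrix3 (6, 6, 6, 6);
# summing over axes 2 and 3 then leaves the (i, j) result.
via_sum = (matrix1 * matrix3).sum(axis=(2, 3))

# Contract axes 0 and 1 of matrix1 with axes 2 and 3 of matrix3.
via_tensordot = np.tensordot(matrix1, matrix3, axes=[[0, 1], [2, 3]])

print(np.array_equal(via_sum, via_einsum), np.array_equal(via_tensordot, via_einsum))
```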
The newer np.matmul/@ can also be used, but requires some reshaping:
In [111]: (matrix1.ravel()[None,None,None,:]@matrix3.reshape(6,6,-1,1)).squeeze()
Out[111]:
array([[-51240, -9660, 31920, 73500, 115080, 156660],
[ 84840, 126420, 168000, 209580, 251160, 292740],
[220920, 262500, 304080, 345660, 387240, 428820],
[357000, 398580, 440160, 481740, 523320, 564900],
[493080, 534660, 576240, 617820, 659400, 700980],
[629160, 670740, 712320, 753900, 795480, 837060]])
This reduces the kl dimensions down to one, and does 'broadcasting' on the ij dimensions.
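Splitting that one-liner into steps makes the shapes easier to follow. This is only an illustrative breakdown; the names row, cols, and product are mine.

```python
import numpy as np

matrix1 = np.arange(36).reshape(6, 6)
matrix2 = np.arange(36 * 36).reshape(6, 6, 6, 6)
matrix3 = 2 * matrix2 - matrix2.transpose(0, 2, 1, 3)

# Flatten (k, l) into a single axis of length 36 on both operands.
row = matrix1.ravel()[None, None, None, :]  # shape (1, 1, 1, 36)
cols = matrix3.reshape(6, 6, -1, 1)         # shape (6, 6, 36, 1)

# matmul treats the last two axes as matrices: (1, 36) @ (36, 1) -> (1, 1).
# The leading axes (1, 1) and (6, 6) broadcast to (6, 6), one product per (i, j).
product = row @ cols                        # shape (6, 6, 1, 1)
result = product.squeeze()                  # shape (6, 6)

print(row.shape, cols.shape, product.shape, result.shape)
```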