
Use numpy.tensordot to replace a nested loop

I have a piece of code whose performance I want to improve. My code is:

import numpy as np

lis = np.zeros((6, 6))  # must be preallocated; a plain empty list would fail on indexed +=
for i in range(6):
    for j in range(6):
        for k in range(6):
            for l in range(6):
                lis[i][j] += matrix1[k][l] * (2 * matrix2[i][j][k][l] - matrix2[i][k][j][l])
print(lis)

matrix2 is a 4-dimensional NumPy array and matrix1 is a 2D array.

I want to speed up this code by using np.tensordot(matrix1, matrix2), but then I'm lost.

You can just use a JIT compiler

Your solution isn't bad at all. The only things I have changed are the indexing and the variable loop ranges. If you have NumPy arrays and excessive looping, you can use a JIT compiler (Numba), which is a really simple thing to do.

import numba as nb
import numpy as np

# The function is compiled only on the first call (and recompiled for new datatypes)
@nb.njit(cache=True)  # set cache=False if you paste the function into a command window
def almost_your_solution(matrix1, matrix2):
    lis = np.zeros(matrix1.shape, np.float64)
    for i in range(matrix2.shape[0]):
        for j in range(matrix2.shape[1]):
            for k in range(matrix2.shape[2]):
                for l in range(matrix2.shape[3]):
                    lis[i, j] += matrix1[k, l] * (2 * matrix2[i, j, k, l] - matrix2[i, k, j, l])
    return lis
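
For completeness, a minimal usage sketch (the random test data here is my own assumption, not from the original): the first call triggers compilation, so it is usually kept out of any timing.

matrix1 = np.random.rand(6, 6)
matrix2 = np.random.rand(6, 6, 6, 6)

res = almost_your_solution(matrix1, matrix2)  # first call: compiles, then runs
res = almost_your_solution(matrix1, matrix2)  # later calls run at compiled speed
print(res.shape)  # (6, 6)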

Regarding code simplicity, I would prefer the einsum solution from hpaulj over the solution shown above. The tensordot solution isn't that easy to understand in my opinion, but that's a matter of taste.

Comparing performance

The functions from hpaulj that I used for comparison:

def hpaulj_1(matrix1, matrix2):
    matrix3 = 2*matrix2 - matrix2.transpose(0,2,1,3)
    return np.einsum('kl,ijkl->ij', matrix1, matrix3)

def hpaulj_2(matrix1, matrix2):
    matrix3 = 2*matrix2 - matrix2.transpose(0,2,1,3)
    return np.tensordot(matrix1, matrix3, [[0,1],[2,3]])

Very small arrays give:

matrix1=np.random.rand(6,6)
matrix2=np.random.rand(6,6,6,6)

Original solution:    2.6 ms
Compiled solution:    2.1 µs
Einsum solution:      8.3 µs
Tensordot solution:   36.7 µs

Larger arrays give:

matrix1=np.random.rand(60,60)
matrix2=np.random.rand(60,60,60,60)

Original solution:    13.3 s
Compiled solution:    18.2 ms
Einsum solution:      115  ms
Tensordot solution:   180  ms
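
The original measurement setup was not shown; here is a sketch of how the three functions above could be timed (this harness is my own, not the answerer's):

import timeit

matrix1 = np.random.rand(60, 60)
matrix2 = np.random.rand(60, 60, 60, 60)

almost_your_solution(matrix1, matrix2)  # warm-up call so compilation is not timed
for f in (almost_your_solution, hpaulj_1, hpaulj_2):
    t = min(timeit.repeat(lambda: f(matrix1, matrix2), number=3, repeat=3)) / 3
    print(f'{f.__name__}: {t*1e3:.1f} ms')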

Conclusion

Compilation speeds up the computation by about 3 orders of magnitude and outperforms all other solutions by quite a margin.

Test setup:

In [274]: lis = np.zeros((6,6),int)
In [275]: matrix1 = np.arange(36).reshape(6,6)
In [276]: matrix2 = np.arange(36*36).reshape(6,6,6,6)
In [277]: for i in range(6):
     ...:     for j in range(6):
     ...:         for k in range(6):
     ...:             for l in range(6):
     ...:                 lis[i,j] += matrix1[k,l] * (2 * matrix2[i,j,k,l] - matrix2[i,k,j,l])
In [278]: lis
Out[278]: 
array([[-51240,  -9660,  31920,  73500, 115080, 156660],
       [ 84840, 126420, 168000, 209580, 251160, 292740],
       [220920, 262500, 304080, 345660, 387240, 428820],
       [357000, 398580, 440160, 481740, 523320, 564900],
       [493080, 534660, 576240, 617820, 659400, 700980],
       [629160, 670740, 712320, 753900, 795480, 837060]])

right?

I'm not sure that tensordot is the right tool; at the least, it may not be the simplest. It certainly can't handle the matrix2 difference on its own.

Let's start with an obvious substitution:

In [279]: matrix3 = 2*matrix2-matrix2.transpose(0,2,1,3)
In [280]: lis = np.zeros((6,6),int)
In [281]: for i in range(6):
     ...:     for j in range(6):
     ...:         for k in range(6):
     ...:             for l in range(6):
     ...:                 lis[i,j] += matrix1[k,l] * matrix3[i,j,k,l]

Tests OK - same lis.

Now it is easy to express this with einsum - just replicate the indices:

In [284]: np.einsum('kl,ijkl->ij', matrix1, matrix3)
Out[284]: 
array([[-51240,  -9660,  31920,  73500, 115080, 156660],
       [ 84840, 126420, 168000, 209580, 251160, 292740],
       [220920, 262500, 304080, 345660, 387240, 428820],
       [357000, 398580, 440160, 481740, 523320, 564900],
       [493080, 534660, 576240, 617820, 659400, 700980],
       [629160, 670740, 712320, 753900, 795480, 837060]])

The elementwise product plus summation over two axes also works, as does an equivalent tensordot (specifying which axes to sum over):

(matrix1*matrix3).sum(axis=(2,3))
np.tensordot(matrix1, matrix3, [[0,1],[2,3]])
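
A quick sanity check (my own addition, not part of the original answer) that all three formulations agree:

a = np.einsum('kl,ijkl->ij', matrix1, matrix3)
b = (matrix1*matrix3).sum(axis=(2,3))
c = np.tensordot(matrix1, matrix3, [[0,1],[2,3]])
print(np.allclose(a, b), np.allclose(a, c))  # True True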

edit

The newer np.matmul/@ can also be used, but requires some reshaping

In [111]: (matrix1.ravel()[None,None,None,:]@matrix3.reshape(6,6,-1,1)).squeeze()
Out[111]: 
array([[-51240,  -9660,  31920,  73500, 115080, 156660],
       [ 84840, 126420, 168000, 209580, 251160, 292740],
       [220920, 262500, 304080, 345660, 387240, 428820],
       [357000, 398580, 440160, 481740, 523320, 564900],
       [493080, 534660, 576240, 617820, 659400, 700980],
       [629160, 670740, 712320, 753900, 795480, 837060]])

This reduces the kl dimensions down to one, and does 'broadcasting' on the ij dimensions.
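
To make the reshape logic explicit without hardcoded sizes, here is a generalized sketch (my own, assuming matrix1 has shape (K, L) and matrix3 has shape (I, J, K, L)):

I, J, K, L = matrix3.shape
# (1, K*L) row vector @ (K*L, 1) column vector, broadcast over the leading (I, J) batch dims
res = (matrix1.reshape(1, 1, 1, K*L) @ matrix3.reshape(I, J, K*L, 1)).squeeze((-2, -1))
print(np.allclose(res, np.einsum('kl,ijkl->ij', matrix1, matrix3)))  # True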
