
NumPy matrix multiplication using numpy.einsum with vectorization

I want to rotate an image.

The shapes of start and norm are (429, 1024, 3), and the shape of rot is (3, 3). The following code runs correctly but takes a long time to complete.

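import numpy

# Rotation by 30 degrees about the first (x) axis
s = numpy.sin(numpy.pi * 30 / 180)
c = numpy.cos(numpy.pi * 30 / 180)

rot = [[1.0, 0.0, 0.0],
       [0.0,   c,   s],
       [0.0,  -s,   c]]

# start and norm have the same (height, width, 3) shape
height, width = start.shape[:2]

# Multiply the 3-vector stored at every pixel by rot, in place
for i in range(height):
    for j in range(width):
        for arr in [start, norm]:
            x = arr[i, j, 0]
            y = arr[i, j, 1]
            z = arr[i, j, 2]
            for d in range(3):
                arr[i, j, d] = rot[d][0] * x + rot[d][1] * y + rot[d][2] * z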
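Written out, the loop computes an ordinary matrix-vector product at every pixel (i, j). The index e is summed over, while i, j and d survive into the result, which is the kind of contraction numpy.einsum is designed to express:

```latex
\text{arr}'[i,j,d] \;=\; \sum_{e=0}^{2} \text{rot}[d,e]\,\text{arr}[i,j,e],
\qquad d \in \{0, 1, 2\}
```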

I tried to vectorize the code, with the requirement that numpy.einsum be used to multiply each pixel's vector by the rotation matrix.

import numpy

# Rotation by 30 degrees
s = numpy.sin(numpy.pi * 30 / 180)
c = numpy.cos(numpy.pi * 30 / 180)

rot = numpy.array([[1.0, 0.0, 0.0],
                   [0.0,   c,   s],
                   [0.0,  -s,   c]])

start[:, :, :3] = (numpy.einsum('ij,j', rot[:3, 0], start[:, :, 0])
                   + numpy.einsum('ij,j', rot[:3, 1], start[:, :, 1])
                   + numpy.einsum('ij,j', rot[:3, 2], start[:, :, 2]))

norm[:, :, :3] = (numpy.einsum('ij,j', rot[:3, 0], norm[:, :, 0])
                  + numpy.einsum('ij,j', rot[:3, 1], norm[:, :, 1])
                  + numpy.einsum('ij,j', rot[:3, 2], norm[:, :, 2]))

The code above fails with the error "einstein sum subscripts string contains too many subscripts for operand 0".
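The error can be reproduced on a small array. Slicing one column out of rot gives a 1-D array, but the subscript string 'ij,j' names two axes for that first operand, so einsum rejects it before doing any arithmetic (the second operand, a 2-D channel slice given only one subscript, would be rejected too). A minimal sketch, assuming a random (4, 5, 3) array as a hypothetical stand-in for start:

```python
import numpy as np

# Hypothetical small stand-in for the (429, 1024, 3) array in the question
start = np.random.uniform(0.0, 1.0, (4, 5, 3))

c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
rot = np.array([[1.0, 0.0, 0.0],
                [0.0,   c,   s],
                [0.0,  -s,   c]])

# One column of rot is 1-D ...
print(rot[:3, 0].shape)      # (3,)
# ... and one channel of start is 2-D
print(start[:, :, 0].shape)  # (4, 5)

# 'ij' claims operand 0 has two axes, but it has only one
try:
    np.einsum('ij,j', rot[:3, 0], start[:, :, 0])
except ValueError as exc:
    print('einsum rejected the subscripts:', exc)
```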

What changes should I make to the vectorized version of the code?

From what I can tell in your code, the correct einsum signature should be:

import numpy as np

start = np.einsum('ijl, kl -> ijk', start, rot)
norm  = np.einsum('ijl, kl -> ijk',  norm, rot)

But the last dimension of a picture array is the RGB color, not the XYZ coordinate (as pointed out by @QuongHoang in the comments), so this won't "rotate" the picture as you seem to want. You'll just be rotating the color space.
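As a sanity check, the signature 'ijl,kl->ijk' can be compared against the per-pixel loop from the question on small random arrays. The arrays and the helper name rotate_loop below are hypothetical stand-ins, not part of the original code. The sketch also shows that einsum's '...' ellipsis covers any number of leading axes, so start and norm could be rotated in one call after stacking them:

```python
import numpy as np

c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
rot = np.array([[1.0, 0.0, 0.0],
                [0.0,   c,   s],
                [0.0,  -s,   c]])

# Hypothetical small inputs standing in for start and norm
rng = np.random.default_rng(0)
start = rng.uniform(0.0, 1.0, (6, 7, 3))
norm = rng.uniform(0.0, 1.0, (6, 7, 3))

def rotate_loop(arr):
    """Reference result: multiply each pixel vector by rot, one pixel at a time."""
    out = arr.copy()
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            out[i, j] = rot @ arr[i, j]
    return out

# out[i, j, k] = sum over l of arr[i, j, l] * rot[k, l], i.e. rot @ arr[i, j]
start_rot = np.einsum('ijl,kl->ijk', start, rot)
assert np.allclose(start_rot, rotate_loop(start))

# '...' stands for all leading axes, here (2, 6, 7) after stacking
both = np.einsum('kl,...l->...k', rot, np.stack([start, norm]))
start_rot2, norm_rot2 = both
assert np.allclose(start_rot2, start_rot)
assert np.allclose(norm_rot2, rotate_loop(norm))
```

Note that np.einsum returns a new array here rather than modifying start and norm in place, which is why the result is assigned back to the names.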

You can also use the following simple code:

# Each output channel d is a weighted sum of the three input channels (row d of rot)
a_ = np.empty_like(a)
for d in range(3):
    a_[:, :, d] = rot[d][0] * a[:, :, 0] + rot[d][1] * a[:, :, 1] + rot[d][2] * a[:, :, 2]
a = a_
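The channel loop computes, for each d, row d of rot dotted with every pixel vector. That is the same contraction as right-multiplying the (height, width, 3) array by the transpose of rot, which NumPy's @ operator broadcasts over the leading axes. A minimal check, assuming a random array as a stand-in for the image (a_matmul is just an illustrative name):

```python
import numpy as np

c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
rot = [[1.0, 0.0, 0.0],
       [0.0,   c,   s],
       [0.0,  -s,   c]]

# Random stand-in for the (height, width, 3) input
a = np.random.uniform(0.0, 1.0, (100, 179, 3))

# Channel-by-channel version from the answer
a_ = np.empty_like(a)
for d in range(3):
    a_[:, :, d] = rot[d][0] * a[:, :, 0] + rot[d][1] * a[:, :, 1] + rot[d][2] * a[:, :, 2]

# Same result in one call: each pixel row vector v becomes v @ rot.T, i.e. (rot @ v)
a_matmul = a @ np.asarray(rot).T
assert np.allclose(a_, a_matmul)
```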

Here is a full usage example:


import numpy as np

height, width = 100, 179
a = np.random.uniform(0., 1., (height, width, 3))
a0, a1 = np.copy(a), np.copy(a)

s = np.sin(np.pi * 30 / 180)
c = np.cos(np.pi * 30 / 180)

rot = [[1.0, 0.0, 0.0],
       [0.0,   c,   s],
       [0.0,  -s,   c]]

# -------- Version 1, non-vectorized --------------

for i in range(height):
    for j in range(width):
        for arr in [a0]:
            x = arr[i, j, 0]
            y = arr[i, j, 1]
            z = arr[i, j, 2]
            for d in range(3):
                arr[i, j, d] = rot[d][0] * x + rot[d][1] * y + rot[d][2] * z

# -------- Version 2, vectorized --------------

a1_ = np.empty_like(a1)
for d in range(3):
    a1_[:, :, d] = rot[d][0] * a1[:, :, 0] + rot[d][1] * a1[:, :, 1] + rot[d][2] * a1[:, :, 2]
a1 = a1_

# ------ Check that we have same solution --------
assert np.allclose(a0, a1)

Below is timing code for the two solutions. In this measurement the vectorized version is about 48x faster than the non-vectorized one. The code requires installing the modules once with python -m pip install numpy timerit:

# Needs: python -m pip install numpy timerit

import numpy as np

s = np.sin(np.pi * 30 / 180)
c = np.cos(np.pi * 30 / 180)

rot = [[1.0, 0.0, 0.0],
       [0.0,   c,   s],
       [0.0,  -s,   c]]

# -------- Version 1, non-vectorized --------------

def f0(a):
    height, width = a.shape[:2]
    a = np.copy(a)
    for i in range(height):
        for j in range(width):
            for arr in [a]:
                x = arr[i, j, 0]
                y = arr[i, j, 1]
                z = arr[i, j, 2]
                for d in range(3):
                    arr[i, j, d] = rot[d][0] * x + rot[d][1] * y + rot[d][2] * z
    return a

# -------- Version 2, vectorized --------------

def f1(a):
    a_ = np.empty_like(a)
    for d in range(3):
        a_[:, :, d] = rot[d][0] * a[:, :, 0] + rot[d][1] * a[:, :, 1] + rot[d][2] * a[:, :, 2]
    return a_

# ----------- Time/Speedup Measure ----------------

from timerit import Timerit
Timerit._default_asciimode = True

height, width = 100, 179
a = np.random.uniform(0., 1., (height, width, 3))

ra, rt = None, None
for f in [f0, f1]:
    print(f'{f.__name__}: ', end='', flush=True)
    tim = Timerit(num=15, verbose=1)
    for t in tim:
        ca = f(a)
    ct = tim.mean()
    if ra is None:
        ra, rt = ca, ct
    else:
        assert np.allclose(ra, ca)
        print(f'speedup {round(rt / ct, 2)}x')

Output:

f0: Timed best=144.785 ms, mean=146.205 +- 1.9 ms
f1: Timed best=2.781 ms, mean=3.022 +- 0.1 ms
speedup 48.38x
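If you would rather not install timerit, a rough comparison can be made with the standard-library timeit module. This sketch times the answer's f1 against the einsum signature from the first answer and the equivalent @ form; f_einsum and f_matmul are hypothetical names, and absolute timings depend on the machine and NumPy build, so none are quoted here:

```python
import timeit
import numpy as np

c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
rot = np.array([[1.0, 0.0, 0.0],
                [0.0,   c,   s],
                [0.0,  -s,   c]])

def f1(a):
    """Channel-by-channel version from this answer."""
    a_ = np.empty_like(a)
    for d in range(3):
        a_[:, :, d] = rot[d, 0] * a[:, :, 0] + rot[d, 1] * a[:, :, 1] + rot[d, 2] * a[:, :, 2]
    return a_

def f_einsum(a):
    """einsum signature from the first answer."""
    return np.einsum('ijl,kl->ijk', a, rot)

def f_matmul(a):
    """Matrix product broadcast over the (height, width) axes."""
    return a @ rot.T

a = np.random.uniform(0.0, 1.0, (100, 179, 3))
reference = f1(a)
for f in (f1, f_einsum, f_matmul):
    assert np.allclose(f(a), reference)
    # Best of 5 repeats of 20 calls each, converted to time per call
    best = min(timeit.repeat(lambda: f(a), number=20, repeat=5)) / 20
    print(f'{f.__name__}: {best * 1e3:.3f} ms per call')
```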

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address.

© 2020-2024 STACKOOM.COM