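NumPy matrix multiplication using numpy.einsum with vectorization
I want to rotate an image.
The shape of start and norm is (429, 1024, 3) and the shape of rot is (3, 3). The following code works correctly, but it takes a long time to finish.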
# rotation by 30 degrees
s = numpy.sin(numpy.pi * 30 / 180)
c = numpy.cos(numpy.pi * 30 / 180)
rot = [[1.0, 0.0, 0.0],
       [0.0, c, s],
       [0.0, -s, c]]
for i in range(height):
    for j in range(width):
        for arr in [start, norm]:
            x = arr[i,j,0]
            y = arr[i,j,1]
            z = arr[i,j,2]
            for d in range(3):
                arr[i,j,d] = rot[d][0] * x + rot[d][1] * y + rot[d][2] * z
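I tried to vectorize the code, but I got stuck on how to use numpy.einsum to express the multiplication that has to be applied to each pixel's vector.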
#Moving 30 degree
s = numpy.sin(numpy.pi * 30 / 180)
c = numpy.cos(numpy.pi * 30 / 180)
rot = numpy.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])
start[:,:,:3] = numpy.einsum('ij,j',rot[:3,0],start[:,:,0]) + \
    numpy.einsum('ij,j',rot[:3,1],start[:,:,1]) + numpy.einsum('ij,j',rot[:3,2],start[:,:,2])
norm[:,:,:3] = numpy.einsum('ij,j',rot[:3,0],norm[:,:,0]) + \
    numpy.einsum('ij,j',rot[:3,1],norm[:,:,1]) + numpy.einsum('ij,j',rot[:3,2],norm[:,:,2])
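The code above raises the error "einstein sum subscripts string contains too many subscripts for operand 0".
What should I change in the vectorized version of the code?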
From what I can see in your code, the correct einsum signature should be:
start = np.einsum('ijl, kl -> ijk', start, rot)
norm = np.einsum('ijl, kl -> ijk', norm, rot)
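As a rough illustration of why the original call fails and what the signature above computes: `rot[:3,0]` is a one-dimensional array, while the subscripts `'ij'` name two axes for it, which is what the error message complains about. The sketch below uses a small random array as a stand-in for `start`; the array size, the seed, and the names `failed`, `expected` and `rotated` are assumptions made only for this example.

```python
import numpy as np

angle = np.deg2rad(30)
s, c = np.sin(angle), np.cos(angle)
rot = np.array([[1.0, 0.0, 0.0],
                [0.0, c, s],
                [0.0, -s, c]])

# Small stand-in for the (429, 1024, 3) array from the question (assumed data).
rng = np.random.default_rng(0)
start = rng.uniform(0.0, 1.0, (4, 5, 3))

# The original call: operand 0 is 1-D, but 'ij' asks for two axes.
try:
    np.einsum('ij,j', rot[:3, 0], start[:, :, 0])
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# Reference result: apply rot to every pixel vector, one pixel at a time.
expected = np.empty_like(start)
for i in range(start.shape[0]):
    for j in range(start.shape[1]):
        expected[i, j] = rot @ start[i, j]

# 'ijl,kl->ijk': for each pixel (i, j), sum over l of start[i, j, l] * rot[k, l].
rotated = np.einsum('ijl,kl->ijk', start, rot)
assert np.allclose(rotated, expected)
```

The einsum call returns a new array, so assigning it back to `start` does not run into the problem of overwriting a channel before it has been read.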
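However, the last dimension of an image array holds the RGB color, not XYZ coordinates (as @QuongHoang pointed out in the comments), so this will not "rotate" the image the way you want. You are only rotating the color space.
You can use the following simple code: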
a_ = np.empty_like(a)
for d in range(3):
    a_[:, :, d] = rot[d][0] * a[:, :, 0] + rot[d][1] * a[:, :, 1] + rot[d][2] * a[:, :, 2]
a = a_
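The per-channel loop above is a product of each pixel vector with `rot`, so it can probably also be written as a single matrix product. The sketch below compares the loop with `a @ rot.T` and with `np.tensordot`; the random input and the names `per_channel`, `via_matmul` and `via_tensordot` are assumptions for illustration, and `rot` is converted to an array so that it can be transposed.

```python
import numpy as np

angle = np.deg2rad(30)
s, c = np.sin(angle), np.cos(angle)
rot = np.array([[1.0, 0.0, 0.0],
                [0.0, c, s],
                [0.0, -s, c]])

# Assumed test data with the same layout as the answer's example.
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, (100, 179, 3))

# Per-channel version, as in the code above.
per_channel = np.empty_like(a)
for d in range(3):
    per_channel[:, :, d] = (rot[d, 0] * a[:, :, 0]
                            + rot[d, 1] * a[:, :, 1]
                            + rot[d, 2] * a[:, :, 2])

# matmul broadcasts over the first two axes: out[i, j, k] = sum_l a[i, j, l] * rot[k, l].
via_matmul = a @ rot.T

# tensordot contracts the last axis of a with the second axis of rot.
via_tensordot = np.tensordot(a, rot, axes=([2], [1]))

assert np.allclose(per_channel, via_matmul)
assert np.allclose(per_channel, via_tensordot)
```

Which form is fastest depends on the array sizes and the NumPy build, so it is worth timing them on the real data.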
A complete usage example follows:
import numpy as np
height, width = 100, 179
a = np.random.uniform(0., 1., (height, width, 3))
a0, a1 = np.copy(a), np.copy(a)
s = np.sin(np.pi * 30 / 180)
c = np.cos(np.pi * 30 / 180)
rot = [[1.0, 0.0, 0.0],
       [0.0, c, s],
       [0.0, -s, c]]
# -------- Version 1, non-vectorized --------------
for i in range(height):
    for j in range(width):
        for arr in [a0]:
            x = arr[i,j,0]
            y = arr[i,j,1]
            z = arr[i,j,2]
            for d in range(3):
                arr[i,j,d] = rot[d][0] * x + rot[d][1] * y + rot[d][2] * z
# -------- Version 2, vectorized --------------
a1_ = np.empty_like(a1)
for d in range(3):
    a1_[:, :, d] = rot[d][0] * a1[:, :, 0] + rot[d][1] * a1[:, :, 1] + rot[d][2] * a1[:, :, 2]
a1 = a1_
# ------ Check that we have same solution --------
assert np.allclose(a0, a1)
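Below is code that measures the running time of the two solutions. The vectorized solution appears to be about 48x faster than the non-vectorized one. The code needs the modules to be installed once with python -m pip install numpy timerit: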
# Needs: python -m pip install numpy timerit
import numpy as np
s = np.sin(np.pi * 30 / 180)
c = np.cos(np.pi * 30 / 180)
rot = [[1.0, 0.0, 0.0],
       [0.0, c, s],
       [0.0, -s, c]]
# -------- Version 1, non-vectorized --------------
def f0(a):
    height, width = a.shape[:2]
    a = np.copy(a)
    for i in range(height):
        for j in range(width):
            for arr in [a]:
                x = arr[i,j,0]
                y = arr[i,j,1]
                z = arr[i,j,2]
                for d in range(3):
                    arr[i,j,d] = rot[d][0] * x + rot[d][1] * y + rot[d][2] * z
    return a
# -------- Version 2, vectorized --------------
def f1(a):
    height, width = a.shape[:2]
    a_ = np.empty_like(a)
    for d in range(3):
        a_[:, :, d] = rot[d][0] * a[:, :, 0] + rot[d][1] * a[:, :, 1] + rot[d][2] * a[:, :, 2]
    a = a_
    return a
# ----------- Time/Speedup Measure ----------------
from timerit import Timerit
Timerit._default_asciimode = True
height, width = 100, 179
a = np.random.uniform(0., 1., (height, width, 3))
ra, rt = None, None
for f in [f0, f1]:
    print(f'{f.__name__}: ', end = '', flush = True)
    tim = Timerit(num = 15, verbose = 1)
    for t in tim:
        ca = f(a)
    ct = tim.mean()
    if ra is None:
        ra, rt = ca, ct
    else:
        assert np.allclose(ra, ca)
        print(f'speedup {round(rt / ct, 2)}x')
Output:
f0: Timed best=144.785 ms, mean=146.205 +- 1.9 ms
f1: Timed best=2.781 ms, mean=3.022 +- 0.1 ms
speedup 48.38x
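If installing timerit is not an option, a similar comparison can be sketched with the standard-library `timeit` module. The version below also times the einsum and matmul forms of the same product. The function names `per_channel`, `with_einsum` and `with_matmul`, the repeat counts, and the random input are assumptions for this example, and no timings are quoted here because they vary from machine to machine.

```python
import timeit
import numpy as np

angle = np.deg2rad(30)
s, c = np.sin(angle), np.cos(angle)
rot = np.array([[1.0, 0.0, 0.0],
                [0.0, c, s],
                [0.0, -s, c]])

def per_channel(a):
    # Vectorized over pixels, explicit loop over the 3 output channels.
    out = np.empty_like(a)
    for d in range(3):
        out[:, :, d] = (rot[d, 0] * a[:, :, 0]
                        + rot[d, 1] * a[:, :, 1]
                        + rot[d, 2] * a[:, :, 2])
    return out

def with_einsum(a):
    return np.einsum('ijl,kl->ijk', a, rot)

def with_matmul(a):
    return a @ rot.T

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, (100, 179, 3))
reference = per_channel(a)

timings = {}
for f in (per_channel, with_einsum, with_matmul):
    # Every variant must give the same result before its timing means anything.
    assert np.allclose(f(a), reference)
    # Best of 5 runs of 20 calls each, reported as seconds per call.
    best = min(timeit.repeat(lambda: f(a), number=20, repeat=5)) / 20
    timings[f.__name__] = best
    print(f'{f.__name__}: {best * 1e3:.3f} ms per call')
```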