Compute rotation matrices from arrays of “aim” and “up” vectors
I want to compute an array of rotation matrices from given arrays of aim and up vectors.
For simplicity I'll assume the aim axes correspond to the matrices' x components, and the up axes to the matrices' y components.
The only way that I know of is by doing a series of cross products:
import cProfile
import numpy as np
from numpy.core.umath_tests import inner1d
normalize = lambda V: V/(inner1d(V,V)**0.5)[:,np.newaxis] # inner1d is faster than np.linalg.norm on large arrays
def vectorToMatrix(X,Y):
    X = normalize(X) # make sure X is normalized
    Z = normalize(np.cross(X,Y)) # Z is the normal to the XY plane
    # Re-adjust Y to keep matrices orthogonal
    Y = np.cross(Z,X)
    return np.dstack((X,Y,Z)).swapaxes(2,1)
Running this on a million random items:
np.random.seed(30)
n = 10**6
X = np.random.random((n,3))
Y = np.random.random((n,3))
M = vectorToMatrix(X,Y)
cProfile.run('vectorToMatrix(X,Y)') # 61 function calls in 0.255 seconds
I'm looking for other methods, preferably leveraging numpy/scipy, that compute the same result as vectorToMatrix with a performance boost.
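As a sanity check on the construction described above, here is a minimal sketch that verifies the output really is a stack of rotation matrices: the rows of each matrix should be orthonormal and the determinant should be +1. The names `vector_to_matrix`, `gram`, `is_orthonormal` and `is_proper` are illustrative, and `np.einsum` stands in for `inner1d` as an assumption (it is part of the public NumPy API and computes the same row-wise dot products).

```python
import numpy as np

def normalize(V):
    # Row-wise normalization; einsum replaces inner1d here (an assumption).
    return V / np.sqrt(np.einsum('ij,ij->i', V, V))[:, np.newaxis]

def vector_to_matrix(X, Y):
    # Same cross-product construction as vectorToMatrix.
    X = normalize(X)
    Z = normalize(np.cross(X, Y))
    Y = np.cross(Z, X)
    return np.dstack((X, Y, Z)).swapaxes(2, 1)

rng = np.random.default_rng(30)
aim = rng.random((5, 3))
up = rng.random((5, 3))
M = vector_to_matrix(aim, up)

# Rows of each matrix are the x, y, z axes, so M @ M^T should be the identity.
gram = np.einsum('nij,nkj->nik', M, M)
is_orthonormal = np.allclose(gram, np.eye(3))

# A proper rotation (no reflection) has determinant +1.
is_proper = np.allclose(np.linalg.det(M), 1.0)

print(M.shape, is_orthonormal, is_proper)
```

Any faster replacement can be checked against the same two properties, in addition to comparing it with `np.allclose` against the reference output.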
So here's my attempt with @jit:
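from math import sqrt  # used by the jitted normalization code further down
from numba import jit, float64 as float  # note: this shadows the builtin float

@jit(float[:,:](float[:,:], float[:,:]), nopython=True)
def crossj(a, b):
    c = np.empty(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            # negative indices wrap around, giving the cyclic component order
            c[i, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
    return c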
This is quite a bit faster than both np.cross and @Divakar's numpy_cross_slicing:
np.allclose(np.cross(X,Y), crossj(X,Y) )
True
%timeit np.cross(X,Y)
%timeit numpy_cross_slicing(X,Y)
%timeit crossj(X,Y)
10 loops, best of 3: 75.1 ms per loop
10 loops, best of 3: 88.1 ms per loop
10 loops, best of 3: 23 ms per loop
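The indexing in crossj relies on negative indices wrapping around, which is easy to miss. The sketch below spells that out in plain Python and NumPy (no numba, so it is slow and meant only as an illustration); `cross_wraparound` is a hypothetical name, not part of the code above.

```python
import numpy as np

def cross_wraparound(a, b):
    # Pure-Python illustration of the indexing used in crossj.
    c = np.empty(a.shape)
    for i in range(a.shape[0]):
        for j in range(3):
            # For j = 0, 1, 2 the index pairs (j-2, j-1) are (-2, -1), (-1, 0), (0, 1),
            # which resolve to (1, 2), (2, 0), (0, 1) on a length-3 axis.
            c[i, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
    return c

rng = np.random.default_rng(0)
A = rng.random((4, 3))
B = rng.random((4, 3))
matches = np.allclose(cross_wraparound(A, B), np.cross(A, B))
print(matches)
```

The cyclic order (1, 2), (2, 0), (0, 1) is exactly the component-wise definition of the cross product, which is why a single expression covers all three components.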
I'll swipe the ne-based normalize from @Divakar, and implement a second jit for normalize(np.cross()):
@jit(float[:,:](float[:,:], float[:,:]), nopython=True)
def norm_crossj(a, b):
    c = np.empty(a.shape)
    for i in range(a.shape[0]):
        n = 0
        for j in range(a.shape[1]):
            c[i, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
            n += c[i,j]**2
        n = sqrt(n)
        for j in range(a.shape[1]):
            c[i,j] /= n
    return c
Once again, faster:
%timeit normalize(np.cross(X,Y))
%timeit numpy_cross_norm_slicing(X,Y)
%timeit norm_crossj(X,Y)
10 loops, best of 3: 119 ms per loop
10 loops, best of 3: 104 ms per loop
10 loops, best of 3: 36.7 ms per loop
Finally:
def vectorToMatrixj(X, Y):
    X = normalize_einsum_numexpr(X) # make sure X is normalized (returns a new array)
    #Y = normalize_einsum_numexpr(Y) # you don't need this
    Z = norm_crossj(X, Y) # Z is the normal to the XY plane
    # Re-adjust Y to keep matrices orthogonal
    Y = crossj(Z, X)
    return np.dstack((X,Y,Z)).swapaxes(2,1)
Not sure why @Divakar's timings seem to be so different, or why my speedups don't help more, but:
%timeit vectorToMatrix(X,Y)
%timeit vectorToMatrix1(X,Y) #Divakar
%timeit vectorToMatrix2(X,Y) #Divakar
%timeit vectorToMatrixj(X,Y)
1 loop, best of 3: 265 ms per loop
1 loop, best of 3: 319 ms per loop
1 loop, best of 3: 258 ms per loop
1 loop, best of 3: 212 ms per loop
EDIT: fully jitted function:
@jit(float[:,:,:](float[:,:], float[:,:]), nopython=True)
def vec2matj(a, b):
    c = np.empty(a.shape + a.shape[-1:])
    for i in range(a.shape[0]):
        na = 0
        nc = 0
        for j in range(a.shape[1]):
            c[i, 2, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
            na += a[i, j]**2
            nc += c[i, 2, j]**2
        na = sqrt(na)
        nc = sqrt(nc)
        for j in range(a.shape[1]):
            c[i, 2, j] /= nc
            c[i, 0, j] = a[i, j] / na
        for j in range(a.shape[1]):
            c[i, 1, j] = c[i, 2, j-2] * c[i, 0, j-1] - c[i, 2, j-1] * c[i, 0, j-2]
    return c
np.allclose(vectorToMatrix(X,Y), vec2matj(X,Y))
True
%timeit vec2matj(X,Y)
%timeit vectorToMatrix(X,Y)
10 loops, best of 3: 60.8 ms per loop
1 loop, best of 3: 240 ms per loop # <- different computer than timings above
Three stages of marginal improvements are possible with tricks from np.einsum and the numexpr module.
Stage #1 : Compute the normalize outputs using sum-reduction with einsum, then leverage numexpr for the square roots -
import numexpr as ne
def normalize_einsum_numexpr(X):
    sq_sums = np.einsum('ij,ij->i',X,X)[:,None]
    return ne.evaluate('X/sqrt(sq_sums)')
This would be equivalent to normalize(X).
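To make the einsum step concrete, here is a small NumPy-only sketch (numexpr is left out so it runs without the extra dependency). It shows that the subscript string `'ij,ij->i'` is a per-row dot product and that dividing by its square root gives unit-length rows. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.random((6, 3))

# 'ij,ij->i' multiplies element-wise and sums over j: one dot product per row.
sq_sums = np.einsum('ij,ij->i', V, V)
same_as_sum = np.allclose(sq_sums, (V * V).sum(axis=1))

# Dividing by the square root yields unit-length rows, as np.linalg.norm would.
unit = V / np.sqrt(sq_sums)[:, None]
same_as_norm = np.allclose(unit, V / np.linalg.norm(V, axis=1, keepdims=True))
row_lengths = np.linalg.norm(unit, axis=1)

print(same_as_sum, same_as_norm, row_lengths)
```

The einsum form avoids materializing the intermediate `V * V` array, which is one plausible reason it helps on large inputs.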
Stage #2 : Get the numpy.cross equivalent with slicing, using the definition of the cross product -
def numpy_cross_slicing(X,Y):
    c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
    c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
    c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]
    return np.column_stack((c0,c1,c2))
Stage #3 : Get the normalized cross product with slicing, also leveraging numexpr -
def numpy_cross_norm_slicing(X,Y):
    c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
    c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
    c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]
    s = ne.evaluate('sqrt(c0**2 + c1**2 + c2**2)')
    c0 /= s
    c1 /= s
    c2 /= s
    return np.column_stack((c0,c1,c2))
This would replace normalize(np.cross(X,Y)).
Putting it all together, we would have the replacement for vectorToMatrix, like so -
def vectorToMatrix1(X,Y):
    X = normalize_einsum_numexpr(X) # make sure X is normalized
    Y = normalize_einsum_numexpr(Y) # make sure Y is normalized
    Z = numpy_cross_norm_slicing(X,Y) # Z is the normal ...
    Y = numpy_cross_slicing(Z,X)
    return np.dstack((X,Y,Z)).swapaxes(2,1)
Runtime test
Input setup :
In [271]: X = np.random.random((10**6,3))
...: Y = np.random.random((10**6,3))
...:
Stage #1 :
In [272]: np.allclose(normalize(X), normalize_einsum_numexpr(X))
Out[272]: True
In [273]: %timeit normalize(X)
...: %timeit normalize_einsum_numexpr(X)
...:
100 loops, best of 3: 11.4 ms per loop
100 loops, best of 3: 10.6 ms per loop
Stage #2 :
In [274]: np.allclose(np.cross(X,Y), numpy_cross_slicing(X,Y) )
Out[274]: True
In [275]: %timeit np.cross(X,Y)
...: %timeit numpy_cross_slicing(X,Y)
...:
10 loops, best of 3: 29.8 ms per loop
10 loops, best of 3: 27.9 ms per loop
Stage #3 :
In [276]: np.allclose(normalize(np.cross(X,Y)), numpy_cross_norm_slicing(X,Y))
Out[276]: True
In [277]: %timeit normalize(np.cross(X,Y))
...: %timeit numpy_cross_norm_slicing(X,Y)
...:
10 loops, best of 3: 44.5 ms per loop
10 loops, best of 3: 34.9 ms per loop
Entire code :
In [395]: np.allclose(vectorToMatrix(X,Y), vectorToMatrix1(X,Y))
Out[395]: True
In [396]: %timeit vectorToMatrix(X,Y)
10 loops, best of 3: 130 ms per loop
In [397]: %timeit vectorToMatrix1(X,Y)
10 loops, best of 3: 122 ms per loop
Hence, only a marginal improvement.
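The timings above use the IPython %timeit magic. For anyone reproducing them in a plain script, the following is a rough sketch using the standard-library timeit module. `best_ms` and `cross_slicing` are hypothetical helper names, the array size is reduced so the script finishes quickly, and the numbers it prints will differ from machine to machine.

```python
import timeit
import numpy as np

def cross_slicing(X, Y):
    # Same component-wise definition as numpy_cross_slicing.
    c0 = X[:, 1] * Y[:, 2] - X[:, 2] * Y[:, 1]
    c1 = X[:, 2] * Y[:, 0] - X[:, 0] * Y[:, 2]
    c2 = X[:, 0] * Y[:, 1] - X[:, 1] * Y[:, 0]
    return np.column_stack((c0, c1, c2))

def best_ms(func, *args, repeat=3, number=5):
    # Best-of-`repeat` average time per call, in milliseconds.
    times = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(times) / number * 1e3

n = 10**5  # smaller than the 10**6 used in the post, to keep the run short
rng = np.random.default_rng(2)
X = rng.random((n, 3))
Y = rng.random((n, 3))

t_numpy = best_ms(np.cross, X, Y)
t_slice = best_ms(cross_slicing, X, Y)
print(f"np.cross: {t_numpy:.2f} ms, slicing: {t_slice:.2f} ms")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports.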
Not giving up!
Looking into the bottlenecks, the many stacking steps weren't helping. So, improving on those, a modified version ended up like this -
def vectorToMatrix2(X,Y):
    X = normalize_einsum_numexpr(X) # make sure X is normalized
    Y = normalize_einsum_numexpr(Y) # make sure Y is normalized
    c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
    c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
    c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]
    s = ne.evaluate('sqrt(c0**2 + c1**2 + c2**2)')
    c0 /= s
    c1 /= s
    c2 /= s
    d0 = c1*X[:,2] - c2*X[:,1]
    d1 = c2*X[:,0] - c0*X[:,2]
    d2 = c0*X[:,1] - c1*X[:,0]
    c = [c0,c1,c2]
    d = [d0,d1,d2]
    return np.concatenate((X.T, d, c)).reshape(3,3,-1).transpose(2,0,1)
New timings with the same million-point setup -
In [505]: %timeit vectorToMatrix(X,Y) # original code
...: %timeit vectorToMatrix1(X,Y)
...: %timeit vectorToMatrix2(X,Y)
...:
10 loops, best of 3: 130 ms per loop
10 loops, best of 3: 117 ms per loop
10 loops, best of 3: 101 ms per loop
A 20%+ speedup, not too bad!
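The final concatenate/reshape/transpose in vectorToMatrix2 is compact but hard to read. The sketch below shows what it produces by comparing it with the original dstack/swapaxes form and with a third, hypothetical alternative that writes each axis into a preallocated (n, 3, 3) array. The function names are illustrative, and whether the preallocated form is any faster has not been measured here.

```python
import numpy as np

def stack_rows_concat(x, y, z):
    # x, y, z have shape (n, 3); mirrors the concatenate/reshape/transpose step.
    return np.concatenate((x.T, y.T, z.T)).reshape(3, 3, -1).transpose(2, 0, 1)

def stack_rows_prealloc(x, y, z):
    # Alternative (an assumption, not from the post): fill rows of a preallocated array.
    out = np.empty((x.shape[0], 3, 3), dtype=x.dtype)
    out[:, 0, :] = x
    out[:, 1, :] = y
    out[:, 2, :] = z
    return out

rng = np.random.default_rng(3)
x, y, z = (rng.random((4, 3)) for _ in range(3))

a = stack_rows_concat(x, y, z)
b = stack_rows_prealloc(x, y, z)
c = np.dstack((x, y, z)).swapaxes(2, 1)  # the form used in vectorToMatrix

all_equal = np.array_equal(a, b) and np.array_equal(b, c)
print(a.shape, all_equal)
```

In all three forms, row 0 of each matrix is the aim (x) axis, row 1 the re-orthogonalized up (y) axis, and row 2 the normal (z) axis.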