根据“目标”和“上”向量数组计算旋转矩阵

Question

I want to compute an array of rotation matrices from given arrays of aim and up vectors. 我想从给定的aim向量和up向量数组计算旋转矩阵数组。

For simplicity I'll assume the aim axes will correspond to the matrices's x components, and up axes to the matrices's y components. 为简单起见，我假设aim轴将对应于矩阵的x分量，而up轴将对应于矩阵的y分量。

The only way that i know of is by doing a series of cross products: 我知道的唯一方法是做一系列交叉产品：

import cProfile
import numpy as np
from numpy.core.umath_tests import inner1d

normalize = lambda V: V/(inner1d(V,V)**0.5)[:,np.newaxis] # inner1d is faster than np.linalg.norm on large arrays

def vectorToMatrix(X,Y):
    X = normalize(X) # make sure X is normalized
    Z = normalize(np.cross(X,Y)) # Z is the normal to the XY plane

    # Re-adjust Y to keep matrices orthogonal
    Y = np.cross(Z,X)
    return np.dstack((X,Y,Z)).swapaxes(2,1)

Running this on a million random items 对一百万个随机项目运行

np.random.seed(30)
n = 10**6
X = np.random.random((n,3))
Y = np.random.random((n,3))
M = vectorToMatrix(X,Y)

cProfile.run('vectorToMatrix(X,Y)') # 61 function calls in 0.255 seconds

I'm looking for other methods, preferably leveraging numpy/scipy , that could help calculate the same result given by vectorToMatrix with a performance boost. 我正在寻找其他方法，最好利用numpy/scipy ，可以帮助计算由vectorToMatrix给出的相同结果， vectorToMatrix提高性能。

Answer 1

So here's my attempt with @jit 所以这是我尝试使用@jit

from numba import jit, float64 as float

@jit(float[:,:](float[:,:], float[:,:]), nopython=True)
def crossj(a, b):
    c = np.empty(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
    return c

This is quite a bit faster than @Divakar 这比@Divakar快很多

np.allclose(np.cross(X,Y), crossj(X,Y) )

True

%timeit np.cross(X,Y)
%timeit numpy_cross_slicing(X,Y)
%timeit crossj(X,Y)

10 loops, best of 3: 75.1 ms per loop
10 loops, best of 3: 88.1 ms per loop
10 loops, best of 3: 23 ms per loop

I'll swipe the ne -based normalize from @Divakar, and implement a seocnd jit for normalize(np.cross()) 我将从@Divakar滑动基于ne的normalize ，并实现seocnd jit进行normalize(np.cross())

@jit(float[:,:](float[:,:], float[:,:]), nopython=True)
def norm_crossj(a, b):
    c = np.empty(a.shape)
    for i in range(a.shape[0]):
        n = 0
        for j in range(a.shape[1]):
            c[i, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
            n += c[i,j]**2
        n = sqrt(n)
        for j in range(a.shape[1]):
            c[i,j] /= n
    return c

Once again, faster 再一次，更快

%timeit normalize(np.cross(X,Y))
%timeit numpy_cross_norm_slicing(X,Y)
%timeit norm_crossj(X,Y)

10 loops, best of 3: 119 ms per loop
10 loops, best of 3: 104 ms per loop
10 loops, best of 3: 36.7 ms per loop

Finally: 最后：

def vectorToMatrixj(X, Y):
    normalize_einsum_numexpr(X) # make sure X is normalized
    #normalize_einsum_numexpr(Y) # you don't need this
    Z = norm_crossj(X, Y) # Z is the normal to the XY plane
    # Re-adjust Y to keep matrices orthogonal
    Y = crossj3(Z, X)
    return np.dstack((X,Y,Z)).swapaxes(2,1)

Not sure why @Divakar's timings seem to be so different, or why my speedups don't help more, but: 不知道为什么@Divakar的时间安排看起来如此不同，或者为什么我的提速没有太大帮助，但是：

%timeit vectorToMatrix(X,Y)
%timeit vectorToMatrix1(X,Y) #Divakar
%timeit vectorToMatrix2(X,Y) #Divakar
%timeit vectorToMatrixj(X,Y)

1 loop, best of 3: 265 ms per loop
1 loop, best of 3: 319 ms per loop
1 loop, best of 3: 258 ms per loop
1 loop, best of 3: 212 ms per loop

EDIT: fully jit ted function: 编辑：完全jit功能：

@jit(float[:,:,:](float[:,:], float[:,:]), nopython=True)
def vec2matj(a, b):
    c = np.empty(a.shape + a.shape[-1:])
    for i in range(a.shape[0]):
        na = 0
        nc = 0
        for j in range(a.shape[1]):
            c[i, 2, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
            na += a[i, j]**2
            nc += c[i, 2, j]**2
        na = sqrt(na)
        nc = sqrt(nc)
        for j in range(a.shape[1]):
            c[i, 2, j] /= nc
            c[i, 0, j] = a[i, j] / na
        for j in range(a.shape[1]):
            c[i, 1, j] = c[i, 2, j-2] * c[i, 0, j-1] - c[i, 2, j-1] * c[i, 0, j-2]
    return c

np.allclose(vectorToMatrix(X,Y), vec2matj(X,Y))
True

%timeit vec2matj(X,Y)
%timeit vectorToMatrix(X,Y)

10 loops, best of 3: 60.8 ms per loop
1 loop, best of 3: 240 ms per loop   # <- different computer than timings above

Answer 2

Three stages of marginal improvements are possible with tricks from np.einsum and numexpr module. 使用np.einsum和numexpr模块的技巧可以实现三个阶段的边缘改进。

Stage #1 : Compute normalize outputs using sum-reduction with einsum and then leveraging numexpr for performing squared-roots - 阶段1：使用einsum和sum-reduction来计算归一化输出，然后利用numexpr执行平方根-

import numexpr as ne

def normalize_einsum_numexpr(X):
    sq_sums = np.einsum('ij,ij->i',X,X)[:,None]
    return ne.evaluate('X/sqrt(sq_sums)')

This would be equivalent of normalize(X) . 这等效于normalize(X) 。

Stage #2 : Get numpy.cross equivalent with slicing using the definition of cross-product - 第2阶段：使用cross-product的定义，通过切片获得numpy.cross等效项-

def numpy_cross_slicing(X,Y):
    c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
    c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
    c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]
    return np.column_stack((c0,c1,c2))

Stage #3 : Get normalized cross product with slicing and also leveraging numexpr - 第3阶段：通过切片并利用numexpr获得标准化的叉积-

def numpy_cross_norm_slicing(X,Y):
    c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
    c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
    c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]

    s = ne.evaluate('sqrt(c0**2 + c1**2 + c2**2)')
    c0 /= s
    c1 /= s
    c2 /= s
    return np.column_stack((c0,c1,c2))

This would replace normalize(np.cross(X,Y)) . 这将替换normalize(np.cross(X,Y)) 。

Putting it all together, we would have the replacement for vectorToMatrix , like so - 放在一起，我们将替换vectorToMatrix ，就像这样-

def vectorToMatrix1(X,Y):
    X = normalize_einsum_numexpr(X) # make sure X is normalized
    Y = normalize_einsum_numexpr(Y) # make sure Y is normalized
    Z = numpy_cross_norm_slicing(X,Y) # Z is the normal ...
    Y = numpy_cross_slicing(Z,X)
    return np.dstack((X,Y,Z)).swapaxes(2,1)

Runtime test 运行时测试

Input setup : 输入设置：

In [271]: X = np.random.random((10**6,3))
     ...: Y = np.random.random((10**6,3))
     ...:

Stage #1 : 阶段1 ：

In [272]: np.allclose(normalize(X), normalize_einsum_numexpr(X))
Out[272]: True

In [273]: %timeit normalize(X)
     ...: %timeit normalize_einsum_numexpr(X)
     ...: 
100 loops, best of 3: 11.4 ms per loop
100 loops, best of 3: 10.6 ms per loop

Stage #2 : 第二阶段：

In [274]: np.allclose(np.cross(X,Y), numpy_cross_slicing(X,Y) )
Out[274]: True

In [275]: %timeit np.cross(X,Y)
     ...: %timeit numpy_cross_slicing(X,Y)
     ...: 
10 loops, best of 3: 29.8 ms per loop
10 loops, best of 3: 27.9 ms per loop

Stage #3 : 第三阶段：

In [276]: np.allclose(normalize(np.cross(X,Y)), numpy_cross_norm_slicing(X,Y))
Out[276]: True

In [277]: %timeit normalize(np.cross(X,Y))
     ...: %timeit numpy_cross_norm_slicing(X,Y)
     ...: 
10 loops, best of 3: 44.5 ms per loop
10 loops, best of 3: 34.9 ms per loop

Entire code : 完整代码：

In [395]: np.allclose(vectorToMatrix(X,Y), vectorToMatrix1(X,Y))
Out[395]: True

In [396]: %timeit vectorToMatrix(X,Y)
10 loops, best of 3: 130 ms per loop

In [397]: %timeit vectorToMatrix1(X,Y)
10 loops, best of 3: 122 ms per loop

Hence, some marginal improvement only. 因此，仅有些改进。

Not giving up! 不放弃！

Looking into the bottlenecks, the many stacking steps weren't helping. 纵观瓶颈，许多堆叠步骤无济于事。 So, improving on those, a modified version ended up like this - 因此，对这些内容进行改进后，最终得到了这样的修改版本-

def vectorToMatrix2(X,Y):
    X = normalize_einsum_numexpr(X) # make sure X is normalized
    Y = normalize_einsum_numexpr(Y) # make sure Y is normalized

    c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
    c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
    c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]

    s = ne.evaluate('sqrt(c0**2 + c1**2 + c2**2)')
    c0 /= s
    c1 /= s
    c2 /= s

    d0 = c1*X[:,2] - c2*X[:,1]
    d1 = c2*X[:,0] - c0*X[:,2]
    d2 = c0*X[:,1] - c1*X[:,0]

    c = [c0,c1,c2]
    d = [d0,d1,d2]
    return np.concatenate((X.T, d, c)).reshape(3,3,-1).transpose(2,0,1)

New timings with same million points setup - 设置相同百万积分的新计时-

In [505]: %timeit vectorToMatrix(X,Y) # original code
     ...: %timeit vectorToMatrix1(X,Y)
     ...: %timeit vectorToMatrix2(X,Y)
     ...: 
10 loops, best of 3: 130 ms per loop
10 loops, best of 3: 117 ms per loop
10 loops, best of 3: 101 ms per loop

20%+ speedup, not too bad! 加速20%+还算不错！

根据“目标”和“上”向量数组计算旋转矩阵

问题描述

2 个解决方案

解决方案1
2 已采纳 2017-09-13 09:45:37

解决方案2
1 2017-09-13 05:41:34

根据“目标”和“上”向量数组计算旋转矩阵

问题描述

2 个解决方案

解决方案1 2 已采纳 2017-09-13 09:45:37

解决方案2 1 2017-09-13 05:41:34

解决方案1
2 已采纳 2017-09-13 09:45:37

解决方案2
1 2017-09-13 05:41:34