
How to find fundamental matrix based on other fundamental matrix and camera movement?

I am trying to speed up a multi-camera system that relies on calculating the fundamental matrix between each pair of cameras.

Please note that the following is pseudocode. @ means matrix multiplication, | means concatenation.

I have code to calculate F for each pair, calculate_f(camera_matrix1_3x4, camera_matrix2_3x4), and the naive solution is

for c1 in cameras:
    for c2 in cameras:
        if c1 != c2:
            f = calculate_f(c1.proj_matrix, c2.proj_matrix)

This is slow, and I would like to speed it up. I have ~5000 cameras.


I have precalculated all rotations and translations (in world coordinates) between every pair of cameras, and the internal parameters k, such that for each camera c, it holds that c.matrix = c.k @ (c.rot | c.t)
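For concreteness, the composition c.k @ (c.rot | c.t) could be written in NumPy as in the sketch below. The helper name projection_matrix and the numeric values of k, rot and t are made up for illustration; they are not taken from the actual system.

```python
import numpy as np

def projection_matrix(k, rot, t):
    """Return the 3x4 matrix k @ (rot | t); the helper name is hypothetical."""
    # (rot | t): append the translation as a fourth column of the rotation.
    return k @ np.hstack([rot, t.reshape(3, 1)])

# Illustrative values only (assumed, not from the question).
k = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
rot = np.eye(3)
t = np.array([0.1, 0.0, 0.0])

p = projection_matrix(k, rot, t)
```

With an identity rotation, the first three columns of p equal k and the last column equals k @ t.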

Can I use the parameters r, t to help speed up the subsequent calculations of F?


In mathematical form, for 3 different cameras c1, c2, c3 I have

f12 = calculate_f(c1.proj_matrix, c2.proj_matrix), and I want f23 = calculate_f(c2.proj_matrix, c3.proj_matrix) and f13 = calculate_f(c1.proj_matrix, c3.proj_matrix) from some function f23, f13 = fast_f(f12, c1.r, c1.t, c2.r, c2.t, c3.r, c3.t)?


A working function for calculating the fundamental matrix in NumPy:

import numpy as np

def fundamental_3x3_from_projections(p_left_3x4: np.ndarray, p_right_3x4: np.ndarray) -> np.ndarray:
    # The following is based on OpenCv-contrib's c++ implementation.
    # see https://github.com/opencv/opencv_contrib/blob/master/modules/sfm/src/fundamental.cpp#L109
    # see https://sourishghosh.com/2016/fundamental-matrix-from-camera-matrices/
    # see https://answers.opencv.org/question/131017/how-do-i-compute-the-fundamental-matrix-from-2-projection-matrices/
    f_3x3 = np.zeros((3, 3))
    p1, p2 = p_left_3x4, p_right_3x4

    # x[j] holds the two rows of p1 other than row j, in cyclic order.
    # np.float was removed in NumPy 1.24, so np.float64 is used instead.
    x = np.empty((3, 2, 4), dtype=np.float64)
    x[0, :, :] = np.vstack([p1[1, :], p1[2, :]])
    x[1, :, :] = np.vstack([p1[2, :], p1[0, :]])
    x[2, :, :] = np.vstack([p1[0, :], p1[1, :]])

    # y[i] holds the two rows of p2 other than row i, in cyclic order.
    y = np.empty((3, 2, 4), dtype=np.float64)
    y[0, :, :] = np.vstack([p2[1, :], p2[2, :]])
    y[1, :, :] = np.vstack([p2[2, :], p2[0, :]])
    y[2, :, :] = np.vstack([p2[0, :], p2[1, :]])

    # Each entry of F is the determinant of a 4x4 matrix built from two rows of each camera.
    for i in range(3):
        for j in range(3):
            xy = np.vstack([x[j, :], y[i, :]])
            f_3x3[i, j] = np.linalg.det(xy)

    return f_3x3

NumPy is clearly not optimized for working on small matrices. The parsing of CPython input objects, internal checks, and function calls introduce a significant overhead, far larger than the time needed to perform the actual computation. On top of that, creating many temporary arrays is expensive. One way to solve this problem is to use Numba or Cython.

Moreover, the computation of the determinant can be optimized a lot, since you know the exact size of the matrix and part of the matrix does not always change. Indeed, using a basic algebraic expression for the 4x4 determinant helps the compiler optimize the overall computation, thanks to common sub-expression elimination (not performed by the CPython interpreter) and the removal of the complex loops/conditionals in np.linalg.det.

Here is the resulting code:

import numba as nb
import numpy as np

@nb.njit('float64(float64[:,::1])')
def det_4x4(mat):
    a, b, c, d = mat[0,0], mat[0,1], mat[0,2], mat[0,3]
    e, f, g, h = mat[1,0], mat[1,1], mat[1,2], mat[1,3]
    i, j, k, l = mat[2,0], mat[2,1], mat[2,2], mat[2,3]
    m, n, o, p = mat[3,0], mat[3,1], mat[3,2], mat[3,3]
    return a * (f * (k*p - l*o) + g * (l*n - j*p) + h * (j*o - k*n)) + \
            b * (e * (l*o - k*p) + g * (i*p - l*m) + h * (k*m - i*o)) + \
            c * (e * (j*p - l*n) + f * (l*m - i*p) + h * (i*n - j*m)) + \
            d * (e * (k*n - j*o) + f * (i*o - k*m) + g * (j*m - i*n))

@nb.njit('float64[:,::1](float64[:,::1], float64[:,::1])')
def fundamental_3x3_from_projections(p_left_3x4, p_right_3x4):
    f_3x3 = np.empty((3, 3))
    p1, p2 = p_left_3x4, p_right_3x4

    x = np.empty((3, 2, 4), dtype=np.float64)
    x[0, 0, :] = p1[1, :]
    x[0, 1, :] = p1[2, :]
    x[1, 0, :] = p1[2, :]
    x[1, 1, :] = p1[0, :]
    x[2, 0, :] = p1[0, :]
    x[2, 1, :] = p1[1, :]

    y = np.empty((3, 2, 4), dtype=np.float64)
    y[0, 0, :] = p2[1, :]
    y[0, 1, :] = p2[2, :]
    y[1, 0, :] = p2[2, :]
    y[1, 1, :] = p2[0, :]
    y[2, 0, :] = p2[0, :]
    y[2, 1, :] = p2[1, :]

    xy = np.empty((4, 4), dtype=np.float64)

    for i in range(3):
        xy[2:4, :] = y[i, :, :]
        for j in range(3):
            xy[0:2, :] = x[j, :, :]
            f_3x3[i, j] = det_4x4(xy)

    return f_3x3

This is 130 times faster on my machine (85.6 µs vs. 0.66 µs).
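When swapping in a faster implementation, it can help to confirm that the result still behaves like a fundamental matrix. One possible check, sketched below with plain NumPy, is the epipolar constraint: a 3D point projected by both cameras should give x2 @ F @ x1 close to zero. The compact helper fundamental_from_projections, the random projection matrices, and the random point are assumptions for illustration; the helper mirrors the determinant formula used above rather than the Numba code itself.

```python
import numpy as np

def fundamental_from_projections(p1, p2):
    """Hypothetical compact form: F[i, j] = det(two rows of p1 over two rows of p2)."""
    rows = [(1, 2), (2, 0), (0, 1)]  # cyclic row pairs, as in the function above
    f = np.empty((3, 3))
    for i, ri in enumerate(rows):
        for j, rj in enumerate(rows):
            f[i, j] = np.linalg.det(np.vstack([p1[list(rj)], p2[list(ri)]]))
    return f

# Random projection matrices and a random 3D point (illustrative data only).
rng = np.random.default_rng(0)
p1 = rng.standard_normal((3, 4))
p2 = rng.standard_normal((3, 4))
f = fundamental_from_projections(p1, p2)

point = np.append(rng.standard_normal(3), 1.0)  # homogeneous 3D point
x1, x2 = p1 @ point, p2 @ point                 # its image in each camera

# Should be zero up to floating-point rounding.
residual = x2 @ f @ x1
```

The same check can be applied to the output of the Numba version by passing it the same p1 and p2.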

You can speed up the process by a further factor of two if the applied function is commutative (i.e. f(c1, c2) == f(c2, c1)). If so, you could compute only the upper triangle of the pair table. It turns out that your function has an interesting property: f(c1, c2) == f(c2, c1).T appears to always be true, so the lower triangle can be obtained by transposing the upper one. Another possible optimization is to run the loop in parallel.
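The transpose property can be used roughly as in the sketch below, which computes each unordered pair once and fills in the mirrored entry by transposition. It uses plain NumPy so that it runs without Numba. The names fundamental_from_projections and all_pairs and the random input data are assumptions for illustration, and storing every result in one dense array is only practical for a small number of cameras.

```python
import numpy as np

def fundamental_from_projections(p1, p2):
    """Hypothetical compact form of the determinant-based function above."""
    rows = [(1, 2), (2, 0), (0, 1)]
    f = np.empty((3, 3))
    for i, ri in enumerate(rows):
        for j, rj in enumerate(rows):
            f[i, j] = np.linalg.det(np.vstack([p1[list(rj)], p2[list(ri)]]))
    return f

def all_pairs(proj_matrices):
    """Compute F for every ordered pair, evaluating only the pairs with a < b."""
    n = len(proj_matrices)
    f_all = np.zeros((n, n, 3, 3))
    for a in range(n):
        for b in range(a + 1, n):
            f_ab = fundamental_from_projections(proj_matrices[a], proj_matrices[b])
            f_all[a, b] = f_ab
            f_all[b, a] = f_ab.T  # relies on f(c2, c1) == f(c1, c2).T
    return f_all

# Four random cameras as illustrative input.
rng = np.random.default_rng(1)
cams = [rng.standard_normal((3, 4)) for _ in range(4)]
f_all = all_pairs(cams)
```

With Numba, the outer loop would be a natural candidate for nb.prange under @nb.njit(parallel=True), since each (a, b) pair writes to its own output slots.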

With all these optimizations, the resulting program should be about 3 orders of magnitude faster.
