Complexity of Sparse Matrix Cholesky decomposition

Posted 2021-09-30 05:26:47 · Tags: python, julia, sparse-matrix, gaussian-process

I am having trouble finding a straightforward answer to the following question:

If you compute the Cholesky decomposition of an n×n symmetric positive definite matrix A, i.e. factor A = LL^T with L lower triangular, the complexity is O(n^3). For sparse matrices there are apparently faster algorithms, but how much faster?
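As a baseline, the O(n^3) figure comes from the triple loop of the textbook algorithm. Below is a minimal pure-Python sketch (no library assumed). It is written to make the operation count visible, not to be fast. It counts the multiply-subtract steps in the innermost loop, which come to exactly (n-1)n(n+1)/6, i.e. about n^3/6 steps or n^3/3 flops.

```python
import math


def dense_cholesky(a):
    """Textbook dense Cholesky (row by row): returns (L, steps) with A = L L^T.

    `steps` counts multiply-subtract operations in the innermost loop,
    which is exactly (n - 1) * n * (n + 1) / 6, i.e. about n^3 / 6.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    steps = 0
    for i in range(n):
        for j in range(i + 1):
            # s = A[i][j] - sum over k < j of L[i][k] * L[j][k]
            s = a[i][j]
            for k in range(j):
                s -= l[i][k] * l[j][k]
                steps += 1
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l, steps


for n in (40, 80, 160):
    # Symmetric and strictly diagonally dominant with a positive diagonal,
    # hence positive definite.
    a = [[float(n) if i == j else 1.0 for j in range(n)] for i in range(n)]
    _, steps = dense_cholesky(a)
    print(n, steps, steps / n**3)  # ratio tends to 1/6
```

Doubling n multiplies the step count by roughly 8, regardless of how many entries of A are zero.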

What complexity can we achieve for such a matrix with, say, m < n^2 nonzero entries?

Edit: my matrix is also approximately banded (only the main diagonal and a few adjacent diagonals above and below it are nonzero).

PS: I am ultimately interested in implementations in either Julia or Python. Python has the sksparse.cholmod module ( https://scikit-sparse.readthedocs.io/en/latest/cholmod.html ), but it isn't clear to me what algorithm it uses or what its complexity is. I'm not sure about Julia; if anyone knows, please tell me.
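For the banded structure described in the edit, the cost is much easier to state than for a general sparse matrix. Suppose the half-bandwidth is p, meaning A[i][j] = 0 whenever |i - j| > p. Then the Cholesky factor has no fill outside the band, the factorization costs O(n p^2), and each later triangular solve costs O(n p). The sketch below is pure Python and restricts the inner loops to the band while counting the work. It uses dense list-of-lists storage purely for readability; a real implementation would store only the band.

```python
import math


def banded_cholesky(a, p):
    """Cholesky factor of an SPD matrix with A[i][j] == 0 whenever |i - j| > p.

    L keeps the same lower bandwidth p (no fill outside the band), so only
    entries with i - p <= j <= i are computed, each an O(p) dot product.
    Returns (L, number of multiply-subtract steps); the total is O(n * p^2).
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]  # dense storage only for readability
    steps = 0
    for i in range(n):
        lo = max(0, i - p)  # first possibly nonzero column in row i
        for j in range(lo, i + 1):
            s = a[i][j]
            for k in range(lo, j):  # L[i][k] == 0 for k < lo
                s -= l[i][k] * l[j][k]
                steps += 1
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l, steps


def tridiagonal(n):
    """The SPD matrix with 2 on the diagonal and -1 on the first off-diagonals."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


for n in (100, 200, 400):
    _, steps = banded_cholesky(tridiagonal(n), p=1)
    print(n, steps)  # n - 1 steps: linear in n for a fixed bandwidth
```

For a fixed p, the step count grows linearly in n (about n p(p+1)/2), compared with roughly n^3/6 for the dense loop. In Python, SciPy's scipy.linalg.cholesky_banded applies the same idea on compact band storage.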

3 answers

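The Python library NumPy (Numerical Python) also has a Cholesky routine, np.linalg.cholesky (note that it operates on dense arrays, so it does not exploit sparsity). I provided a link to the documentation, although I'm not sure whether this answers the question; it may take some experimentation.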

For arbitrary matrices, this can only be answered exactly if P = NP, so it's not possible to answer in general. The time complexity depends on the fill-reducing ordering used. Finding an ordering that minimizes fill-in is NP-hard, so practical ordering methods are heuristics that only approximate the optimum.
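The dependence on the ordering shows up even on a tiny example. The sketch below uses the standard "elimination game" model of symbolic Cholesky factorization. Eliminating a vertex of the matrix's graph joins all of its not-yet-eliminated neighbours into a clique, and those neighbours are exactly the nonzeros below the diagonal in that column of L.

The example input is a hypothetical "arrowhead" matrix: one dense row and column (vertex 0) plus the diagonal. Eliminating the dense vertex first makes L completely dense, while eliminating it last produces no fill at all. The sketch only counts nonzeros for a given ordering. It is not itself a fill-reducing ordering algorithm such as minimum degree or nested dissection.

```python
def cholesky_fill(adj, order):
    """Count nonzeros in the Cholesky factor L for a given elimination order.

    adj: dict mapping each vertex to the set of its neighbours in the graph
         of A (an edge i-j for every off-diagonal nonzero A[i][j]).
    order: the sequence in which vertices (rows/columns) are eliminated.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    eliminated = set()
    nnz = 0
    for v in order:
        nbrs = g[v] - eliminated
        nnz += 1 + len(nbrs)  # diagonal entry plus sub-diagonal entries
        for u in nbrs:
            g[u] |= nbrs - {u}  # remaining neighbours become a clique (fill)
        eliminated.add(v)
    return nnz


def arrow_graph(n):
    """Hypothetical example: hub vertex 0 adjacent to every other vertex."""
    adj = {v: set() for v in range(n)}
    for v in range(1, n):
        adj[0].add(v)
        adj[v].add(0)
    return adj


n = 100
adj = arrow_graph(n)
hub_first = cholesky_fill(adj, range(n))                # eliminate the hub first
hub_last = cholesky_fill(adj, list(range(1, n)) + [0])  # eliminate it last
print(hub_first, hub_last)  # 5050 199, i.e. n(n+1)/2 versus 2n - 1
```

Both orderings describe the same matrix with 3n - 2 nonzeros, yet the factor is either completely dense or as sparse as A. The factorization work differs even more.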

However, for the very special case of a matrix coming from a regular square 2D or 3D mesh, there is an answer. In this case, nested dissection gives an ordering that is asymptotically optimal. For a 2D s-by-s mesh, the matrix has dimension n = s^2 and, I think, about 5n entries. Then L has 31*(n log2(n))/8 + O(n) nonzeros, and the work is 829*n^(3/2)/84 + O(n log n). For a 3D s-by-s-by-s mesh with n = s^3, L has O(n^(4/3)) nonzeros and computing L requires O(n^2) operations.
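To see what these leading terms mean in practice, the short script below plugs numbers into the 2D formulas quoted above. It compares them with dense Cholesky, which has n(n+1)/2 entries in L and needs about n^3/3 flops. It takes the answer's constants at face value and drops the lower-order terms, so the outputs are rough estimates, not measurements.

```python
import math


def nested_dissection_2d(s):
    """Leading-order costs quoted above for an s-by-s 2D mesh (n = s^2).

    Lower-order O(n) and O(n log n) terms are dropped, so these are estimates.
    """
    n = s * s
    nnz_l = 31 * n * math.log2(n) / 8
    work = 829 * n ** 1.5 / 84
    return n, nnz_l, work


def dense(n):
    """Dense Cholesky: n(n+1)/2 entries in L and roughly n^3/3 flops."""
    return n * (n + 1) / 2, n ** 3 / 3


for s in (100, 1000):
    n, nnz_nd, work_nd = nested_dissection_2d(s)
    nnz_dense, work_dense = dense(n)
    print(f"n={n:>9,}  nnz(L): {nnz_nd:.2e} vs {nnz_dense:.2e}   "
          f"work: {work_nd:.2e} vs {work_dense:.2e}")
```

At n = 10^6 (a 1000-by-1000 mesh), this gives about 10^10 operations instead of about 3 × 10^17.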

It's worth looking at an incomplete Cholesky decomposition. It comes in multiple variations, but typically these either compute only the entries of the triangular factor that are nonzero in the input, or use a low-rank approximation of the decomposition. It's not clear from your question why you are interested in the asymptotic complexity. Given the gaussian-process tag, I'd guess you're decomposing a covariance kernel matrix and repeatedly solving a linear system during inference. In these kinds of applications, the incomplete factorization is most often used as a preconditioner: while it's not exact, it's very efficient, easy to update incrementally, and can greatly accelerate iterative solvers.
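Below is a minimal sketch of the first variant mentioned above, zero-fill incomplete Cholesky (usually called IC(0)), in pure Python with dense storage for readability. It runs the ordinary Cholesky recurrence but computes L[i][j] only where A[i][j] is nonzero. As a result, L has exactly the sparsity pattern of the lower triangle of A, and L L^T matches A on that pattern but not elsewhere.

The 2D grid Laplacian used as input is an assumed example. IC(0) can break down (hit a non-positive pivot) on general SPD matrices. The sketch does not cover the low-rank variants or the preconditioned iterative solver itself.

```python
import math


def incomplete_cholesky0(a):
    """Zero-fill incomplete Cholesky, IC(0), of a symmetric matrix.

    Runs the ordinary Cholesky recurrence but only computes L[i][j] where
    A[i][j] != 0; every fill-in entry is dropped (left as zero).
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            if a[i][j] == 0.0:
                continue  # outside the sparsity pattern of A: no fill allowed
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("IC(0) broke down: non-positive pivot")
                l[i][i] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l


def grid_laplacian(s):
    """5-point Laplacian on an s-by-s grid (n = s^2), as a dense list of lists."""
    n = s * s
    a = [[0.0] * n for _ in range(n)]
    for r in range(s):
        for c in range(s):
            i = r * s + c
            a[i][i] = 4.0
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < s and cc < s:
                    j = rr * s + cc
                    a[i][j] = a[j][i] = -1.0
    return a


a = grid_laplacian(4)
l = incomplete_cholesky0(a)
nnz_pattern = sum(1 for i in range(16) for j in range(i + 1) if a[i][j] != 0.0)
nnz_l = sum(1 for i in range(16) for j in range(i + 1) if l[i][j] != 0.0)
print(nnz_pattern, nnz_l)  # 40 40: IC(0) keeps exactly the pattern of A
```

The exact factor of this matrix would fill in the band between the two outer diagonals. IC(0) trades that accuracy for a factor that costs no more storage than A.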

In practice, however, the answer to your question is very likely irrelevant to choosing the approach that performs best for your purposes. There are O(n log(n)^k)-time algorithms for incomplete factorizations that are of no practical use, for the same reasons as the matrix multiplication algorithms with the lowest exponents. Knowing precisely what your application needs will inform your options, and you already have far and away the best tool for finding out. Spend a few minutes writing code that generates synthetic data (or samples your real data at varying sizes), and time it the same way you intend to use it in your application.

If you're doing one factorization and solving many systems, the factorization time is likely to be dwarfed by the solves. If you are doing many factorizations, especially from scratch each time, the constant and linear factors of the running time will matter even more. Plus, the sparsity pattern and the kernel itself can have an enormous impact on the performance of the same algorithm. It's not difficult to construct kernels that have fully dense covariance matrices, tridiagonal precision matrices, and a Cholesky factor that is exactly the lower triangular part of the matrix up to scaling by a diagonal matrix (a 1D exponential kernel has all three properties simultaneously). Profile first, optimize last.
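The exponential-kernel claim above is easy to check numerically. The sketch below assumes sorted 1D inputs; the particular points and lengthscale are arbitrary, hypothetical values. It builds K[i][j] = exp(-|x_i - x_j| / lengthscale) and checks three things:

(a) every entry of K is nonzero;
(b) the Cholesky factor satisfies L[i][j] = K[i][j] * L[j][j], i.e. it is tril(K) with its columns scaled by diag(L);
(c) K^{-1} is tridiagonal up to rounding.

It uses plain dense Cholesky and triangular inversion in pure Python.

```python
import math


def cholesky(a):
    """Plain dense Cholesky: lower-triangular L with A = L L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def lower_inverse(l):
    """Inverse of a lower-triangular matrix, one forward substitution per column."""
    n = len(l)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for i in range(n):
            rhs = (1.0 if i == c else 0.0) - sum(l[i][k] * inv[k][c] for k in range(i))
            inv[i][c] = rhs / l[i][i]
    return inv


def exponential_kernel(xs, lengthscale):
    """K[i][j] = exp(-|x_i - x_j| / lengthscale); every entry is nonzero."""
    return [[math.exp(-abs(xi - xj) / lengthscale) for xj in xs] for xi in xs]


xs = sorted([0.0, 0.3, 0.35, 1.2, 2.0, 2.1, 3.7])  # hypothetical sorted 1D inputs
n = len(xs)
k = exponential_kernel(xs, lengthscale=1.0)
l = cholesky(k)

# (b) L equals tril(K) with column j scaled by L[j][j] (since K[j][j] = 1).
scaling_err = max(abs(l[i][j] - k[i][j] * l[j][j])
                  for i in range(n) for j in range(i + 1))

# (c) K^{-1} = L^{-T} L^{-1}; entries with |i - j| > 1 should vanish.
li = lower_inverse(l)
precision = [[sum(li[m][i] * li[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
off_band = max(abs(precision[i][j])
               for i in range(n) for j in range(n) if abs(i - j) > 1)

print(f"min K_ij = {min(min(row) for row in k):.3f}")
print(f"scaling error = {scaling_err:.1e}, "
      f"largest off-tridiagonal precision entry = {off_band:.1e}")
```

The sorting matters: the structure comes from the Markov property of the exponential kernel along the line, so it only appears when the points are ordered.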