Diagonalizing large sparse matrix with Python/Scipy
I am working with a large (complex) Hermitian matrix and I am trying to diagonalize it efficiently using Python/Scipy. Using the eigh function from scipy.linalg, it takes about 3s to generate and diagonalize a roughly 800x800 matrix and compute all the eigenvalues and eigenvectors.
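For reference, the dense approach looks roughly like this (the matrix below is a random Hermitian stand-in for the actual problem matrix, which is an assumption here):

```python
import numpy as np
from scipy.linalg import eigh

n = 800
rng = np.random.default_rng(0)
# Random complex Hermitian stand-in for the actual problem matrix
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = (a + a.conj().T) / 2  # Hermitian by construction: h == h.conj().T

# eigh computes all eigenvalues (in ascending order) and eigenvectors
w, v = eigh(h)
```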
The eigenvalues in my problem are symmetrically distributed around 0 and range from roughly -4 to 4. I only need the eigenvectors corresponding to the negative eigenvalues, though, which turns the range I am looking to calculate into [-4, 0).
My matrix is sparse, so it's natural to use the scipy.sparse package and its functions to calculate the eigenvectors via eigsh, since it uses much less memory to store the matrix.
Also, I can tell the program to only calculate the negative eigenvalues via which='SA'. The problem with this method is that it now takes roughly 40s to compute half the eigenvalues/eigenvectors. I know that the ARPACK algorithm is very inefficient when computing small eigenvalues, but I can't think of any other way to compute all the eigenvectors that I need.
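A sketch of this sparse approach (using a smaller real-symmetric random stand-in matrix instead of the actual 800x800 complex Hermitian one, just to keep it quick):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 200
rng = np.random.default_rng(0)
a = sp.random(n, n, density=0.05, random_state=rng)
h = (a + a.T) / 2  # symmetric sparse stand-in for the real Hermitian matrix

# which='SA' asks ARPACK for the k algebraically smallest eigenpairs
k = n // 2
w, v = eigsh(h, k=k, which='SA')
```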
Is there any way to speed up the calculation? Maybe by using the shift-invert mode? I will have to do many, many diagonalizations and eventually increase the size of the matrix as well, so I am a bit lost at the moment.
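A shift-invert attempt might look like the sketch below; the sigma value is only an illustrative guess at a point inside the negative part of the spectrum, and the sparse matrix is again a random stand-in:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 200
rng = np.random.default_rng(1)
a = sp.random(n, n, density=0.05, random_state=rng)
h = ((a + a.T) / 2).tocsc()  # CSC format suits the internal factorization

# In shift-invert mode ARPACK factorizes (h - sigma*I) and converges fastest
# for eigenvalues nearest sigma; sigma=-0.5 is just an illustrative guess
w, v = eigsh(h, k=10, sigma=-0.5, which='LM')
```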
I would really appreciate any help!
This question is probably better to ask on http://scicomp.stackexchange.com as it's more of a general math question, rather than specific to Scipy or related to programming.
If you need all eigenvectors, it does not make very much sense to use ARPACK. Since you need N/2 eigenvectors, your memory requirement is at least N*N/2 floats, and probably more in practice. Using eigh requires N*N + 3*N floats. eigh is then within a factor of 2 of the minimum requirement, so the easiest solution is to stick with it.
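As an aside, if SciPy >= 1.5 is available, the dense eigh can itself be restricted to a value range via its subset_by_value parameter, which matches the [-4, 0) requirement directly without ARPACK (a sketch on a random Hermitian stand-in matrix):

```python
import numpy as np
from scipy.linalg import eigh

n = 400
rng = np.random.default_rng(2)
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = (a + a.conj().T) / 2  # random Hermitian stand-in

# subset_by_value=[lo, hi] returns only eigenpairs with lo < w <= hi,
# i.e. here just the negative part of the spectrum (SciPy >= 1.5)
w, v = eigh(h, subset_by_value=[-np.inf, 0])
```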
If you can process the eigenvectors "on-line", so that you can throw the previous one away before processing the next, there are other approaches; look at the answers to similar questions on scicomp.