
How to permute rows in a sparse (NumPy) matrix efficiently using a permutation array?

I used the SciPy Reverse Cuthill-McKee implementation ( scipy.sparse.csgraph.reverse_cuthill_mckee ) to create a band matrix from a (high-dimensional) sparse csr_matrix. As I understand it, the result of this method is a permutation array that gives me the indices for permuting the rows of my matrix.

Is there an efficient way to apply this permutation to my sparse csr_matrix, with the result stored in any sparse format (csr, lil_matrix, etc.)? I tried a for-loop, but my matrix has dimensions of about 200,000 x 150,000 and it takes too much time.

A = csr_matrix((data,(rowind,columnind)), shape=(200000, 150000), dtype=np.uint8)

permutation_array = csgraph.reverse_cuthill_mckee(A, False)

result_matrix = lil_matrix((200000, 150000), dtype=np.uint8)

i=0
for x in np.nditer(permutation_array):
    result_matrix[x, :]=A[i, :]
    i+=1

The result of the reverse_cuthill_mckee call is an array, like a tuple, containing the indices for my permutation. It looks something like: [199999 54877 54873 ..., 12045 9191 0] (size = 200,000)

This means: the row with index 0 now has index 199999, the row with index 1 now has index 54877, the row with index 2 now has index 54873, etc. See: https://en.wikipedia.org/wiki/Permutation#Definition_and_notations (this is how I understood the return value).

Thank you

I wonder if you are applying the permutation array correctly.

Make a random matrix (float) and convert it to uint8 (beware, csr calculations might not work with this dtype):

In [963]: ran=sparse.random(10,10,.3, format='csr')
In [964]: A = sparse.csr_matrix((np.ones(ran.data.shape).astype(np.uint8),ran.indices, ran.indptr))
In [965]: A.A
Out[965]: 
array([[1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 1, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 1],
       [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]], dtype=uint8)

(oops, used the wrong matrix here):

In [994]: permutation_array = csgraph.reverse_cuthill_mckee(A, False)
In [995]: permutation_array
Out[995]: array([9, 7, 0, 4, 6, 3, 5, 1, 8, 2], dtype=int32)

My first inclination is to use such an array to simply index the rows of the original matrix:

In [996]: A[permutation_array,:].A
Out[996]: 
array([[0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 1],
       [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
       [0, 1, 1, 1, 1, 1, 1, 0, 1, 0],
       [0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)

I see some clustering; maybe that is the best we can expect from a random matrix.
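For a matrix as large as the one in the question, the same row indexing can be applied directly to the csr_matrix, with no Python loop. A minimal sketch, assuming perm[k] names the source row that should end up at position k. The helper names permute_rows and permute_rows_matmul are made up for illustration; the second variant builds an explicit sparse permutation matrix as an alternative route to the same result:

```python
import numpy as np
from scipy import sparse


def permute_rows(A, perm):
    """Gather rows: row k of the result is row perm[k] of A."""
    A = sparse.csr_matrix(A)
    return A[np.asarray(perm), :]


def permute_rows_matmul(A, perm):
    """Same result via a sparse permutation matrix P with P[k, perm[k]] = 1."""
    A = sparse.csr_matrix(A)
    perm = np.asarray(perm)
    n = perm.size
    P = sparse.csr_matrix(
        (np.ones(n, dtype=A.dtype), (np.arange(n), perm)),
        shape=(n, A.shape[0]),
    )
    return P @ A


# Small demo on a 4x3 matrix (default integer dtype, not uint8).
dense = np.arange(12).reshape(4, 3)
perm = [2, 0, 3, 1]
print(permute_rows(dense, perm).toarray())
print(permute_rows_matmul(dense, perm).toarray())
```

Both calls stay in CSR format, so the result can be used without converting through lil_matrix.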

You, on the other hand, appear to be doing:

In [997]: res = sparse.lil_matrix(A.shape,dtype=A.dtype)
In [998]: res[permutation_array,:] = A
In [999]: res.A
Out[999]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 1, 0, 1, 0],
       [0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=uint8)

I don't see any improvement in the clustering of 1s in res.
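The two results differ because indexing on the right-hand side gathers rows (row k comes from row perm[k]), while assigning through the index scatters them (row k goes to row perm[k]). Scattering with a permutation is the same as gathering with its inverse. A small dense illustration, using the permutation_array printed above and a made-up marker matrix M whose row k is filled with the value k:

```python
import numpy as np

perm = np.array([9, 7, 0, 4, 6, 3, 5, 1, 8, 2])  # permutation_array from above
M = np.arange(10)[:, None] * np.ones((1, 3), dtype=int)  # row k holds the value k

gathered = M[perm]             # row k of the result is row perm[k] of M

scattered = np.zeros_like(M)
scattered[perm] = M            # row perm[k] of the result is row k of M

inverse = np.empty_like(perm)
inverse[perm] = np.arange(perm.size)   # inverse[perm[k]] = k

print(gathered[:, 0])          # which source row landed in each position
print(scattered[:, 0])
print(np.array_equal(scattered, M[inverse]))
```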


The docs for the MATLAB equivalent say:

r = symrcm(S) returns the symmetric reverse Cuthill-McKee ordering of S. This is a permutation r such that S(r,r) tends to have its nonzero elements closer to the diagonal.

In numpy terms, that means:

In [1019]: I,J=np.ix_(permutation_array,permutation_array)
In [1020]: A[I,J].A
Out[1020]: 
array([[0, 0, 0, 1, 1, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
       [0, 1, 1, 0, 0, 0, 1, 0, 0, 1],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)

And indeed there are larger bands of 0s in the two off-diagonal corners.
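The same symmetric permutation can also be written as two chained index operations on the sparse matrix, which may be convenient when the matrix is large. A sketch, assuming a square matrix; symmetric_permute is an illustrative name, not a SciPy function:

```python
import numpy as np
from scipy import sparse


def symmetric_permute(A, perm):
    """Apply perm to rows and columns: result[i, j] = A[perm[i], perm[j]]."""
    A = sparse.csr_matrix(A)
    perm = np.asarray(perm)
    return A[perm, :][:, perm]


# Compare against dense np.ix_ indexing on a small random 0/1 matrix.
rng = np.random.default_rng(0)
dense = (rng.random((6, 6)) < 0.3).astype(np.int64)
perm = rng.permutation(6)
out = symmetric_permute(dense, perm)
print(np.array_equal(out.toarray(), dense[np.ix_(perm, perm)]))
```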

And using the bandwidth calculation from the MATLAB page, https://www.mathworks.com/help/matlab/ref/symrcm.html:

In [1028]: i,j=A.nonzero()
In [1029]: np.max(i-j)
Out[1029]: 7
In [1030]: i,j=A[I,J].nonzero()
In [1031]: np.max(i-j)
Out[1031]: 5
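Note that np.max(i-j) only measures how far the nonzeros reach below the diagonal. A hedged sketch of a helper that reports both sides; bandwidths is a hypothetical name, and the 4x4 matrix is made up for the demo:

```python
import numpy as np
from scipy import sparse


def bandwidths(A):
    """Return (lower, upper): the largest i-j and j-i over the nonzero entries."""
    A = sparse.csr_matrix(A)
    i, j = A.nonzero()
    if i.size == 0:
        return 0, 0
    d = i.astype(np.int64) - j.astype(np.int64)
    return int(max(d.max(), 0)), int(max((-d).max(), 0))


M = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 1]])
print(bandwidths(M))  # → (2, 3)
```

Taking the larger of the two values gives a single bandwidth figure for comparing orderings.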

The MATLAB docs say that with this permutation, the eigenvalues remain the same. Testing:

In [1032]: from scipy.sparse import linalg
In [1048]: linalg.eigs(A.astype('f'))[0]
Out[1048]: 
array([ 3.14518213+0.j        , -0.96188843+0.j        ,
       -0.58978939+0.62853903j, -0.58978939-0.62853903j,
        1.09950364+0.54544497j,  1.09950364-0.54544497j], dtype=complex64)
In [1049]: linalg.eigs(A[I,J].astype('f'))[0]
Out[1049]: 
array([ 3.14518023+0.j        ,  1.09950352+0.54544479j,
        1.09950352-0.54544479j, -0.58978981+0.62853914j,
       -0.58978981-0.62853914j, -0.96188819+0.j        ], dtype=complex64)

Eigenvalues are not the same for the row permutations we tried earlier:

In [1050]: linalg.eigs(A[permutation_array,:].astype('f'))[0]
Out[1050]: 
array([ 2.95226836+0.j        , -1.60117996+0.52467293j,
       -1.60117996-0.52467293j, -0.01723826+1.06249797j,
       -0.01723826-1.06249797j,  0.90314150+0.j        ], dtype=complex64)
In [1051]: linalg.eigs(res.astype('f'))[0]
Out[1051]: 
array([-0.05822830-0.97881651j, -0.99999994+0.j        ,
        1.17350495+0.j        , -0.91237622+0.8656373j ,
       -0.91237622-0.8656373j ,  2.26292515+0.j        ], dtype=complex64)
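The reason is that permuting rows and columns together is a similarity transform, P A P^T, which preserves the spectrum, whereas permuting rows alone computes P A. A small dense check, using the symmetric 8x8 example matrix and permutation quoted at the end of this answer so that the eigenvalues are real and easy to compare:

```python
import numpy as np

B = np.array([[1, 0, 0, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 0, 1, 0, 1],
              [0, 1, 1, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 1, 0, 1]], dtype=float)
perm = np.array([6, 3, 7, 5, 1, 2, 4, 0])

P = np.eye(8)[perm]              # row k of P is the unit vector e_perm[k]
both = B[np.ix_(perm, perm)]     # rows and columns permuted together
rows_only = B[perm]              # rows permuted, columns left alone

# Indexing both axes is the same as the similarity transform P B P^T.
print(np.allclose(P @ B @ P.T, both))

# B is symmetric, so eigvalsh applies and returns sorted real eigenvalues.
print(np.allclose(np.linalg.eigvalsh(B), np.linalg.eigvalsh(both)))

# The trace equals the sum of the eigenvalues, so a changed trace
# already shows that the row-only permutation changed the spectrum.
print(np.trace(B), np.trace(rows_only))
```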

This [I,J] permutation works with the example matrix in http://ciprian-zavoianu.blogspot.com/2009/01/project-bandwidth-reduction.html:

In [1058]: B = np.matrix('1 0 0 0 1 0 0 0;0 1 1 0 0 1 0 1;0 1 1 0 1 0 0 0;0 0 0 
      ...: 1 0 0 1 0;1 0 1 0 1 0 0 0; 0 1 0 0 0 1 0 1;0 0 0 1 0 0 1 0;0 1 0 0 0 
      ...: 1 0 1')
In [1059]: B
Out[1059]: 
matrix([[1, 0, 0, 0, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 1, 0, 1],
        [0, 1, 1, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 0, 1, 0, 1, 0, 0, 0],
        [0, 1, 0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0, 1]])
In [1060]: Bm=sparse.csr_matrix(B)
In [1061]: Bm
Out[1061]: 
<8x8 sparse matrix of type '<class 'numpy.int32'>'
    with 22 stored elements in Compressed Sparse Row format>
In [1062]: permB = csgraph.reverse_cuthill_mckee(Bm, False)
In [1063]: permB
Out[1063]: array([6, 3, 7, 5, 1, 2, 4, 0], dtype=int32)
In [1064]: Bm[np.ix_(permB,permB)].A
Out[1064]: 
array([[1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 1]], dtype=int32)
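Putting the pieces together, here is a sketch of the whole reordering step that stays in CSR format throughout. It assumes a square matrix, since one permutation array is applied to both axes. rcm_reorder is an illustrative name, the random test matrix is made up, and symmetric_mode=False corresponds to the False argument used above:

```python
import numpy as np
from scipy import sparse
from scipy.sparse import csgraph


def rcm_reorder(A, symmetric_mode=False):
    """Return (perm, A[perm][:, perm]) using reverse Cuthill-McKee ordering."""
    A = sparse.csr_matrix(A)
    if A.shape[0] != A.shape[1]:
        raise ValueError("expected a square matrix")
    perm = csgraph.reverse_cuthill_mckee(A, symmetric_mode=symmetric_mode)
    return perm, A[perm, :][:, perm]


# Hypothetical 50x50 test matrix built from random coordinates.
rng = np.random.default_rng(0)
rows = rng.integers(0, 50, size=150)
cols = rng.integers(0, 50, size=150)
M = sparse.csr_matrix((np.ones(150), (rows, cols)), shape=(50, 50))

perm, M_rcm = rcm_reorder(M)
print(M.nnz, M_rcm.nnz)  # the permutation only moves entries around
```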
