简体   繁体   中英

Division of sparse matrix

I have a scipy.sparse matrix with 45671x45671 elements. In this matrix, some rows contain only '0' value.

My question is, how to divide each row values by the row sum. Obviously, with for loop it's work, but I look for an efficient method...

I already tried :

  • matrix / matrix.sum(1) but I have MemoryError issue.
  • matrix / scs.csc_matrix((matrix.sum(axis=1))) but ValueError: inconsistent shapes
  • Other wacky things...

Moreover, I want to skip rows with only '0' values.

So, if you have any solution...

Thank you in advance !

I have an M hanging around:

In [241]: M
Out[241]: 
<6x3 sparse matrix of type '<class 'numpy.uint8'>'
    with 6 stored elements in Compressed Sparse Row format>
In [242]: M.A
Out[242]: 
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]], dtype=uint8)
In [243]: M.sum(1)            # dense matrix
Out[243]: 
matrix([[1],
        [1],
        [1],
        [1],
        [1],
        [1]], dtype=uint32)
In [244]: M/M.sum(1)      # dense matrix - full size of M
Out[244]: 
matrix([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.],
        [ 1.,  0.,  0.]])

That will explain the memory error - if M is so large that MA produces a memory error.


In [262]: S = sparse.csr_matrix(M.sum(1))
In [263]: S.shape
Out[263]: (6, 1)
In [264]: M.shape
Out[264]: (6, 3)
In [265]: M/S
....
ValueError: inconsistent shapes

I'm not entirely sure what is going on here.

Element wise multiplication works

In [266]: M.multiply(S)
Out[266]: 
<6x3 sparse matrix of type '<class 'numpy.uint32'>'
    with 6 stored elements in Compressed Sparse Row format>

So it should work if I construct S as S = sparse.csr_matrix(1/M.sum(1))

If some of the rows sum to zero, you have a division by zero problem.


If I modify M to have 0 row

In [283]: M.A
Out[283]: 
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]], dtype=uint8)
In [284]: S = sparse.csr_matrix(1/M.sum(1))
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
  #!/usr/bin/python3
In [285]: S.A
Out[285]: 
array([[  1.],
       [  1.],
       [ inf],
       [  1.],
       [  1.],
       [  1.]])
In [286]: M.multiply(S)
Out[286]: 
<6x3 sparse matrix of type '<class 'numpy.float64'>'
    with 5 stored elements in Compressed Sparse Row format>
In [287]: _.A
Out[287]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.]])

This isn't the best M to demonstrate this on, but it suggests a useful approach. The row sum will be dense, so you can clean up its inverse using the usual dense array approaches.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM