Division of sparse matrix

Question

I have a scipy.sparse matrix with 45671x45671 elements. In this matrix, some rows contain only zeros.

My question is: how can I divide each row's values by that row's sum? A for loop works, but I am looking for an efficient method.

I already tried:

matrix / matrix.sum(1), but that raises a MemoryError.
matrix / scs.csc_matrix(matrix.sum(axis=1)), but that raises ValueError: inconsistent shapes.
Various other wacky things...

I also want to skip rows that contain only zeros.

Any solution would be appreciated.

Thank you in advance!

Answer 1

I have an M hanging around:

In [241]: M
Out[241]: 
<6x3 sparse matrix of type '<class 'numpy.uint8'>'
    with 6 stored elements in Compressed Sparse Row format>
In [242]: M.A
Out[242]: 
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]], dtype=uint8)
In [243]: M.sum(1)            # dense matrix
Out[243]: 
matrix([[1],
        [1],
        [1],
        [1],
        [1],
        [1]], dtype=uint32)
In [244]: M/M.sum(1)      # dense matrix - full size of M
Out[244]: 
matrix([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.],
        [ 1.,  0.,  0.]])

That explains the memory error: the result is a dense matrix the full size of M, so if M is so large that M.A produces a memory error, M/M.sum(1) will too.
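For the matrix in the question, a quick estimate makes this concrete. Assuming a float64 result (8 bytes per element; the question does not state the dtype, so this is an assumption), the dense 45671x45671 output alone needs about 16.7 GB, before counting any intermediate copies:

```python
# Rough size of a dense n x n result, assuming float64 (8 bytes per element)
n = 45671
bytes_needed = n * n * 8
gib = bytes_needed / 2**30

print(bytes_needed)   # 16686721928
print(round(gib, 1))  # 15.5
```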

In [262]: S = sparse.csr_matrix(M.sum(1))
In [263]: S.shape
Out[263]: (6, 1)
In [264]: M.shape
Out[264]: (6, 3)
In [265]: M/S
....
ValueError: inconsistent shapes

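I'm not entirely sure what is going on here, but sparse-by-sparse division appears to be strictly element-wise and to require both operands to have the same shape, so the (6, 1) S is not broadcast across the columns of M the way a dense array would be.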

Element-wise multiplication does work:

In [266]: M.multiply(S)
Out[266]: 
<6x3 sparse matrix of type '<class 'numpy.uint32'>'
    with 6 stored elements in Compressed Sparse Row format>

So it should work if I construct S from the reciprocals of the row sums, S = sparse.csr_matrix(1/M.sum(1)), and then use M.multiply(S).
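A minimal sketch of that idea, assuming NumPy and SciPy and a matrix with no all-zero rows (rows that sum to zero are discussed next). The function name normalize_rows_simple is hypothetical, chosen only for illustration. The result stays sparse because multiply() broadcasts the (n, 1) column of reciprocals across the columns of M instead of densifying anything.

```python
import numpy as np
from scipy import sparse

def normalize_rows_simple(M):
    """Hypothetical helper: divide each row of sparse M by its row sum.

    Assumes no row sums to zero (otherwise 1/0 gives inf).
    """
    M = sparse.csr_matrix(M).astype(np.float64)
    row_sums = np.asarray(M.sum(axis=1))    # dense (n, 1) array of row sums
    S = sparse.csr_matrix(1.0 / row_sums)   # sparse (n, 1) column of reciprocals
    # multiply() broadcasts the (n, 1) column across M's columns and stays sparse
    return sparse.csr_matrix(M.multiply(S))

M = sparse.csr_matrix(np.array([[1, 1, 0],
                                [0, 2, 2],
                                [0, 0, 4]], dtype=np.uint8))
N = normalize_rows_simple(M)
print(N.toarray())
```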

If some of the rows sum to zero, you have a division by zero problem.

If I modify M to have an all-zero row:

In [283]: M.A
Out[283]: 
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]], dtype=uint8)
In [284]: S = sparse.csr_matrix(1/M.sum(1))
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
  #!/usr/bin/python3
In [285]: S.A
Out[285]: 
array([[  1.],
       [  1.],
       [ inf],
       [  1.],
       [  1.],
       [  1.]])
In [286]: M.multiply(S)
Out[286]: 
<6x3 sparse matrix of type '<class 'numpy.float64'>'
    with 5 stored elements in Compressed Sparse Row format>
In [287]: _.A
Out[287]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.]])

This isn't the best M to demonstrate this on, but it suggests a useful approach. The row sum is dense, so you can clean up its reciprocal (for example, by replacing the inf entries) with the usual dense-array techniques before building S.
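One way to do that cleanup, sketched under the assumption of NumPy and SciPy: compute the reciprocals with np.divide(..., where=...) and use a factor of 1.0 wherever the row sum is zero, so those rows are skipped (left unchanged) instead of being multiplied by inf. The second function is an alternative that is not from the answer above: it scales the CSR data array directly, repeating each row's sum once per stored element via np.diff(indptr). Both function names are hypothetical. Note that a row whose entries cancel out (e.g. 1 and -1) also sums to zero and is skipped as well.

```python
import numpy as np
from scipy import sparse

def normalize_rows(M):
    """Hypothetical helper: divide rows by their sums, skipping zero-sum rows."""
    M = sparse.csr_matrix(M).astype(np.float64)
    row_sums = np.asarray(M.sum(axis=1)).ravel()   # dense 1-D array of length n
    # Reciprocal where the sum is nonzero; factor 1.0 (no change) where it is zero
    inv = np.ones_like(row_sums)
    np.divide(1.0, row_sums, out=inv, where=row_sums != 0)
    S = sparse.csr_matrix(inv.reshape(-1, 1))      # sparse (n, 1) column
    return sparse.csr_matrix(M.multiply(S))

def normalize_rows_inplace(M):
    """Hypothetical alternative: scale the CSR data array of a float copy."""
    M = sparse.csr_matrix(M).astype(np.float64)    # astype returns a copy by default
    row_sums = np.asarray(M.sum(axis=1)).ravel()
    row_sums[row_sums == 0] = 1.0                  # leave zero-sum rows unchanged
    # np.diff(indptr) gives the number of stored elements in each row
    M.data /= np.repeat(row_sums, np.diff(M.indptr))
    return M

M = sparse.csr_matrix(np.array([[1, 0, 0],
                                [0, 1, 1],
                                [0, 0, 0],
                                [0, 3, 1]], dtype=np.uint8))
print(normalize_rows(M).toarray())
print(normalize_rows_inplace(M).toarray())
```

Neither version ever builds a dense array the size of M; only the length-n vector of row sums is dense.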

2 ACCEPTED 2017-05-19 23:40:28
