简体   繁体   中英

scipy sparse matrix division

I have been trying to divide a python scipy sparse matrix by a vector sum of its rows. Here is my code

sparse_mat = bsr_matrix((l_data, (l_row, l_col)), dtype=float)
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])

However, it throws an error no matter how I try it

sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 381, in __div__
return self.__truediv__(other)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 427, in __truediv__
raise NotImplementedError
NotImplementedError

Anyone with an idea of where I am going wrong?

You can circumvent the problem by creating a sparse diagonal matrix from the reciprocals of your row sums and then multiplying it with your matrix. In the product the diagonal matrix goes left and your matrix goes right.

Example:

>>> a
array([[0, 9, 0, 0, 1, 0],
       [2, 0, 5, 0, 0, 9],
       [0, 2, 0, 0, 0, 0],
       [2, 0, 0, 0, 0, 0],
       [0, 9, 5, 3, 0, 7],
       [1, 0, 0, 8, 9, 0]])
>>> b = sparse.bsr_matrix(a)
>>> 
>>> c = sparse.diags(1/b.sum(axis=1).A.ravel())
>>> # on older scipy versions the offsets parameter (default 0)
... # is a required argument, thus
... # c = sparse.diags(1/b.sum(axis=1).A.ravel(), 0)
...
>>> a/a.sum(axis=1, keepdims=True)
array([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       ,  0.        ],
       [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        ,  0.5625    ],
       [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.375     ,  0.20833333,  0.125     ,  0.        ,  0.29166667],
       [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,  0.        ]])
>>> (c @ b).todense() # on Python < 3.5 replace c @ b with c.dot(b)
matrix([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       ,  0.        ],
        [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        ,  0.5625    ],
        [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
        [ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.375     ,  0.20833333,  0.125     ,  0.        ,  0.29166667],
        [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,  0.        ]])

Something funny is going on. I have no problem performing the element division. I wonder if it's a Py2 issue. I'm using Py3.

In [1022]: A=sparse.bsr_matrix([[2,4],[1,2]])
In [1023]: A
Out[1023]: 
<2x2 sparse matrix of type '<class 'numpy.int32'>'
    with 4 stored elements (blocksize = 2x2) in Block Sparse Row format>
In [1024]: A.A
Out[1024]: 
array([[2, 4],
       [1, 2]], dtype=int32)
In [1025]: A.sum(axis=1)
Out[1025]: 
matrix([[6],
        [3]], dtype=int32)
In [1026]: A/A.sum(axis=1)
Out[1026]: 
matrix([[ 0.33333333,  0.66666667],
        [ 0.33333333,  0.66666667]])

or to try the other example:

In [1027]: b=sparse.bsr_matrix([[0, 9, 0, 0, 1, 0],
      ...:        [2, 0, 5, 0, 0, 9],
      ...:        [0, 2, 0, 0, 0, 0],
      ...:        [2, 0, 0, 0, 0, 0],
      ...:        [0, 9, 5, 3, 0, 7],
      ...:        [1, 0, 0, 8, 9, 0]])
In [1028]: b
Out[1028]: 
<6x6 sparse matrix of type '<class 'numpy.int32'>'
    with 14 stored elements (blocksize = 1x1) in Block Sparse Row format>
In [1029]: b.sum(axis=1)
Out[1029]: 
matrix([[10],
        [16],
        [ 2],
        [ 2],
        [24],
        [18]], dtype=int32)
In [1030]: b/b.sum(axis=1)
Out[1030]: 
matrix([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       , 0.        ],
        [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        , 0.5625    ],
 ....
        [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,     0.        ]])

The result of this sparse/dense is also dense, where as the c*b ( c is the sparse diagonal) is sparse.

In [1039]: c*b
Out[1039]: 
<6x6 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>

The sparse sum is a dense matrix. It is 2d, so there's no need to expand it dimensions. In fact if I try that I get an error:

In [1031]: A/(A.sum(axis=1)[:,None])
....
ValueError: shape too large to be a matrix.

Per this message , to keep the matrix sparse, you access the data values and use the (nonzero) indices:

sums = np.asarray(A.sum(axis=1)).squeeze()  # this is dense
A.data /= sums[A.nonzero()[0]]

If dividing by the nonzero row mean instead of the sum, one can

nnz = A.getnnz(axis=1)  # this is also dense
means = sums / nnz
A.data /= means[A.nonzero()[0]]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM