scipy sparse matrix division

I have been trying to divide a Python SciPy sparse matrix by the vector of its row sums. Here is my code:

from scipy.sparse import bsr_matrix
sparse_mat = bsr_matrix((l_data, (l_row, l_col)), dtype=float)
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])

However, it throws an error no matter how I try it:

sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 381, in __div__
return self.__truediv__(other)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 427, in __truediv__
raise NotImplementedError
NotImplementedError

Does anyone have an idea where I am going wrong?

You can work around the problem by creating a sparse diagonal matrix from the reciprocals of your row sums and then multiplying it with your matrix. In the product, the diagonal matrix goes on the left and your matrix on the right, because left-multiplying by a diagonal matrix scales each row by the corresponding diagonal entry.

Example:

>>> import numpy as np
>>> from scipy import sparse
>>> a
array([[0, 9, 0, 0, 1, 0],
       [2, 0, 5, 0, 0, 9],
       [0, 2, 0, 0, 0, 0],
       [2, 0, 0, 0, 0, 0],
       [0, 9, 5, 3, 0, 7],
       [1, 0, 0, 8, 9, 0]])
>>> b = sparse.bsr_matrix(a)
>>> 
>>> c = sparse.diags(1/b.sum(axis=1).A.ravel())
>>> # on older scipy versions the offsets parameter (default 0)
... # is a required argument, thus
... # c = sparse.diags(1/b.sum(axis=1).A.ravel(), 0)
...
>>> a/a.sum(axis=1, keepdims=True)
array([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       ,  0.        ],
       [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        ,  0.5625    ],
       [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.375     ,  0.20833333,  0.125     ,  0.        ,  0.29166667],
       [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,  0.        ]])
>>> (c @ b).todense() # on Python < 3.5 replace c @ b with c.dot(b)
matrix([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       ,  0.        ],
        [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        ,  0.5625    ],
        [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
        [ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.375     ,  0.20833333,  0.125     ,  0.        ,  0.29166667],
        [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,  0.        ]])
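For reuse outside the REPL, here is a minimal sketch of the same diagonal-matrix trick wrapped in a function. The helper name normalize_rows is hypothetical, and leaving all-zero rows unchanged (instead of letting 1/0 produce inf) is my own assumption about the desired behaviour; neither comes from the answer above.

```python
import numpy as np
from scipy import sparse


def normalize_rows(m):
    """Hypothetical helper: divide each row of sparse matrix m by its row sum.

    Rows whose sum is zero are left unchanged instead of producing inf/nan.
    """
    row_sums = np.asarray(m.sum(axis=1)).ravel()
    # Reciprocals of the row sums; keep a factor of 1 for empty rows.
    scale = np.ones_like(row_sums, dtype=float)
    nonzero = row_sums != 0
    scale[nonzero] = 1.0 / row_sums[nonzero]
    # Left-multiplying by a diagonal matrix scales row i by scale[i];
    # the result is still a sparse matrix.
    return sparse.diags(scale, 0) @ m


a = np.array([[0, 9, 0, 0, 1, 0],
              [2, 0, 5, 0, 0, 9],
              [0, 0, 0, 0, 0, 0],   # an all-zero row
              [1, 0, 0, 8, 9, 0]])
b = sparse.bsr_matrix(a)
normalized = normalize_rows(b)
print(normalized.toarray())
```

The diagonal matrix is never densified, so this stays cheap even for large matrices; the zero-row guard is the only extra work.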

Something funny is going on: I have no problem performing the element-wise division. I wonder if it's a Py2 issue; I'm using Py3.

In [1022]: A=sparse.bsr_matrix([[2,4],[1,2]])
In [1023]: A
Out[1023]: 
<2x2 sparse matrix of type '<class 'numpy.int32'>'
    with 4 stored elements (blocksize = 2x2) in Block Sparse Row format>
In [1024]: A.A
Out[1024]: 
array([[2, 4],
       [1, 2]], dtype=int32)
In [1025]: A.sum(axis=1)
Out[1025]: 
matrix([[6],
        [3]], dtype=int32)
In [1026]: A/A.sum(axis=1)
Out[1026]: 
matrix([[ 0.33333333,  0.66666667],
        [ 0.33333333,  0.66666667]])

Or, trying the other example:

In [1027]: b=sparse.bsr_matrix([[0, 9, 0, 0, 1, 0],
      ...:        [2, 0, 5, 0, 0, 9],
      ...:        [0, 2, 0, 0, 0, 0],
      ...:        [2, 0, 0, 0, 0, 0],
      ...:        [0, 9, 5, 3, 0, 7],
      ...:        [1, 0, 0, 8, 9, 0]])
In [1028]: b
Out[1028]: 
<6x6 sparse matrix of type '<class 'numpy.int32'>'
    with 14 stored elements (blocksize = 1x1) in Block Sparse Row format>
In [1029]: b.sum(axis=1)
Out[1029]: 
matrix([[10],
        [16],
        [ 2],
        [ 2],
        [24],
        [18]], dtype=int32)
In [1030]: b/b.sum(axis=1)
Out[1030]: 
matrix([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       , 0.        ],
        [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        , 0.5625    ],
 ....
        [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,     0.        ]])

The result of this sparse/dense division is dense, whereas c*b (where c is the sparse diagonal matrix) stays sparse.

In [1039]: c*b
Out[1039]: 
<6x6 sparse matrix of type '<class 'numpy.float64'>'
    with 14 stored elements in Compressed Sparse Row format>

The sparse sum is a dense matrix. It is already 2-D, so there is no need to expand its dimensions. In fact, if I try that, I get an error:

In [1031]: A/(A.sum(axis=1)[:,None])
....
ValueError: shape too large to be a matrix.
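A quick check of what the row sum actually is, as a sketch assuming the sparse matrix classes such as bsr_matrix, whose sum returns numpy.matrix (the newer sparse array classes may return a plain ndarray instead). The row sums already form an (n, 1) column that broadcasts across the columns, and flattening them gives the 1-D array that sparse.diags or fancy indexing expects:

```python
import numpy as np
from scipy import sparse

A = sparse.bsr_matrix([[2, 4], [1, 2]])

s = A.sum(axis=1)
# For the sparse *matrix* classes the row sums come back as a 2-D
# numpy.matrix with a single column, so no extra axis is needed.
print(type(s).__name__, s.shape)

# A plain 1-D ndarray of the row sums (what sparse.diags expects).
flat = np.asarray(s).ravel()
print(flat)

# Dense element-wise division against the (n, 1) column of sums.
print(A.toarray() / np.asarray(s))
```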

Per this message, to keep the matrix sparse you can divide the stored data values directly, using the row indices of the nonzero entries to pick the matching row sum:

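import numpy as np
sums = np.asarray(A.sum(axis=1)).squeeze()  # this is dense
# assumes a float CSR matrix with no explicitly stored zeros, so A.nonzero() lines up with A.data
A.data /= sums[A.nonzero()[0]]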
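A variant of the same data-array idea, sketched for CSR specifically. CSR stores each row's values contiguously in A.data, with A.indptr marking the row boundaries, so repeating each row sum by that row's count of stored values lines the sums up with A.data without relying on A.nonzero() (which skips explicitly stored zeros). Converting to a float CSR copy first is my assumption: for a BSR matrix, A.data is a 3-D array of blocks, so the one-liner above is meant for 1-D data.

```python
import numpy as np
from scipy import sparse

b = sparse.bsr_matrix([[0, 9, 0, 0, 1, 0],
                       [2, 0, 5, 0, 0, 9],
                       [0, 2, 0, 0, 0, 0]])

# Work on a float CSR copy: CSR keeps each row's stored values contiguous.
A = b.tocsr().astype(float)

sums = np.asarray(A.sum(axis=1)).ravel()   # dense, one entry per row
entries_per_row = np.diff(A.indptr)        # stored values in each row

# Repeat row i's sum once for each of its stored values, then divide in place.
A.data /= np.repeat(sums, entries_per_row)
print(A.toarray())
```

Rows with no stored values get a repeat count of 0, so they are simply skipped; a row whose stored values sum to exactly zero would still divide by zero.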

If you want to divide by the mean of each row's nonzero entries instead of the sum, you can do:

nnz = A.getnnz(axis=1)  # this is also dense
means = sums / nnz
A.data /= means[A.nonzero()[0]]
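A self-contained sketch of the row-mean variant, again assuming a float CSR matrix. Two changes are my own and not from the answer: empty rows get a mean of 0 through np.divide(..., where=...) so no 0/0 is evaluated, and the means are aligned with A.data by np.repeat over the CSR row counts (np.diff(A.indptr), the same counts A.getnnz(axis=1) gives for CSR) rather than through A.nonzero().

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[0, 9, 0, 0, 1, 0],
                       [2, 0, 5, 0, 0, 9],
                       [0, 0, 0, 0, 0, 0],
                       [0, 2, 0, 0, 0, 0]], dtype=float)

sums = np.asarray(A.sum(axis=1)).ravel()
nnz = np.diff(A.indptr)   # stored entries per row, as A.getnnz(axis=1) for CSR

# Mean over the stored entries of each row; empty rows get 0 instead of 0/0.
means = np.divide(sums, nnz, out=np.zeros_like(sums), where=nnz > 0)

# For CSR, stored values appear row by row, so np.repeat aligns means with A.data.
A.data /= np.repeat(means, nnz)
print(A.toarray())
```

After this, the stored entries of every non-empty row average to 1.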
