简体   繁体   中英

How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?

Suppose I have a 2d sparse array. In my real usecase both the number of rows and columns are much bigger (say 20000 and 50000) hence it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp

>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])

Now suppose I have a dense 1d array with all non-zeros components with size 3 (or 50000 in my real life case):

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy are of the np.matrix: the '*' operator is overloaded to have it behave like a matrix-multiply instead of the elementwise-multiply:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])

One solution would be to make 'a' switch to the array semantics for the '*' operator, that would give the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])

But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):

>>> ssp.issparse(a.toarray())
False

Any idea how to build this while keeping only sparse datastructures and without having to do a unefficient python loop on the columns of 'a'?

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
 with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])

Hope that helps!

I think A.multiply(B) should work in scipy sparse. The method multiply does "point-wise" multiplication, not matrix multiplication.

HTH

Well, here's a simple code that will do what you want. I don't know if it is as efficient as you would like, so take it or leave it:

import scipy.sparse as ssp
def pointmult(a,b):
    x = a.copy()
    for i in xrange(a.shape[0]):
        if x.data[i]:
            for j in xrange(len(x.data[i])):
                x.data[i] *= b[x.rows[i]]
    return x

It only works with lil matrices so you'll have to make some changes if you want it to work with other formats.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM