简体   繁体   中英

Create a new sparse matrix from the operations of different rows of a given large sparse matrix in python

Lets say , S is the large scipy-csr-matrix(sparse) and a dictionary D with key -> index(position) of the row vector A in S & values -> list of all the indices(positions) of other row vectors l in S. For each row vector in l you subtract A and get the new vector which will be nothing but the new row vector to be updated in the new sparse matrix.

dictionary of form -> { 1 : [4 , 5 ,... ,63] } then have to create a new sparse matrix with....

new_row_vector_1 -> S_vec 1 - S_vec 4

new_row_vector_2 -> S_vec 1 - S_vec 5

.

new_row_vector_n -> S_vec 1 - S_vec 63

where S_vecX is the Xth row vector of matrix S

Check out the pictorial explanation of the above statements

Numpy Example:

>>> import numpy as np
>>> s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
>>> s
array([[ 1,  5,  3,  4],
       [ 3,  0, 12,  7],
       [ 5,  6,  2,  4],
       [ 4,  6,  6,  4],
       [ 7, 12,  5, 67]])
>>> index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]} 
>>> n = np.zeros((10,4)) #sum of all lengths of values in index_dictionary would be the number of rows for the new array(n) and columns remain the same as s.
>>> n
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> idx = 0
>>> for index in index_dictionary:
...     for k in index_dictionary[index]:
...             n[idx] = s[index]-s[k]
...             idx += 1
... 
>>> n
array([[ -4.,  -1.,   1.,   0.],
       [ -6.,  -7.,  -2., -63.],
       [ -1.,  -6.,   6.,   3.],
       [ -4., -12.,   7., -60.],
       [  2.,   6., -10.,  -3.],
       [  1.,   6.,  -6.,  -3.],
       [ -1.,   0.,   4.,   0.],
       [  4.,  12.,  -7.,  60.],
       [  3.,   6.,  -1.,  63.],
       [  2.,   6.,   3.,  63.]])

n is what i want.

Here's a simple demonstration of what I think you are trying to do:

First the numpy array version:

In [619]: arr=np.arange(12).reshape(4,3)
In [620]: arr[[1,0,2,3]]-arr[0] 
Out[620]: 
array([[3, 3, 3],
       [0, 0, 0],
       [6, 6, 6],
       [9, 9, 9]])

Now the sparse equivalent:

In [622]: M=sparse.csr_matrix(arr)

csr implements row indexing:

In [624]: M[[1,0,2,3]]
Out[624]: 
<4x3 sparse matrix of type '<class 'numpy.int32'>'
    with 11 stored elements in Compressed Sparse Row format>
In [625]: M[[1,0,2,3]].A
Out[625]: 
array([[ 3,  4,  5],
       [ 0,  1,  2],
       [ 6,  7,  8],
       [ 9, 10, 11]], dtype=int32)

But not broadcasting:

In [626]: M[[1,0,2,3]]-M[0]
 ....
ValueError: inconsistent shapes

So we can use an explicit form of broadcasting

In [627]: M[[1,0,2,3]]-M[[0,0,0,0]]  # or M[[0]*4]
Out[627]: 
<4x3 sparse matrix of type '<class 'numpy.int32'>'
    with 9 stored elements in Compressed Sparse Row format>
In [628]: _.A
Out[628]: 
array([[3, 3, 3],
       [0, 0, 0],
       [6, 6, 6],
       [9, 9, 9]], dtype=int32)

This may not be the fastest or most efficient, but it's a start.

I found in a previous SO question that M[[1,0,2,3]] indexing is performed with a matrix multiplication, in this case the equivalent of:

idxM = sparse.csr_matrix(([1,1,1,1],([0,1,2,3],[1,0,2,3])),(4,4))
M1 = idxM * M

Sparse matrix slicing using list of int

So my difference expression requires 2 such multiplication along with the subtraction.

We could try a row by row iteration, and building a new matrix from the result, but there's no guarantee that it will be faster. Depending on the arrays, converting to dense and back might even be faster.

=================

I can imagine 2 ways of applying this to the dictionary.

One is to iterate through the dictionary (what order?), perform this difference for each key, collect the results in a list (list of sparse matrices), and use sparse.bmat to join them into one matrix.

Another is to collect two lists of indexes, and apply the above indexed difference just once.

In [8]: index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]} 
In [10]: alist=[]
    ...: for index in index_dictionary:
    ...:     for k in index_dictionary[index]:
    ...:         alist.append((index, k))       
In [11]: idx = np.array(alist)
In [12]: idx
Out[12]: 
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])

applied to your dense s :

In [15]: s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
In [16]: s[idx[:,0]]-s[idx[:,1]]
Out[16]: 
array([[ -4,  -1,   1,   0],
       [ -6,  -7,  -2, -63],
       [ -1,  -6,   6,   3],
       [ -4, -12,   7, -60],
       [  2,   6, -10,  -3],
       [  1,   6,  -6,  -3],
       [ -1,   0,   4,   0],
       [  4,  12,  -7,  60],
       [  3,   6,  -1,  63],
       [  2,   6,   3,  63]])

and to the sparse equivalent

In [19]: arr= sparse.csr_matrix(s)
In [20]: arr
Out[20]: 
<5x4 sparse matrix of type '<class 'numpy.int32'>'
    with 19 stored elements in Compressed Sparse Row format>
In [21]: res=arr[idx[:,0]]-arr[idx[:,1]]
In [22]: res
Out[22]: 
<10x4 sparse matrix of type '<class 'numpy.int32'>'
    with 37 stored elements in Compressed Sparse Row format>

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM