
How do you edit cells in a sparse matrix using scipy?

I'm trying to manipulate some data in a sparse matrix. Once I've created one, how do I add / alter / update values in it? This seems very basic, but I can't find it in the documentation for the sparse matrix classes, or on the web. I think I'm missing something crucial.

This is my failed attempt to do so the same way I would a normal array.

>>> from scipy.sparse import bsr_matrix
>>> A = bsr_matrix((10,10))
>>> A[5][7] = 6

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    A[5][7] = 6
  File "C:\Python27\lib\site-packages\scipy\sparse\bsr.py", line 296, in __getitem__
    raise NotImplementedError
NotImplementedError

There are several sparse matrix formats, and some are better suited to indexing than others. One that implements item assignment is lil_matrix.

Al = A.tolil()
Al[5, 7] = 6  # the normal 2d matrix indexing notation
print(Al)
print(Al.toarray())  # dense array; Al.todense() also works
A1 = Al.tobsr()  # if it must be in bsr format

The documentation for each format suggests what it is good at, and where it is bad. But it does not have a neat list of which ones have which operations defined.

Advantages of the LIL format
  supports flexible slicing
  changes to the matrix sparsity structure are efficient
  ...
Intended Usage
  LIL is a convenient format for constructing sparse matrices
  ...

dok_matrix also implements indexing.
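As a minimal sketch of the dok_matrix route (the 10x10 shape and the values are just the ones from the question, and the variable names are my own), item assignment looks the same as with lil_matrix:

```python
from scipy.sparse import dok_matrix

# Start from an empty 10x10 dictionary-of-keys matrix.
D = dok_matrix((10, 10), dtype=int)

D[5, 7] = 6    # add a value
D[5, 7] += 1   # update it in place (read, add, write back)
D[2, 4] = 10   # add another value

# Convert to a compressed format once the edits are done.
C = D.tocsr()
dense = C.toarray()
```

Converting to csr at the end is only one option; any of the other conversion methods can be used instead, depending on what the rest of your code needs.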

The underlying data structure for coo_matrix is easy to understand. It is essentially the arguments of the coo_matrix((data, (i, j)), [shape=(M, N)]) constructor: the values plus their row and column indices. To create the same matrix you could use:

from scipy import sparse
sparse.coo_matrix(([6], ([5], [7])), shape=(10, 10))

If you have more assignments, build larger data, i, and j lists (or 1d arrays), and construct the sparse matrix once they are complete.
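A rough sketch of that accumulate-then-build pattern, assuming the assignments arrive as (row, col, value) triples (the `updates` list here is made up for illustration):

```python
from scipy.sparse import coo_matrix

# Hypothetical assignments as (row, col, value) triples.
updates = [(5, 7, 6), (2, 4, 10), (0, 0, 1)]

rows, cols, vals = [], [], []
for r, c, v in updates:
    rows.append(r)
    cols.append(c)
    vals.append(v)

# Build the matrix once, after all assignments are collected.
M = coo_matrix((vals, (rows, cols)), shape=(10, 10))
dense = M.toarray()
```

One thing to watch: if the same (row, col) pair appears more than once, the values are summed rather than overwritten when the matrix is converted to csr or to a dense array.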

See the documentation for bsr_matrix and csr_matrix. It is worth understanding csr before moving to bsr: the only difference is that the basic unit stored in a bsr matrix is itself a small dense matrix (a block), whereas the basic unit in a csr matrix is a scalar.

I don't know of any super easy way to manipulate the matrices once they are created, but here are some examples of what you're trying to do:

import numpy as np
from scipy.sparse import bsr_matrix, csr_matrix

row = np.array( [5] )
col = np.array( [7] )
data = np.array( [6] )
A = csr_matrix( (data,(row,col)) )

This is a straightforward syntax in which you list all the values you want in the matrix in the array data and then specify where each value should go using row and col. Note that this makes the matrix dimensions just big enough to hold the element in the largest row and column (in this case a 6x8 matrix). You can see the matrix in standard form using the todense() method.

A.todense()
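If you want the 10x10 matrix from the question rather than the inferred 6x8 one, you can pass shape explicitly. A small sketch comparing the two (the name `A10` is just for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([5])
col = np.array([7])
data = np.array([6])

# Shape inferred from the largest row and column index.
A = csr_matrix((data, (row, col)))
inferred_shape = A.shape

# Shape given explicitly.
A10 = csr_matrix((data, (row, col)), shape=(10, 10))
explicit_shape = A10.shape
```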

However, you cannot manipulate the matrix on the fly using this pattern. What you can do is modify the native scipy representation of the matrix. This involves three attributes: indices, indptr, and data. To start with, we can examine the values of these attributes for the matrix we've already created.

>>> A.data
array([6])

>>> A.indices
array([7], dtype=int32)

>>> A.indptr
array([0, 0, 0, 0, 0, 0, 1], dtype=int32)

data is the same thing it was before, a 1-d array of the values we want in the matrix. The difference is that the position of each value is now specified by indices and indptr instead of row and col. indices is fairly straightforward: it is simply a list of which column each data entry is in, so it is always the same size as the data array. indptr is a little trickier. It lets the data structure know what row each data entry is in. To quote from the docs,

the column indices for row i are stored in indices[indptr[i]:indptr[i+1]]

From this definition we can see that the size of indptr will always be the number of rows in the matrix + 1. It takes a little while to get used to, but working through the values for each row will give you some intuition. Note that all the entries are zero until the last one. That means the column indices for rows i=0 to 4 are stored in indices[0:0], i.e. the empty array, because these rows are all zeros. Finally, for the last row, i=5, we get indices[0:1] = [7], which tells us that the data entries data[0:1] are in row 5, column 7.
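To make that walk-through concrete, here is a small sketch that loops over the rows and recovers (row, col, value) triples from indptr, indices and data. The helper name `csr_entries` is my own, not part of scipy:

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix((np.array([6]), (np.array([5]), np.array([7]))))

def csr_entries(m):
    """Return (row, col, value) triples by walking indptr/indices/data."""
    out = []
    for i in range(m.shape[0]):
        # Entries of row i live in positions indptr[i] .. indptr[i+1]-1.
        start, stop = m.indptr[i], m.indptr[i + 1]
        for k in range(start, stop):
            out.append((i, int(m.indices[k]), m.data[k].item()))
    return out

entries = csr_entries(A)
```

For the rows that are all zeros, start equals stop, so the inner loop simply does nothing.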

Now suppose we wanted to add the value 10 at row 2 column 4. We first put it into the data attribute,

A.data = np.array( [10,6] )   

next we update indices to indicate the column 10 will be in,

A.indices = np.array( [4,7], dtype=np.int32 )

and finally we indicate which row it will be in by modifying indptr

A.indptr = np.array( [0,0,0,1,1,1,2], dtype=np.int32 )

It is important that you make the data type of indices and indptr np.int32. One way to visualize what's going on in indptr is that the number changes as you move from i to i+1 for a row that has data. Also note that arrays like these can be used to construct sparse matrices directly:

B = csr_matrix((A.data, A.indices, A.indptr))
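Since it is easy to get indptr wrong by hand, it is worth checking the result against the dense form. A quick sketch using the same three arrays as above, with the 6x8 shape passed explicitly as a safeguard:

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([10, 6])
indices = np.array([4, 7], dtype=np.int32)
indptr = np.array([0, 0, 0, 1, 1, 1, 2], dtype=np.int32)

# Build directly from the raw CSR arrays.
B = csr_matrix((data, indices, indptr), shape=(6, 8))
dense = B.toarray()

# Positions and values of the stored entries, read from the dense form.
nonzero_rows, nonzero_cols = np.nonzero(dense)
```

If the 10 and the 6 do not show up at (2, 4) and (5, 7), one of the three arrays is off.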

It would be nice if it were as easy as simply indexing into the matrix as you tried, but the implementation is not there yet. That should be enough to get you started at least.
