
Indexing and replacing values in sparse CSC matrix (Python)

I have a sparse CSC matrix, "A", in which I want to replace the first row with a vector that is all zeros, except for the first entry which is 1.

So far I am doing it the inefficient way, e.g.:

import numpy as np
from scipy.sparse import csc_matrix

row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, (row, col)), shape=(3, 3))
replace = np.zeros(3)
replace[0] = 1 
A[0,:] = replace
A.eliminate_zeros()

But I'd like to do it with .indptr, .data, etc. As the matrix is in CSC format, I am guessing that this might be inefficient as well? In my actual problem, the matrix is 66000 x 66000.

For a CSR sparse matrix I've seen it done as

A.data[1:A.indptr[1]] = 0
A.data[0] = 1.0
A.indices[0] = 0
A.eliminate_zeros()

So, basically I'd like to do the same for a CSC sparse matrix.

Expected result: To do exactly the same as above, just more efficiently (applicable to very large sparse matrices).

That is, start with:

[1, 0, 4],
[0, 0, 5],
[2, 3, 6]

and replace the top row with a vector whose length equals the number of columns and which is all zeros except for a 1 in the first position. One should end up with

[1, 0, 0],
[0, 0, 5],
[2, 3, 6]

And be able to do it for large sparse CSC matrices efficiently.

Thanks in advance :-)

You can do it with indptr and indices. First, note that the matrix from the question can be constructed directly from its indptr and indices arrays:

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, indices, indptr), shape=(3,3))
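To see where these arrays come from, here is a small sketch that builds the matrix the way the question does (from row/col coordinates) and then reads the CSC arrays back. It only uses the .indptr, .indices and .data attributes of the matrix. Column j owns the slice indptr[j]:indptr[j + 1], and indices holds the row number of each stored value.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Same matrix as in the question, built from (row, col) coordinates.
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, (row, col)), shape=(3, 3))

# Walk the columns: each one is a contiguous slice of indices/data.
for j in range(A.shape[1]):
    start, end = A.indptr[j], A.indptr[j + 1]
    print("column", j, "rows", A.indices[start:end], "values", A.data[start:end])
```

Because the row numbers live in indices, the stored entries of row 0 are scattered over the columns rather than sitting in one contiguous slice as they would in CSR.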

To zero the elements of row 0, set data to zero wherever indices is 0, since in CSC format indices stores the row index of each value in data. In other words:

data[indices == 0] = 0

The line above sets every stored element of the first row to 0, including the first one. To avoid setting the first element to zero we can do the following:

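mask = indices == 0     # True for every stored value that lies in row 0
mask[0] = False         # keep element (0, 0); assumes it is stored as data[0]
data[mask] = 0
data[0] = 1             # the question wants a 1 there (it already is 1 in this example)
A = csc_matrix((data, indices, indptr), shape=(3, 3))
A.eliminate_zeros()     # remove the explicit zeros from the stored structure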
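If the matrix already exists, as in the question, the same idea can be applied to A.data and A.indices in place instead of rebuilding A. Below is a minimal sketch of that; set_first_row_to_unit is just a name I made up, not a SciPy function. It does not assume that element (0, 0) is stored as data[0]: it zeroes every stored value in row 0 and then assigns A[0, 0] = 1. If (0, 0) was not stored before, that assignment has to insert a new entry and SciPy will emit a SparseEfficiencyWarning.

```python
import numpy as np
from scipy.sparse import csc_matrix

def set_first_row_to_unit(A):
    """Hypothetical helper: make row 0 of a CSC matrix equal to [1, 0, ..., 0], in place."""
    # In CSC, indices holds row numbers, so this mask selects all stored values in row 0.
    A.data[A.indices == 0] = 0
    # Overwrites the stored value at (0, 0) if there is one, otherwise inserts it.
    A[0, 0] = 1
    # Drop the explicit zeros that were just written.
    A.eliminate_zeros()
    return A

# Matrix from the question.
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, (row, col)), shape=(3, 3))

set_first_row_to_unit(A)
print(A.toarray())
```

The mask is a single pass over the stored values, so the cost grows with the number of nonzeros rather than with the 66000 x 66000 shape.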

Hope it helps.
