
Inserting rows and columns of zeros into a sparse array in Python

I have about 50 relatively large sparse arrays (in scipy.sparse.csr_array format, but that can be changed), and I would like to insert rows and columns of zeros at certain locations. An example in dense format would look like:

A = np.asarray([[1,2,1],[2,4,5],[2,1,6]])
# A = array([[1,2,1],
#            [2,4,5],
#            [2,1,6]])
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])

# indices =  array([-1, -1, 2, -1, 4, -1, -1, 7, -1])
# insert rows and columns of zeros where indices[i] == -1 to get B

B = np.asarray([[0,0,0,0,0,0,0,0,0],
                [0,0,0,0,0,0,0,0,0],
                [0,0,1,0,2,0,0,1,0],
                [0,0,0,0,0,0,0,0,0],
                [0,0,2,0,4,0,0,5,0],
                [0,0,0,0,0,0,0,0,0],
                [0,0,0,0,0,0,0,0,0],
                [0,0,2,0,1,0,0,6,0],
                [0,0,0,0,0,0,0,0,0]])

A is a sparse array of shape (~2000, ~2000) with ~20000 nonzero entries, and indices is of shape (4096,). I can imagine doing it in dense format, but I don't know enough about how data and indices are stored in sparse formats, and I cannot find a quick and efficient way to do this sort of operation on sparse arrays.

Anyone have any ideas or suggestions?

Thanks.

You could try storing your nonzero values in one list of per-row lists and their respective column indices in another:

data_list = [[], [], [1, 2, 1], [], [2, 4, 5], [], [], [2, 1, 6], []]
index_list = [[], [], [2, 4, 7], [], [2, 4, 7], [], [], [2, 4, 7], []]

These two lists would then only need to hold one entry per nonzero value each, rather than the roughly 4,000,000 values of a dense 2000 x 2000 array (or about 16.8 million for the expanded 4096 x 4096 one).

If you then wanted to grab the value in position (4, 7):

def find_value(row, col):
    # Check to see if the given column is in our index list
    if col not in index_list[row]:
        return 0
    
    # Otherwise return the number in the data list
    myNum = data_list[row][index_list[row].index(col)]
    return myNum
    
print(find_value(4, 7))
# output: 5
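The list-of-lists layout above is close to what SciPy's LIL format stores internally, so it may not be necessary to maintain the two lists by hand. Here is a minimal sketch, assuming `scipy.sparse.lil_matrix` is acceptable for the use case. Its `rows` and `data` attributes hold the per-row column indices and values, and the 9 x 9 example is the one from the question:

```python
import numpy as np
from scipy.sparse import lil_matrix

# Values and target positions taken from the question's example.
values = [[1, 2, 1], [2, 4, 5], [2, 1, 6]]
positions = [2, 4, 7]

# Start from an all-zero 9 x 9 LIL matrix and fill in the nonzero entries.
B = lil_matrix((9, 9), dtype=np.int64)
for r, row_values in zip(positions, values):
    for c, v in zip(positions, row_values):
        B[r, c] = v

# LIL keeps one list of column indices and one list of values per row,
# much like index_list and data_list above.
print(B.rows[4])   # column indices stored for row 4: [2, 4, 7]
print(B.data[4])   # values stored for row 4

# Element lookup works like find_value(row, col); missing entries read as 0.
print(B[4, 7])     # 5
print(B[0, 0])     # 0
```

LIL is convenient for building a matrix incrementally. Converting with `B.tocsr()` afterwards is usually the better choice for arithmetic.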

Hope this helps!

I would probably do this by passing the data and associated indices into a COO matrix constructor:

import numpy as np
from scipy.sparse import coo_matrix

A = np.asarray([[1,2,1],[2,4,5],[2,1,6]])
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])

idx = indices[indices >= 0]
col, row = np.meshgrid(idx, idx)

mat = coo_matrix((A.ravel(), (row.ravel(), col.ravel())),
                 shape=(len(indices), len(indices)))
print(mat)
#   (2, 2)  1
#   (2, 4)  2
#   (2, 7)  1
#   (4, 2)  2
#   (4, 4)  4
#   (4, 7)  5
#   (7, 2)  2
#   (7, 4)  1
#   (7, 7)  6

print(mat.todense())
# [[0 0 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 0]
#  [0 0 1 0 2 0 0 1 0]
#  [0 0 0 0 0 0 0 0 0]
#  [0 0 2 0 4 0 0 5 0]
#  [0 0 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 0]
#  [0 0 2 0 1 0 0 6 0]
#  [0 0 0 0 0 0 0 0 0]]
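The example above starts from a dense A. If A is already sparse, as in the question, the same idea should work without densifying: convert A to COO and remap its row and column coordinates through idx. The following is only a sketch. The helper name `expand_sparse` is hypothetical, and, like the code above, it assumes the non-negative entries of indices are the target positions:

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def expand_sparse(A_sparse, indices):
    """Scatter a square sparse matrix into a larger all-zero square matrix.

    Hypothetical helper: row/column k of A_sparse is moved to the position
    given by the k-th non-negative entry of `indices`.
    """
    indices = np.asarray(indices)
    idx = indices[indices >= 0]          # target positions, as in the answer above
    n_small = A_sparse.shape[0]
    if A_sparse.shape[1] != n_small or len(idx) != n_small:
        raise ValueError("A must be square with one target position per row")

    coo = A_sparse.tocoo()               # exposes .row, .col and .data arrays
    n = len(indices)
    # Only the coordinates change; the stored values are reused as they are.
    out = coo_matrix((coo.data, (idx[coo.row], idx[coo.col])), shape=(n, n))
    return out.tocsr()

A = csr_matrix(np.array([[1, 2, 1], [2, 4, 5], [2, 1, 6]]))
indices = np.array([-1, -1, 2, -1, 4, -1, -1, 7, -1])

B = expand_sparse(A, indices)
print(B.shape)   # (9, 9)
print(B.nnz)     # 9
print(B[4, 7])   # 5
```

The work here is proportional to the number of stored entries rather than to the size of the expanded matrix, so about 20000 nonzeros per array should be cheap to process. The snippet uses `coo_matrix` and `csr_matrix` to match the code above. The newer `coo_array` and `csr_array` classes are expected to accept the same constructor arguments.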
