
Inserting rows and columns of zeros into a sparse array in Python

I have about 50 relatively large sparse arrays (in scipy.sparse.csr_array format, but that can be changed) and I would like to insert rows and columns of zeros at certain locations. An example in dense format would look like:

import numpy as np

A = np.asarray([[1,2,1],[2,4,5],[2,1,6]])
# A = array([[1,2,1],
#            [2,4,5],
#            [2,1,6]])
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])

# indices =  array([-1, -1, 2, -1, 4, -1, -1, 7, -1])
# insert rows and columns of zeros where indices[i] == -1 to get B

B = np.asarray([[0,0,0,0,0,0,0,0,0],
                [0,0,0,0,0,0,0,0,0],
                [0,0,1,0,2,0,0,1,0],
                [0,0,0,0,0,0,0,0,0],
                [0,0,2,0,4,0,0,5,0],
                [0,0,0,0,0,0,0,0,0],
                [0,0,0,0,0,0,0,0,0],
                [0,0,2,0,1,0,0,6,0],
                [0,0,0,0,0,0,0,0,0]])

A is a sparse array of shape (~2000, ~2000) with ~20000 nonzero entries, and indices has shape (4096,). I can imagine doing it in dense format, but I don't know enough about how the data and indices are stored, and I cannot find a quick and efficient way to do this sort of operation on sparse arrays.
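For reference, the dense version of the operation might look like the sketch below. It assumes, as in the example, that the non-negative entries of indices are the target row/column numbers in order; the names idx and B_dense are only illustrative.

```python
import numpy as np

A = np.asarray([[1, 2, 1], [2, 4, 5], [2, 1, 6]])
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])

# Assumption: each non-negative entry is the row/column that A maps to.
idx = indices[indices >= 0]

# Start from all zeros and scatter A into the selected rows and columns.
B_dense = np.zeros((len(indices), len(indices)), dtype=A.dtype)
B_dense[np.ix_(idx, idx)] = A
```

This allocates the full (4096, 4096) array for the real data, which is exactly what a sparse solution should avoid.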

Does anyone have any ideas or suggestions?

Thanks.

You could try storing your nonzero values in one list and their respective column indexes in another:

data_list = [[], [], [1, 2, 1], [], [2, 4, 5], [], [], [2, 1, 6], []]
index_list = [[], [], [2, 4, 7], [], [2, 4, 7], [], [], [2, 4, 7], []]

Together, these two lists only have to store one entry per nonzero value, rather than all ~4,000,000 entries of a dense (~2000, ~2000) array.
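One way the two lists could be built from A and indices is sketched below. This is only an illustration under the assumption that the non-negative entries of indices are the target row/column numbers; the loop variable names are made up for the example.

```python
import numpy as np

A = np.asarray([[1, 2, 1], [2, 4, 5], [2, 1, 6]])
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])

# Assumption: each non-negative entry is the row/column that A maps to.
idx = indices[indices >= 0]
n = len(indices)

# One (initially empty) list of values and of column indexes per output row.
data_list = [[] for _ in range(n)]
index_list = [[] for _ in range(n)]

for src_row, dst_row in enumerate(idx):
    for src_col, dst_col in enumerate(idx):
        value = A[src_row, src_col]
        if value != 0:  # only nonzero entries are stored
            data_list[dst_row].append(int(value))
            index_list[dst_row].append(int(dst_col))
```

SciPy's LIL format (scipy.sparse.lil_matrix, with its rows and data attributes) uses essentially the same per-row layout, so it may be a ready-made alternative to hand-rolled lists.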

If you then wanted to grab the value at position (4, 7):

def find_value(row, col):
    # If the column is not stored for this row, the entry is an implicit zero
    if col not in index_list[row]:
        return 0

    # Otherwise return the matching number from the data list
    return data_list[row][index_list[row].index(col)]

print(find_value(4, 7))
# 5

Hope this helps!

I would probably do this by passing the data and associated indices into a COO matrix constructor:

import numpy as np
from scipy.sparse import coo_matrix

A = np.asarray([[1,2,1],[2,4,5],[2,1,6]])
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])

idx = indices[indices >= 0]
col, row = np.meshgrid(idx, idx)

mat = coo_matrix((A.ravel(), (row.ravel(), col.ravel())),
                 shape=(len(indices), len(indices)))
print(mat)
#   (2, 2)  1
#   (2, 4)  2
#   (2, 7)  1
#   (4, 2)  2
#   (4, 4)  4
#   (4, 7)  5
#   (7, 2)  2
#   (7, 4)  1
#   (7, 7)  6

print(mat.todense())
# [[0 0 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 0]
#  [0 0 1 0 2 0 0 1 0]
#  [0 0 0 0 0 0 0 0 0]
#  [0 0 2 0 4 0 0 5 0]
#  [0 0 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 0]
#  [0 0 2 0 1 0 0 6 0]
#  [0 0 0 0 0 0 0 0 0]]
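The example above starts from a dense A. When A is already sparse, as in the question, the same idea can probably be applied without densifying: convert A to COO and remap its stored row and column numbers through idx. The helper name expand_sparse is hypothetical, and the sketch assumes that the non-negative entries of indices are the target row/column numbers in order.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix


def expand_sparse(A_sparse, indices):
    """Hypothetical helper: embed A_sparse in a larger, otherwise empty matrix."""
    indices = np.asarray(indices)
    # Assumption: each non-negative entry is the row/column that A maps to.
    idx = indices[indices >= 0]
    if A_sparse.shape != (len(idx), len(idx)):
        raise ValueError("A needs one row and one column per non-negative index")

    # COO exposes the stored entries as parallel (row, col, data) arrays.
    A_coo = A_sparse.tocoo()
    n = len(indices)

    # Only the coordinates change; the stored values are reused as they are.
    expanded = coo_matrix(
        (A_coo.data, (idx[A_coo.row], idx[A_coo.col])), shape=(n, n)
    )
    return expanded.tocsr()


A_sparse = csr_matrix(np.asarray([[1, 2, 1], [2, 4, 5], [2, 1, 6]]))
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])

B_sparse = expand_sparse(A_sparse, indices)
```

The work here scales with the number of stored entries (~20000 in the question) rather than with the size of the output, so no dense (4096, 4096) array is ever created.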

