
Create a sparse matrix from a list of tuples giving, for each column, the indices where there is a 1

Question

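I have a list of tuples, where each tuple represents a column of a 2D array, and each element of a tuple is an index within that column where the array holds a 1; every entry not listed in the tuple is 0.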

I want to create a sparse matrix from this list of tuples in an efficient way (trying to avoid loops).

Example

# init values
list_tuples = [
 (0, 2, 4),
 (0, 2, 3),
 (1, 3, 4)
]

n = len(list_tuples)  # number of columns: one per tuple
m = 5  # arbitrary, but m >= max(i for t in list_tuples for i in t) + 1

# What I need is a function that accepts these tuples and the shape of the array
# (at least the number of rows, because the number of columns can be inferred from the list of tuples).
A = some_function(list_tuples, array_shape=(m, n))

What I would then like to get is an array of the following form:

[
 [1, 1, 0],
 [0, 0, 1],
 [1, 1, 0],
 [0, 1, 1],
 [1, 0, 1]
]

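Answer

Your values are the indices required for the compressed sparse column (CSC) format. You also need the indptr array, which for your data is the cumulative sum of the tuple lengths, with a 0 prepended. The data array is an array of ones whose length equals the sum of the tuple lengths, which you can read off the last element of the cumulative sum. Here is how that looks for your example: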

In [44]: import numpy as np

In [45]: from scipy.sparse import csc_matrix

In [46]: list_tuples = [
    ...:  (0, 2, 4),
    ...:  (0, 2, 3),
    ...:  (1, 3, 4)
    ...: ]

In [47]: indices = sum(list_tuples, ())  # Flatten the tuples into one sequence.

In [48]: indptr = np.cumsum([0] + [len(t) for t in list_tuples])

In [49]: a = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr))

In [50]: a
Out[50]: 
<5x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Column format>

In [51]: a.toarray()
Out[51]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])
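
To see why these three arrays describe the matrix, it can help to go the other way and recover each column from them: in CSC format, the row indices of column k are the slice indices[indptr[k]:indptr[k + 1]]. The following is only a small sketch using NumPy and the example data; the name recovered is made up for illustration.

```python
import numpy as np

list_tuples = [(0, 2, 4), (0, 2, 3), (1, 3, 4)]

# Same construction as in the session above.
indices = np.array(sum(list_tuples, ()))
indptr = np.cumsum([0] + [len(t) for t in list_tuples])

# Column k owns the slice indices[indptr[k]:indptr[k + 1]].
# `recovered` is a hypothetical name used only for this illustration.
recovered = [
    tuple(int(r) for r in indices[indptr[k]:indptr[k + 1]])
    for k in range(len(indptr) - 1)
]

print(indptr.tolist())  # [0, 3, 6, 9]
print(recovered == list_tuples)
```

Slicing this way gives back the original tuples, which is the sense in which the tuples already are the CSC indices.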

Note that csc_matrix infers the number of rows from the maximum value it finds in indices. You can use the shape argument to override that, e.g.

In [52]: b = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr), shape=(7, len(list_tuples)))

In [53]: b
Out[53]: 
<7x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Column format>

In [54]: b.toarray()
Out[54]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1],
       [0, 0, 0],
       [0, 0, 0]])
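
The steps above could be wrapped into a single function along the lines of the some_function asked for in the question. This is a sketch, not a definitive implementation: the name tuples_to_csc and the n_rows parameter are assumptions made here, the input is assumed to be non-empty when n_rows is omitted, and itertools.chain.from_iterable is swapped in for sum(list_tuples, ()) to flatten the tuples.

```python
from itertools import chain

import numpy as np
from scipy.sparse import csc_matrix


def tuples_to_csc(list_tuples, n_rows=None):
    """Build a 0/1 CSC matrix; tuple k lists the row indices of the 1s in column k.

    Hypothetical helper: the name and the n_rows parameter are not from the
    original answer.
    """
    # indptr is the cumulative sum of the tuple lengths with a 0 prepended.
    indptr = np.cumsum([0] + [len(t) for t in list_tuples])
    # Flatten the tuples into one array of row indices.
    indices = np.fromiter(
        chain.from_iterable(list_tuples), dtype=np.int64, count=int(indptr[-1])
    )
    data = np.ones(len(indices), dtype=int)
    if n_rows is None:
        # Let csc_matrix infer the number of rows from the largest index.
        return csc_matrix((data, indices, indptr))
    return csc_matrix((data, indices, indptr), shape=(n_rows, len(list_tuples)))


A = tuples_to_csc([(0, 2, 4), (0, 2, 3), (1, 3, 4)], n_rows=5)
print(A.toarray())
```

Only the number of rows is taken as an argument because the number of columns always equals the number of tuples.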

You can also generate a coo_matrix quite easily. The flattened list_tuples gives the row indices, and np.repeat can be used to create the column indices:

In [63]: from scipy.sparse import coo_matrix

In [64]: i = sum(list_tuples, ())  # row indices

In [65]: j = np.repeat(range(len(list_tuples)), [len(t) for t in list_tuples])

In [66]: c = coo_matrix((np.ones(len(i), dtype=int), (i, j)))

In [67]: c
Out[67]: 
<5x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in COOrdinate format>

In [68]: c.toarray()
Out[68]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])
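
For long lists, sum(list_tuples, ()) can become slow, because each addition copies the tuple accumulated so far. A possible variant of the COO construction, sketched here under the assumption that the tuples hold non-negative integers, flattens with itertools.chain.from_iterable instead and passes the shape explicitly. The names rows, cols and lengths are chosen only for this illustration.

```python
from itertools import chain

import numpy as np
from scipy.sparse import coo_matrix

list_tuples = [(0, 2, 4), (0, 2, 3), (1, 3, 4)]

lengths = [len(t) for t in list_tuples]

# Row indices: all tuple elements in order, flattened in a single pass.
rows = np.fromiter(chain.from_iterable(list_tuples), dtype=np.int64, count=sum(lengths))

# Column indices: column k is repeated once for every element of tuple k.
cols = np.repeat(np.arange(len(list_tuples)), lengths)

c = coo_matrix(
    (np.ones(len(rows), dtype=int), (rows, cols)),
    shape=(5, len(list_tuples)),  # 5 rows, as in the question's example
)
print(c.toarray())
```

If a compressed format is needed afterwards, c.tocsc() converts the result.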
