简体   繁体   English

从具有列索引的元组列表创建一个稀疏矩阵,其中 是 1

[英]Create an sparse matrix from a list of tuples having the indexes of the column where is a 1

Problem :问题

I have a list of tuples, which each tuple represents a column of a 2D-array and each element of the tuple represents the index of that column of the array that is a 1;我有一个元组列表,每个元组代表二维数组的一列,元组的每个元素代表数组中该列的索引为 1; the other entries that aren't in that tuple, are 0.不在该元组中的其他条目为 0。

I want to create an sparse matrix with this list of tuples in an efficient way (trying to not use for loops).我想以一种有效的方式用这个元组列表创建一个稀疏矩阵(尽量不使用循环)。

Example :示例

# init values
list_tuples = [
 (0, 2, 4),
 (0, 2, 3),
 (1, 3, 4)
]

n = length(list_tuples) + 1
m = 5 # arbritrary, however n >= max([ei for ei in list_tuples]) + 1

# what I need is a function which accepts this tuples and give the shape of the array
# (at least the row size, because the column size can be infered from the list of tuples)
A = some_function(list_tuples, array_shape = (m, n))

Then what I expect to have is an array of the form:那么我希望得到的是以下形式的数组:

[   
 [1, 1, 0]
 [0, 0, 1]  
 [1, 1, 0]
 [0, 1, 1]
 [1, 0, 1]
]

Your values are the indices that are required for the compressed sparse column format .您的值是压缩稀疏列格式所需的indices You'll also need the indptr array, which for your data is the cumulative sum of the lengths of the tuples (prepended with 0).您还需要indptr数组,对于您的数据来说,它是元组长度的累加和(前缀为 0)。 The data array would be an array of ones with the same length as the sum of the lengths of the tuples, which you can get from the last element of the cumulative sum. data数组将是一个长度与元组长度之和相同的数组,您可以从累积和的最后一个元素中获得。 Here's how that looks with your example:以下是您的示例的外观:

In [45]: from scipy.sparse import csc_matrix

In [46]: list_tuples = [
    ...:  (0, 2, 4),
    ...:  (0, 2, 3),
    ...:  (1, 3, 4)
    ...: ]

In [47]: indices = sum(list_tuples, ())  # Flatten the tuples into one sequence.

In [48]: indptr = np.cumsum([0] + [len(t) for t in list_tuples])

In [49]: a = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr))

In [50]: a
Out[50]: 
<5x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Column format>

In [51]: a.A
Out[51]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])

Note that csc_matrix inferred the number of rows from the maximum that it found in the indices.请注意, csc_matrix从它在索引中找到的最大值推断行数。 You can use the shape parameter to override that, eg您可以使用shape参数来覆盖它,例如

In [52]: b = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr), shape=(7, len(list_tuples)))

In [53]: b
Out[53]: 
<7x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Column format>

In [54]: b.A
Out[54]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1],
       [0, 0, 0],
       [0, 0, 0]])

You can also generate a coo_matrix pretty easily.您还可以非常轻松地生成coo_matrix The flattened list_tuples gives the row indices, and np.repeat can be used to create the column indices:展平的list_tuples给出行索引, np.repeat可用于创建列索引:

In [63]: from scipy.sparse import coo_matrix

In [64]: i = sum(list_tuples, ())  # row indices

In [65]: j = np.repeat(range(len(list_tuples)), [len(t) for t in list_tuples])

In [66]: c = coo_matrix((np.ones(len(i), dtype=int), (i, j)))

In [67]: c
Out[67]: 
<5x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in COOrdinate format>

In [68]: c.A
Out[68]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM