Create a sparse matrix from a list of tuples giving the row indices of the 1s in each column
Problem:
I have a list of tuples in which each tuple represents a column of a 2D array, and each element of the tuple is the index of a row in that column that holds a 1; all entries not listed in the tuple are 0.
I want to create a sparse matrix from this list of tuples efficiently (trying not to use for loops).
Example:
# init values
list_tuples = [
(0, 2, 4),
(0, 2, 3),
(1, 3, 4)
]
n = len(list_tuples)  # number of columns, one per tuple
m = 5  # arbitrary, but m >= max(i for t in list_tuples for i in t) + 1
# what I need is a function that accepts these tuples and the shape of the array
# (at least the number of rows, because the number of columns can be inferred from the list of tuples)
A = some_function(list_tuples, array_shape = (m, n))
Then what I expect to have is an array of the form:
[
[1, 1, 0]
[0, 0, 1]
[1, 1, 0]
[0, 1, 1]
[1, 0, 1]
]
Your values are the indices that are required for the compressed sparse column (CSC) format. You'll also need the indptr array, which for your data is the cumulative sum of the lengths of the tuples (prepended with 0). The data array would be an array of ones with the same length as the sum of the lengths of the tuples, which you can get from the last element of the cumulative sum. Here's how that looks with your example:
In [44]: import numpy as np
In [45]: from scipy.sparse import csc_matrix
In [46]: list_tuples = [
...: (0, 2, 4),
...: (0, 2, 3),
...: (1, 3, 4)
...: ]
In [47]: indices = sum(list_tuples, ()) # Flatten the tuples into one sequence.
In [48]: indptr = np.cumsum([0] + [len(t) for t in list_tuples])
In [49]: a = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr))
In [50]: a
Out[50]:
<5x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Column format>
In [51]: a.toarray()
Out[51]:
array([[1, 1, 0],
[0, 0, 1],
[1, 1, 0],
[0, 1, 1],
[1, 0, 1]])
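To make the role of indptr more concrete, here is a minimal sketch (the variable name recovered is only illustrative) that rebuilds the same matrix and then slices indices with consecutive indptr entries to get each column's tuple back:

```python
import numpy as np
from scipy.sparse import csc_matrix

list_tuples = [(0, 2, 4), (0, 2, 3), (1, 3, 4)]

# Same construction as in the session above.
indices = np.array(sum(list_tuples, ()))
indptr = np.cumsum([0] + [len(t) for t in list_tuples])
a = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr))

# Column j owns the half-open slice indptr[j]:indptr[j + 1] of `indices`.
recovered = [tuple(int(r) for r in indices[indptr[j]:indptr[j + 1]])
             for j in range(len(list_tuples))]

print(indptr)     # [0 3 6 9]
print(recovered)  # [(0, 2, 4), (0, 2, 3), (1, 3, 4)]
```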
Note that csc_matrix inferred the number of rows from the maximum value that it found in the indices. You can use the shape parameter to override that, e.g.:
In [52]: b = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr), shape=(7, len(list_tuples)))
In [53]: b
Out[53]:
<7x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Column format>
In [54]: b.toarray()
Out[54]:
array([[1, 1, 0],
[0, 0, 1],
[1, 1, 0],
[0, 1, 1],
[1, 0, 1],
[0, 0, 0],
[0, 0, 0]])
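The question asked for a single function taking the tuples and the array shape. One possible way to package the steps above is sketched below. The name tuples_to_csc and its n_rows parameter are hypothetical, not part of SciPy. The sketch also flattens with itertools.chain instead of sum(list_tuples, ()), since repeated tuple concatenation can take time quadratic in the total number of indices.

```python
from itertools import chain

import numpy as np
from scipy.sparse import csc_matrix


def tuples_to_csc(list_tuples, n_rows=None):
    """Build a 0/1 CSC matrix whose column j has ones at rows list_tuples[j].

    Hypothetical helper, not a SciPy function. If n_rows is None, the
    row count is inferred by csc_matrix from the largest index.
    """
    lengths = [len(t) for t in list_tuples]
    indptr = np.cumsum([0] + lengths)
    # Flatten all tuples into one index array in a single pass.
    indices = np.fromiter(chain.from_iterable(list_tuples),
                          dtype=np.int64, count=int(indptr[-1]))
    data = np.ones(len(indices), dtype=int)
    shape = None if n_rows is None else (n_rows, len(list_tuples))
    return csc_matrix((data, indices, indptr), shape=shape)


A = tuples_to_csc([(0, 2, 4), (0, 2, 3), (1, 3, 4)], n_rows=5)
print(A.toarray())
```

Only the number of rows is exposed as a parameter here, because the number of columns always equals len(list_tuples).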
You can also generate a coo_matrix pretty easily. The flattened list_tuples gives the row indices, and np.repeat can be used to create the column indices:
In [63]: from scipy.sparse import coo_matrix
In [64]: i = sum(list_tuples, ()) # row indices
In [65]: j = np.repeat(range(len(list_tuples)), [len(t) for t in list_tuples])
In [66]: c = coo_matrix((np.ones(len(i), dtype=int), (i, j)))
In [67]: c
Out[67]:
<5x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in COOrdinate format>
In [68]: c.toarray()
Out[68]:
array([[1, 1, 0],
[0, 0, 1],
[1, 1, 0],
[0, 1, 1],
[1, 0, 1]])
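As with csc_matrix, coo_matrix accepts a shape argument, and the result can be converted to CSC afterwards with tocsc(). The following is a small sketch under those assumptions; the 7-row shape is an arbitrary choice for illustration:

```python
from itertools import chain

import numpy as np
from scipy.sparse import coo_matrix

list_tuples = [(0, 2, 4), (0, 2, 3), (1, 3, 4)]
lengths = [len(t) for t in list_tuples]

# Row indices: all tuples flattened; column indices: j repeated len(t_j) times.
rows = np.fromiter(chain.from_iterable(list_tuples), dtype=np.int64)
cols = np.repeat(np.arange(len(list_tuples)), lengths)

# An explicit shape (7 rows here, chosen arbitrarily) pads with empty rows.
c = coo_matrix((np.ones(len(rows), dtype=int), (rows, cols)),
               shape=(7, len(list_tuples)))

# tocsc() converts to the compressed format used in the first approach.
c_csc = c.tocsc()
print(c_csc.shape)  # (7, 3)
print(c_csc.nnz)    # 9
```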