
How do you edit cells in a sparse matrix using scipy?

I'm trying to manipulate some data in a sparse matrix. Once I've created one, how do I add, alter, or update values in it? This seems very basic, but I can't find it in the documentation for the sparse matrix classes, or on the web. I think I'm missing something crucial.

This is my failed attempt, indexing it the same way I would a normal array.

>>> from scipy.sparse import bsr_matrix
>>> A = bsr_matrix((10,10))
>>> A[5][7] = 6

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    A[5][7] = 6
  File "C:\Python27\lib\site-packages\scipy\sparse\bsr.py", line 296, in __getitem__
    raise NotImplementedError
NotImplementedError

There are several sparse matrix formats. Some are better suited to indexing than others. One that implements it is lil_matrix.

Al = A.tolil()
Al[5,7] = 6  # the normal 2d matrix indexing notation
print(Al)
print(Al.toarray())  # dense view of the matrix
A1 = Al.tobsr()  # if it must be in bsr format

The documentation for each format describes what it is good at and what it is bad at, but it does not have a neat list of which formats define which operations. For example, from the lil_matrix documentation:

Advantages of the LIL format
  supports flexible slicing
  changes to the matrix sparsity structure are efficient
  ...
Intended Usage
  LIL is a convenient format for constructing sparse matrices
  ...

dok_matrix also implements indexing.
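As a minimal sketch of the same cell assignment done with dok_matrix (the matrix size and the values are just the ones from the question, and the variable names are illustrative):

```python
from scipy.sparse import dok_matrix

D = dok_matrix((10, 10))   # empty 10x10 matrix, stored as a dictionary of keys
D[5, 7] = 6                # single-cell assignment with 2-D indexing
D[2, 4] = 10

dok_nnz = D.nnz            # number of stored entries
dok_value = D[5, 7]        # read a single cell back
D_bsr = D.tobsr()          # convert if the result must be in BSR format
```

As with LIL, the idea is to do the edits in a format that supports assignment and convert to bsr only at the end.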

The underlying data structure for coo_matrix is easy to understand. It is essentially the parameters of the coo_matrix((data, (i, j)), [shape=(M, N)]) constructor. To create the same matrix you could use:

from scipy import sparse
sparse.coo_matrix(([6],([5],[7])), shape=(10,10))

If you have more assignments, build larger data, i, j lists (or 1d arrays), and construct the sparse matrix once they are complete.
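A rough sketch of that accumulate-then-construct pattern, assuming the assignments arrive as (row, column, value) triples (the triples here are made up for illustration):

```python
from scipy.sparse import coo_matrix

data, i, j = [], [], []
# Hypothetical assignments: (row, column, value)
for row, col, value in [(5, 7, 6), (2, 4, 10), (0, 0, 1)]:
    i.append(row)
    j.append(col)
    data.append(value)

# Build the matrix once, after all assignments have been collected
M = coo_matrix((data, (i, j)), shape=(10, 10))
M_csr = M.tocsr()   # CSR supports reading single cells, e.g. M_csr[5, 7]
```

Note that if the same (row, column) pair appears more than once, the values are summed when the COO matrix is converted to CSR.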

See the documentation for bsr_matrix and for csr_matrix. It might be worth understanding csr before moving to bsr. The only difference is that the entries of a bsr matrix are themselves matrices (dense blocks), whereas the basic unit in a csr matrix is a scalar.
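To make the block-versus-scalar difference concrete, here is a small sketch that stores the same 4x4 array both ways. The array contents and the 2x2 block size are arbitrary choices for illustration:

```python
import numpy as np
from scipy.sparse import bsr_matrix, csr_matrix

dense = np.zeros((4, 4), dtype=int)
dense[2:4, 0:2] = [[1, 2], [3, 4]]   # one nonzero 2x2 block

as_csr = csr_matrix(dense)
as_bsr = bsr_matrix(dense, blocksize=(2, 2))

csr_data_shape = as_csr.data.shape   # one scalar per stored value
bsr_data_shape = as_bsr.data.shape   # one 2x2 block per stored entry
```

In the bsr version, indices and indptr refer to block columns and block rows rather than to individual cells.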

I don't know if there are super easy ways to manipulate the matrices once they are created, but here are some examples of what you're trying to do:

import numpy as np
from scipy.sparse import bsr_matrix, csr_matrix

row = np.array( [5] )
col = np.array( [7] )
data = np.array( [6] )
A = csr_matrix( (data,(row,col)) )

This is a straightforward syntax: you list all the values you want in the matrix in the array data, and then specify where each value should go using row and col. Note that this makes the matrix dimensions just big enough to hold the element in the largest row and column (in this case a 6x8 matrix). You can view the matrix in standard form using the todense() method.

A.todense()

However, you cannot manipulate the matrix on the fly using this pattern. What you can do is modify the native scipy representation of the matrix. This involves 3 attributes: indices, indptr, and data. To start with, we can examine the values of these attributes for the matrix we've already created.

>>> A.data
array([6])

>>> A.indices
array([7], dtype=int32)

>>> A.indptr
array([0, 0, 0, 0, 0, 0, 1], dtype=int32)

data is the same thing it was before: a 1-d array of the values we want in the matrix. The difference is that the position of this data is now specified by indices and indptr instead of row and col. indices is fairly straightforward. It is simply a list of which column each data entry is in, so it will always be the same size as the data array. indptr is a little trickier. It lets the data structure know which row each data entry is in. To quote from the docs,

the column indices for row i are stored in indices[indptr[i]:indptr[i+1]]

From this definition we can see that the size of indptr will always be the number of rows in the matrix + 1. It takes a little while to get used to, but working through the values for each row will give you some intuition. Note that all the entries are zero until the last one. That means that the column indices for rows i = 0 through 4 are stored in indices[0:0], i.e. the empty array. This is because these rows are all zeros. Finally, on the last row, i = 5, we get indices[0:1] = [7], which tells us that the data entries data[0:1] are in row 5, column 7.
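The row-by-row reading described above can be sketched as a loop over indptr. This rebuilds the same single-entry matrix so that it runs on its own, and the entries list is just an illustrative name:

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix((np.array([6]), (np.array([5]), np.array([7]))))

entries = []
for r in range(A.shape[0]):
    # Slice of indices/data that belongs to row r
    start, end = A.indptr[r], A.indptr[r + 1]
    for c, v in zip(A.indices[start:end], A.data[start:end]):
        entries.append((r, int(c), int(v)))

print(entries)   # (row, column, value) for every stored entry
```

Rows 0 through 4 contribute nothing because their slices are empty, and row 5 yields the single entry at column 7.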

Now suppose we wanted to add the value 10 at row 2, column 4. We first put it into the data attribute:

A.data = np.array( [10,6] )   

Next we update indices to indicate the column the value 10 will be in:

A.indices = np.array( [4,7], dtype=np.int32 )

and finally we indicate which row it will be in by modifying indptr:

A.indptr = np.array( [0,0,0,1,1,1,2], dtype=np.int32 )

It is important that you make the data type of indices and indptr np.int32. One way to visualize what's going on in indptr is that the value increases as you move from i to i+1 for each row i that has data. Also note that arrays like these can be used to construct sparse matrices:

B = csr_matrix( (A.data, A.indices, A.indptr) )
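As a quick sanity check, the three arrays from the steps above can be passed to the constructor and the result compared against the intended cells. The explicit shape=(6, 8) is an assumption that matches the matrix built earlier:

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([10, 6])
indices = np.array([4, 7], dtype=np.int32)                  # column of each value
indptr = np.array([0, 0, 0, 1, 1, 1, 2], dtype=np.int32)    # row boundaries

B = csr_matrix((data, indices, indptr), shape=(6, 8))
dense = B.toarray()

print(dense[2, 4], dense[5, 7])   # the two cells that were set
```

If indptr and indices disagree with data (for example, mismatched lengths), the resulting matrix will not be valid, so a check like this is worth doing after editing the arrays by hand.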

It would be nice if it were as easy as simply indexing into the array as you tried, but the implementation is not there yet. That should be enough to get you started, at least.
