
Python-Scipy sparse matrices - what is A[i, j] doing?

According to my previous question (Python - Multiply sparse matrix row with non-sparse vector by index), direct indexing of sparse matrices is not possible, at least not unless you work with the three arrays that define a sparse.csr matrix: data, indices and indptr. But I just found out that, given a csr sparse matrix A, this operation works fine and produces correct results: A[i, j]. I also noticed that it is horribly slow, even slower than working with dense matrices.

I couldn't find any information about this indexing method, so I am wondering: what exactly is A[i, j] doing?

If you'd like me to take a guess, I would say it is producing a dense matrix and then indexing it like you normally would.

In [210]: import numpy as np; from scipy import sparse
In [211]: M = sparse.csr_matrix(np.eye(3))
In [212]: M                                                                  
Out[212]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>

Indexing with [0] produces a new sparse matrix with shape (1,3):

In [213]: M[0]                                                               
Out[213]: 
<1x3 sparse matrix of type '<class 'numpy.float64'>'
    with 1 stored elements in Compressed Sparse Row format>

Trying to index that again gives another sparse matrix, or an error. That's because it is still a 2d object, with shape (1,3).

In [214]: M[0][1]                                                            
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-214-0661a1f27e52> in <module>
----> 1 M[0][1]

/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in __getitem__(self, key)
    290             # [i, 1:2]
    291             elif isinstance(col, slice):
--> 292                 return self._get_row_slice(row, col)
    293             # [i, [1, 2]]
    294             elif issequence(col):

/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in _get_row_slice(self, i, cslice)
    397 
    398         if i < 0 or i >= M:
--> 399             raise IndexError('index (%d) out of range' % i)
    400 
    401         start, stop, stride = cslice.indices(N)

IndexError: index (1) out of range

Indexing with the [0,1] syntax does work, with the two numbers applying to the two different dimensions:

In [215]: M[0,1]                                                             
Out[215]: 0.0

A[0][1] does work with a np.ndarray, but that's because the first [0] produces an array with one less dimension. np.matrix and sparse, however, return a 2d matrix, not a 1d one. It's one reason we don't recommend np.matrix. With sparse the matrix nature goes deeper, so we can't simply deprecate it.
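To make the dimension difference concrete, here is a minimal sketch comparing the two kinds of objects. It assumes the csr_matrix class (the newer csr_array class may behave differently), and the variable names are only illustrative.

```python
import numpy as np
from scipy import sparse

A = np.eye(3)                 # dense ndarray
M = sparse.csr_matrix(A)      # same values, CSR storage

row_dense = A[0]              # 1d array: one dimension dropped
row_sparse = M[0]             # still a 2d sparse matrix

print(row_dense.shape)        # (3,)
print(row_sparse.shape)       # (1, 3)

# On the 2d sparse row, a second index has to name both dimensions.
print(A[0][1], M[0, 1], M[0][0, 1])
```

The last line shows that M[0][0, 1] is the chained form that matches A[0][1], though M[0, 1] avoids building the intermediate sparse row.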

We can get an idea of the code involved in selecting an element from a sparse matrix by triggering an error:

In [216]: M[0,4]                                                             
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-216-4919ae565782> in <module>
----> 1 M[0,4]

/usr/local/lib/python3.6/dist-packages/scipy/sparse/csr.py in __getitem__(self, key)
    287             # [i, j]
    288             if isintlike(col):
--> 289                 return self._get_single_element(row, col)
    290             # [i, 1:2]
    291             elif isinstance(col, slice):

/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py in _get_single_element(self, row, col)
    868         if not (0 <= row < M) or not (0 <= col < N):
    869             raise IndexError("index out of bounds: 0<=%d<%d, 0<=%d<%d" %
--> 870                              (row, M, col, N))
    871 
    872         major_index, minor_index = self._swap((row, col))

IndexError: index out of bounds: 0<=0<3, 0<=4<3

=== ===

Yes, indexing an item in a sparse matrix is slower than indexing in a dense array. It's not because it first converts to dense. With a dense array, indexing an item just requires converting the nd index to a flat one and selecting the required bytes in the 1d flat data buffer, and most of that is done in fast compiled code. But as you can see from the traceback, selecting an item from a sparse matrix is more involved, and a lot of it is Python.

The sparse lil format is designed to be faster for indexing (and especially for setting). But even that is quite a bit slower than indexing a dense array. Don't use sparse matrices if you need to iterate, or otherwise repeatedly access individual elements.
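If you want to check this on your own machine, a rough timing sketch along these lines should do. The matrix size, the index and the repeat count are arbitrary choices for illustration, and the numbers will vary with hardware and SciPy version, so no timings are quoted here.

```python
import timeit

import numpy as np
from scipy import sparse

dense = np.eye(1000)
csr = sparse.csr_matrix(dense)
lil = csr.tolil()

repeats = 2000
timings = {}
for name, obj in [("dense", dense), ("csr", csr), ("lil", lil)]:
    # Time the same single-element lookup on each container.
    total = timeit.timeit(lambda: obj[500, 500], number=repeats)
    timings[name] = total / repeats

for name, per_call in timings.items():
    print(f"{name:5s} {per_call * 1e6:8.2f} us per lookup")
```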

=== ===

To give an idea of what's involved with indexing M, look at its key attributes:

In [224]: M.data,M.indices,M.indptr                                          
Out[224]: 
(array([1., 1., 1.]),
 array([0, 1, 2], dtype=int32),
 array([0, 1, 2, 3], dtype=int32))

To pick row 0, we have to use indptr to select a slice from the others:

In [225]: slc = slice(M.indptr[0],M.indptr[1])                               
In [226]: M.data[slc], M.indices[slc]                                        
Out[226]: (array([1.]), array([0], dtype=int32))

Then to pick col 1, we have to check whether that value is in indices[slc]. If it is, return the corresponding element in data[slc]. If not, return 0.
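Put together, the two steps above can be sketched as a small function. csr_get is a hypothetical helper written for illustration, not SciPy's own code; it assumes non-negative, in-range indices and a matrix with no duplicate entries.

```python
import numpy as np
from scipy import sparse


def csr_get(M, i, j):
    """Return M[i, j] using only the CSR arrays (illustrative helper)."""
    # indptr gives the start and end of row i inside data and indices.
    start, stop = M.indptr[i], M.indptr[i + 1]
    cols = M.indices[start:stop]
    vals = M.data[start:stop]
    # Look for column j among the stored columns of this row.
    hits = np.nonzero(cols == j)[0]
    if hits.size == 0:
        return M.dtype.type(0)   # not stored, so the value is zero
    return vals[hits[0]]


M = sparse.csr_matrix(np.eye(3))
print(csr_get(M, 0, 0), csr_get(M, 0, 1))
```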

For more complex indexing, sparse actually uses matrix multiplication, having created an extractor matrix from the indices. It also uses multiplication to perform row or column sums.
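The idea can be illustrated by hand. This is a sketch of the technique, not the code SciPy runs internally (which may differ between versions); the extractor built here is an assumption about what such a matrix looks like: one row per requested index, with a single 1 in the matching column.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(12.0).reshape(3, 4))
rows = [2, 0]                      # rows to select, in this order

# Extractor of shape (len(rows), M.shape[0]) with a 1 at (k, rows[k]).
extractor = sparse.csr_matrix(
    (np.ones(len(rows)), (np.arange(len(rows)), rows)),
    shape=(len(rows), M.shape[0]),
)
picked = extractor @ M             # same values as M[rows]
print(picked.toarray())

# Row sums as a product with a vector of ones.
row_sums = M @ np.ones(M.shape[1])
print(row_sums)
```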

Matrix multiplication is a sparse matrix strength, provided the matrix is sparse enough. The mathematical roots of sparse formats, especially csr, are in sparse linear equation problems, such as finite difference and finite element PDEs.

=== ===

Here are the underlying attributes for a lil matrix:

In [227]: ml=M.tolil()                                                       
In [228]: ml.data                                                            
Out[228]: array([list([1.0]), list([1.0]), list([1.0])], dtype=object)
In [229]: ml.rows                                                            
Out[229]: array([list([0]), list([1]), list([2])], dtype=object)
In [230]: ml.data[0],ml.rows[0]                                              
Out[230]: ([1.0], [0])          # cf Out[226]
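With those two attributes, an element lookup in lil format reduces to a search in one row's list. The sketch below uses a hypothetical helper, lil_get, and relies on the documented property that each list in rows is sorted; it is not SciPy's implementation and skips bounds checks and negative indices.

```python
import bisect

import numpy as np
from scipy import sparse


def lil_get(ml, i, j):
    """Return ml[i, j] from the rows/data lists (illustrative helper)."""
    cols = ml.rows[i]                    # sorted column indices of row i
    pos = bisect.bisect_left(cols, j)    # binary search for column j
    if pos < len(cols) and cols[pos] == j:
        return ml.data[i][pos]
    return ml.dtype.type(0)              # not stored, so the value is zero


ml = sparse.csr_matrix(np.eye(3)).tolil()
print(lil_get(ml, 1, 1), lil_get(ml, 1, 2))
```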
