Why doesn't scipy.sparse.csc_matrix preserve the index order of my np.array?

Question

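I am writing code to efficiently remove several columns at once from a few large parallel scipy sparse.csc matrices (meaning all matrices have the same dimensions and all nnz elements are in the same positions). I do this by indexing only the columns I want to keep for one matrix, and then reusing the indices and indptr lists for the other matrices. However, when I index a csc matrix by a list, it reorders the data list, so I cannot reuse the indices. Is there a way to force scipy to keep the data list in its original order? Why does it reorder only when indexing by a list?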

import scipy.sparse
import numpy as np
mat = scipy.sparse.csc_matrix(np.array([[1,0,0,0,2,5], 
                                        [1,0,1,0,0,0], 
                                        [0,0,0,4,0,1],
                                        [0,3,0,1,0,4]]))
print(mat[:,3].data)

returns array([4, 1])

print(mat[:,[3]].data)

returns array([1, 4])

Answer 1

In [43]: mat = sparse.csc_matrix(np.array([[1,0,0,0,2,5],[1,0,1,0,0,0],[0,0,0,4,0,1],[0,3,0,1,0,4]]))
In [44]: mat                                                                    
Out[44]: 
<4x6 sparse matrix of type '<class 'numpy.int64'>'
    with 10 stored elements in Compressed Sparse Column format>
In [45]: mat.data                                                               
Out[45]: array([1, 1, 3, 1, 4, 1, 2, 5, 1, 4], dtype=int64)
In [46]: mat.indices                                                            
Out[46]: array([0, 1, 3, 1, 2, 3, 0, 0, 2, 3], dtype=int32)
In [47]: mat.indptr                                                             
Out[47]: array([ 0,  2,  3,  4,  6,  7, 10], dtype=int32)

Scalar selection:

In [48]: m1 = mat[:,3]                                                          
In [49]: m1                                                                     
Out[49]: 
<4x1 sparse matrix of type '<class 'numpy.int64'>'
    with 2 stored elements in Compressed Sparse Column format>
In [50]: m1.data                                                                
Out[50]: array([4, 1])
In [51]: m1.indices                                                             
Out[51]: array([2, 3], dtype=int32)
In [52]: m1.indptr                                                              
Out[52]: array([0, 2], dtype=int32)

List indexing:

In [53]: m2 = mat[:,[3]]                                                        
In [54]: m2.data                                                                
Out[54]: array([1, 4], dtype=int64)
In [55]: m2.indices                                                             
Out[55]: array([3, 2], dtype=int32)
In [56]: m2.indptr                                                              
Out[56]: array([0, 2], dtype=int32)

Sorting:

In [57]: m2.sort_indices()                                                      
In [58]: m2.data                                                                
Out[58]: array([4, 1], dtype=int64)
In [59]: m2.indices                                                             
Out[59]: array([2, 3], dtype=int32)

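Indexing a csc matrix with a list uses matrix multiplication. It constructs an extractor matrix from the indices and then takes the dot product. So the result is an entirely new sparse matrix, not just a subset of the csc data and indices attributes.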
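
To illustrate the extractor idea, here is a minimal sketch. It is not SciPy's actual internal code: the helper name select_columns_by_product is hypothetical, and the mechanism behind list indexing may differ between SciPy versions. It only shows that multiplying by a 0/1 selection matrix picks out columns, and that the result is a freshly built matrix.

```python
import numpy as np
import scipy.sparse as sparse

dense = np.array([[1, 0, 0, 0, 2, 5],
                  [1, 0, 1, 0, 0, 0],
                  [0, 0, 0, 4, 0, 1],
                  [0, 3, 0, 1, 0, 4]])
mat = sparse.csc_matrix(dense)


def select_columns_by_product(m, cols):
    """Hypothetical helper: pick columns of m by multiplying with a 0/1 matrix."""
    cols = np.asarray(cols)
    k = len(cols)
    # extractor[cols[j], j] = 1, so column j of the product is column cols[j] of m.
    extractor = sparse.csc_matrix(
        (np.ones(k, dtype=m.dtype), (cols, np.arange(k))),
        shape=(m.shape[1], k),
    )
    return m @ extractor


picked = select_columns_by_product(mat, [3, 0])
# The values match plain column selection on the dense array, but picked.data
# and picked.indices come from the product routine, not from slices of mat.data.
print(picked.toarray())
print(picked.has_sorted_indices)
```

Because the output is assembled by the multiplication routine, nothing guarantees that the row indices within each column come out in ascending order.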

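A csc matrix has a method, sort_indices(), that ensures the index values are ordered (within each column). Applying it may help ensure that the arrays are sorted the same way.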
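
As a rough sketch of how that could look for two parallel matrices (mat_b and keep are made-up names for illustration, and the second matrix is simply the first scaled by 10 so the sparsity pattern is identical), sort both results after list indexing and check that the layouts agree before relying on the data arrays lining up:

```python
import numpy as np
import scipy.sparse as sparse

dense = np.array([[1, 0, 0, 0, 2, 5],
                  [1, 0, 1, 0, 0, 0],
                  [0, 0, 0, 4, 0, 1],
                  [0, 3, 0, 1, 0, 4]])
mat_a = sparse.csc_matrix(dense)
mat_b = sparse.csc_matrix(dense * 10)  # same nonzero positions as mat_a

keep = [0, 3, 5]  # columns to retain (illustrative choice)

sub_a = mat_a[:, keep]
sub_b = mat_b[:, keep]

# Put row indices in ascending order within each column, in place.
sub_a.sort_indices()
sub_b.sort_indices()

# With both matrices sorted, indices and indptr should be identical,
# so position i in sub_a.data and sub_b.data refers to the same (row, column).
same_layout = (np.array_equal(sub_a.indices, sub_b.indices)
               and np.array_equal(sub_a.indptr, sub_b.indptr))
print(same_layout)
```

This still indexes each matrix separately; it only makes the resulting arrays comparable entry by entry.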
