Split sparse matrix by rows
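I have a scipy.sparse.csr.csr_matrix of shape (8723, 1741277).
How can I efficiently split it by rows into n chunks?
Preferably the chunks should be roughly equal in number of rows.
I say roughly because it depends on whether (number of rows) / (number of chunks) leaves a remainder.
I thought this could be done easily with numpy.split, as it can for arrays, but it does not seem to work for sparse matrices.
Specifically, if I choose a number of chunks that does not divide 8723 evenly, I get this error:
ValueError: array split does not result in an equal division
If I choose a number of chunks that does divide 8723 evenly, I get this error:
AxisError: axis1: axis 0 is out of bounds for array of dimension 0
The reason I want to split the sparse matrix into chunks is that I want to convert it to a (dense) array, but I cannot do that directly because it is too large as a whole.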
In [6]: from scipy import sparse
In [7]: M = sparse.random(12,3,.1,'csr')
In [8]: np.split?
In [9]: np.split(M,3)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
55 try:
---> 56 return getattr(obj, method)(*args, **kwds)
57
/usr/local/lib/python3.6/dist-packages/scipy/sparse/base.py in __getattr__(self, attr)
687 else:
--> 688 raise AttributeError(attr + " not found")
689
AttributeError: swapaxes not found
During handling of the above exception, another exception occurred:
AxisError Traceback (most recent call last)
<ipython-input-9-11a4dcdd89af> in <module>
----> 1 np.split(M,3)
/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
848 raise ValueError(
849 'array split does not result in an equal division')
--> 850 res = array_split(ary, indices_or_sections, axis)
851 return res
852
/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py in array_split(ary, indices_or_sections, axis)
760
761 sub_arys = []
--> 762 sary = _nx.swapaxes(ary, axis, 0)
763 for i in range(Nsections):
764 st = div_points[i]
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in swapaxes(a, axis1, axis2)
583
584 """
--> 585 return _wrapfunc(a, 'swapaxes', axis1, axis2)
586
587
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
64 # a downstream library like 'pandas'.
65 except (AttributeError, TypeError):
---> 66 return _wrapit(obj, method, *args, **kwds)
67
68
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
44 except AttributeError:
45 wrap = None
---> 46 result = getattr(asarray(obj), method)(*args, **kwds)
47 if wrap:
48 if not isinstance(result, mu.ndarray):
AxisError: axis1: axis 0 is out of bounds for array of dimension 0
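If we apply np.array to M, we get a 0-d object array: just a naive wrapper around the sparse object. That is why np.split fails in the traceback above: the sparse matrix has no swapaxes method, so NumPy falls back to asarray(M), and the resulting 0-d array has no axis 0.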
In [10]: np.array(M)
Out[10]:
array(<12x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>, dtype=object)
In [11]: _.shape
Out[11]: ()
Splitting the dense equivalent works correctly (M.A is the dense array, the same as M.toarray()):
In [12]: np.split(M.A,3)
Out[12]:
[array([[0. , 0.61858517, 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]]), array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]), array([[0. , 0.89573059, 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0.02334738],
[0. , 0. , 0. ]])]
And splitting the sparse matrix directly, by slicing rows:
In [13]: [M[i:j,:] for i,j in zip([0,4,8],[4,8,12])]
Out[13]:
[<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>,
<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>,
<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>]
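The slice boundaries in In [13] were written out by hand. Below is a minimal sketch of how they could be computed for any number of chunks, including row counts that do not divide evenly. Note that row_bounds and split_rows are hypothetical helper names, not SciPy functions, and the sizing rule (the first few chunks get one extra row) is an assumption chosen to mirror what np.array_split does for dense arrays.

```python
import numpy as np
from scipy import sparse


def row_bounds(n_rows, n_chunks):
    """Return n_chunks + 1 row offsets for nearly equal row blocks.

    Hypothetical helper: the first (n_rows % n_chunks) blocks get one
    extra row, the same rule np.array_split uses for dense arrays.
    """
    base, extra = divmod(n_rows, n_chunks)
    bounds = [0]
    for k in range(n_chunks):
        bounds.append(bounds[-1] + base + (1 if k < extra else 0))
    return bounds


def split_rows(M, n_chunks):
    """Split a sparse matrix into n_chunks row blocks (each block is a copy)."""
    M = M.tocsr()  # row slicing is cheapest in CSR format
    bounds = row_bounds(M.shape[0], n_chunks)
    return [M[start:stop, :] for start, stop in zip(bounds[:-1], bounds[1:])]


# Small illustration: 10 rows do not divide evenly into 3 chunks.
M = sparse.random(10, 3, density=0.3, format='csr', random_state=0)
chunks = split_rows(M, 3)
print([c.shape for c in chunks])
```

Stacking the pieces back together with sparse.vstack(chunks) should reproduce the original matrix, which is a convenient sanity check.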
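Slicing like this is not as efficient for a sparse matrix as it is for a dense one. Dense slices are views; sparse slices have to be copies. The only exception is the lil format, which has a getrowview method. While there are a number of functions that construct sparse matrices from pieces, there is less need for functions that split them apart.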
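Since each sparse slice is a copy anyway, one way to reach the goal stated in the question (a dense version that does not fit in memory all at once) is to densify one row block at a time and let it go before moving on. This is a sketch under that assumption; iter_dense_row_blocks and rows_per_block are hypothetical names used only for illustration.

```python
import numpy as np
from scipy import sparse


def iter_dense_row_blocks(M, rows_per_block):
    """Yield (start_row, dense_block) pairs covering all rows of M in order.

    Only one dense block exists at a time, so peak memory is roughly
    rows_per_block * M.shape[1] values rather than the full dense matrix.
    """
    M = M.tocsr()  # row slicing is cheapest in CSR format
    for start in range(0, M.shape[0], rows_per_block):
        yield start, M[start:start + rows_per_block, :].toarray()


# Illustration: compute row sums block by block without ever
# materialising the whole dense matrix.
M = sparse.random(10, 4, density=0.2, format='csr', random_state=0)
row_sums = np.zeros(M.shape[0])
for start, block in iter_dense_row_blocks(M, 3):
    row_sums[start:start + block.shape[0]] = block.sum(axis=1)
```

The last block is simply shorter when the row count is not a multiple of rows_per_block, so no special handling of the remainder is needed.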
sklearn may have some splitting functionality. It has some sparse utility functions that address its own use of sparse matrices.