Assigning matrix from scipy.sparse.identity to csr_matrix

I want to assign a large scipy.sparse.identity to a slice of a scipy.sparse.csr_matrix, but the assignment fails. In this case, m = 25000000 and p = 3. Tc_temp is a csr_matrix of size 25000000 x 75000000.

Tc_temp = csr_matrix((m, p * m))
Tc_temp[0: m, np.arange(j, p * m + j, p)] = identity(m, format='csr')

The error traceback I get is:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\_index.py", line 116, in __setitem__
    self._set_arrayXarray_sparse(i, j, x)
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 816, in _set_arrayXarray_sparse
    self._zero_many(*self._swap((row, col)))
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 932, in _zero_many
    i, j, M, N = self._prepare_indices(i, j)
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 882, in _prepare_indices
    i = np.array(i, dtype=self.indices.dtype, copy=False, ndmin=1).ravel()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 233. GiB for an array with shape (62500000000,) and data type int32

The sparse.identity is somehow getting converted to a dense matrix.

Assigning into a sparse matrix isn't efficient. SciPy builds dense row/column index arrays covering the whole region you assign to, not just the nonzeros of the value you insert. At this scale that is not viable.
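To put rough numbers on that, here is a small sketch. It assumes, going by the traceback, that each index array holds one int32 entry per cell of the block being assigned rather than one per nonzero; index_bytes is a made-up helper name, not a SciPy function. The shape in the traceback, 62500000000, is 250000 squared, so the example uses a 250000 x 250000 block.

```python
GiB = 2 ** 30

def index_bytes(n_rows, n_cols, itemsize=4):
    """Bytes for ONE dense index array covering an n_rows x n_cols block."""
    return n_rows * n_cols * itemsize

# Array reported in the traceback: shape (62500000000,), dtype int32
reported = 62_500_000_000 * 4
dense_index = index_bytes(250_000, 250_000)
print(dense_index == reported)       # True
print(round(dense_index / GiB, 1))   # 232.8

# For comparison: one int32 index per stored nonzero of the identity
sparse_index = 250_000 * 4
print(dense_index // sparse_index)   # 250000
```

So the dense index is larger than a per-nonzero index by a factor equal to the block's side length, which is why the cost explodes as m grows.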

You can work around it though by fiddling directly with the data in a coordinate matrix, although it won't be efficient.

from scipy.sparse import csr_matrix, identity
import numpy as np

m = 25000000
p = 3
j = 0

Tc_temp = csr_matrix((m, p * m)).tocoo()
Tc_identity = identity(m, format='coo')

# If you know Tc_temp is already 0 where you want to assign, you can omit this step.
# It will be slow if Tc_temp holds a lot of data.
target_col = Tc_identity.col * p + j  # columns j, j + p, j + 2p, ...
Tc_zero_idx = np.isin(Tc_temp.row, Tc_identity.row) & np.isin(Tc_temp.col, target_col)
Tc_temp.data[Tc_zero_idx] = 0

# Append the identity entries to the existing data
Tc_temp.row = np.append(Tc_temp.row, Tc_identity.row)
Tc_temp.col = np.append(Tc_temp.col, target_col)
Tc_temp.data = np.append(Tc_temp.data, Tc_identity.data)

# tocsr() returns a new matrix, so keep the result
Tc_temp = Tc_temp.tocsr()

Normally I'd tell you to build it block-wise, but since you're trying to interleave rows and columns, that's not a great option here.
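On a small matrix you can check a COO-based workaround of this kind against direct assignment. The sketch below is a variant, not the exact code above: it picks the overwritten entries with a modulo test instead of np.isin and drops them rather than zeroing them, which assumes the assignment covers every row. assign_strided_identity is a hypothetical helper name.

```python
import warnings

import numpy as np
from scipy.sparse import coo_matrix, csr_matrix, identity


def assign_strided_identity(A, p, j):
    """Return a CSR copy of A with columns j, j+p, j+2p, ... set to the identity.

    Hypothetical helper; assumes A has shape (m, p*m) and 0 <= j < p.
    """
    m = A.shape[0]
    A = A.tocoo()
    eye = identity(m, format='coo')

    # Keep only the stored entries that are NOT in a target column
    keep = (A.col % p) != j
    row = np.concatenate([A.row[keep], eye.row])
    col = np.concatenate([A.col[keep], eye.col * p + j])
    data = np.concatenate([A.data[keep], eye.data])
    return coo_matrix((data, (row, col)), shape=A.shape).tocsr()


# Small test case with some pre-existing data
m, p, j = 6, 3, 1
rng = np.random.default_rng(0)
dense = rng.random((m, p * m)) * (rng.random((m, p * m)) < 0.3)
A = csr_matrix(dense)

# Reference result via direct (slow) assignment, fine at this size
expected = A.copy()
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence SparseEfficiencyWarning
    expected[0:m, np.arange(j, p * m + j, p)] = identity(m, format='csr')

result = assign_strided_identity(A, p, j)
same = np.array_equal(result.toarray(), expected.toarray())
print(same)
```

Only arrays with one entry per stored value are created here, so the memory cost tracks the number of nonzeros rather than the size of the block.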

Let's examine the action for a smaller matrix:

The identity, in coo format:

In [67]: I = sparse.identity(10,format='coo')
In [68]: I.row
Out[68]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [69]: I.col
Out[69]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

The "blank" csr:

In [70]: M = sparse.csr_matrix((10,30))
In [71]: M.indptr
Out[71]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
In [72]: M.indices
Out[72]: array([], dtype=int32)

The assignment. I'm using slice notation here rather than your arange, but the effect is the same (even in timings):

In [73]: M[0:10, 0:30:3] = I
/usr/local/lib/python3.8/dist-packages/scipy/sparse/_index.py:116: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_arrayXarray_sparse(i, j, x)

The resulting matrix:

In [74]: M.indptr
Out[74]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=int32)
In [75]: M.indices
Out[75]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27], dtype=int32)

And look at the corresponding coo attributes:

In [76]: M.tocoo().row
Out[76]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [77]: M.tocoo().col
Out[77]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27], dtype=int32)

The row is the same as for I, while the col is just your arange indexing:

In [78]: np.arange(0,30,3)
Out[78]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

So you could create the same matrix with:

M1 = sparse.csr_matrix((np.ones(10),(np.arange(10), np.arange(0,30,3))),(10,30))
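The same idea can be sketched for general m, p and a column offset j by handing csr_matrix its (data, indices, indptr) arrays directly. strided_identity is a hypothetical helper name, and the sketch assumes 0 <= j < p and that the target matrix is otherwise empty.

```python
import numpy as np
from scipy import sparse


def strided_identity(m, p, j=0, dtype=np.float64):
    """m x (p*m) CSR matrix with a 1 at (i, i*p + j) for every row i.

    Hypothetical helper; assumes 0 <= j < p.
    """
    data = np.ones(m, dtype=dtype)
    indices = np.arange(j, p * m + j, p)  # one column index per row
    indptr = np.arange(m + 1)             # exactly one stored entry per row
    return sparse.csr_matrix((data, indices, indptr), shape=(m, p * m))


M2 = strided_identity(10, 3)
print(M2.indices)  # [ 0  3  6  9 12 15 18 21 24 27]
```

This never touches __setitem__, so no dense index is built. Storage is three arrays of about m entries each, which should come to roughly 16 bytes per row with float64 data and int32 indices.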
