Fast sparse matrix multiplication w/o allocating a dense array
I have an mxm sparse matrix similarities and a vector with m elements, combined_scales. I wish to multiply the ith column of similarities by combined_scales[i]. Here's my first attempt at this:
for i in range(m):
    scale = combined_scales[i]
    similarities[:, i] *= scale
This is semantically correct but was performing poorly, so I tried changing it to this:
# sparse.diags creates a diagonal matrix.
# docs: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.diags.html
similarities *= sparse.diags(combined_scales)
But I immediately got a MemoryError when running this line. Bizarrely, it seems that scipy is attempting to allocate a dense numpy array here:
Traceback (most recent call last):
  File "main.py", line 108, in <module>
    loop.run_until_complete(main())
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 466, in run_until_complete
    return future.result()
  File "main.py", line 100, in main
    magic.fit(df)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 127, in fit
    self._scale_similarities(X, net_similarities)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 148, in _scale_similarities
    similarities *= sparse.diags(combined_scales)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\base.py", line 440, in __mul__
    return self._mul_sparse_matrix(other)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\compressed.py", line 503, in _mul_sparse_matrix
    data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
MemoryError
How do I prevent it from allocating a dense array here? Thanks.
From sparse.compressed:
class _cs_matrix:  # common for csr and csc
    def _mul_sparse_matrix(self, other):
        M, K1 = self.shape
        K2, N = other.shape

        major_axis = self._swap((M, N))[0]
        other = self.__class__(other)  # convert to this format

        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=M*N)
        indptr = np.empty(major_axis + 1, dtype=idx_dtype)

        fn = getattr(_sparsetools, self.format + '_matmat_pass1')
        fn(M, N,
           np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           indptr)

        nnz = indptr[-1]
        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=nnz)
        indptr = np.asarray(indptr, dtype=idx_dtype)
        indices = np.empty(nnz, dtype=idx_dtype)
        data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

        fn = getattr(_sparsetools, self.format + '_matmat_pass2')
        fn(M, N, np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           self.data,
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           other.data,
           indptr, indices, data)

        return self.__class__((data, indices, indptr), shape=(M, N))
similarities is a sparse csr matrix. other, the diag matrix, has been converted to csr as well in

other = self.__class__(other)

csr_matmat_pass1 (compiled code) is run with the indices from self and other, returning nnz, the number of nonzero terms in the output.

It then allocates the indptr, indices and data arrays that will hold the results from csr_matmat_pass2. These are used to create the return matrix:

self.__class__((data, indices, indptr), shape=(M, N))

The error occurs while creating the data array:

data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

The return result simply has too many nonzero values for your memory.
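As a side note, the nnz that pass 1 computes is nothing exotic: for any CSR/CSC matrix, the last entry of indptr is the total number of stored values. A small sketch (the matrix here is made up for illustration):

```python
import numpy as np
from scipy import sparse

# A random CSR matrix just for demonstration.
A = sparse.random(10, 10, density=0.2, format='csr')

# indptr[i]:indptr[i+1] delimits row i's slice of data/indices,
# so the final entry of indptr is the total count of stored values.
print(A.indptr[-1], A.nnz)
```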
What is m, and similarities.nnz? Is there enough memory to do similarities.copy()?
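To put numbers on that, here is a sketch of how much memory a CSR matrix actually holds — the sum of its data, indices and indptr arrays (m and the density are made up; substitute your own):

```python
import numpy as np
from scipy import sparse

# Hypothetical sizes for illustration only.
m = 1000
similarities = sparse.random(m, m, density=0.01, format='csr')

# A CSR matrix stores exactly three arrays; their combined size is
# (roughly) the matrix's memory footprint.
nbytes = (similarities.data.nbytes
          + similarities.indices.nbytes
          + similarities.indptr.nbytes)
print(similarities.nnz, nbytes)
```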
While you are using similarities *= ..., it first has to do similarities * other. The result then replaces self. It does not attempt an in-place multiplication.
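A quick way to see this (a sketch; the matrices are made up): since csr_matrix has no in-place path for a sparse operand, Python expands A *= B to A = A * B, builds the full product, and rebinds the name to a brand-new object.

```python
import numpy as np
from scipy import sparse

A = sparse.random(5, 5, density=0.4, format='csr')
B = sparse.diags(np.arange(1.0, 6.0)).tocsr()

before = id(A)
A *= B  # expands to A = A * B: a full product is allocated first
rebound = id(A) != before
print(rebound)  # A now names a different matrix object
```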
There have been a lot of questions about faster iteration by rows (or columns), seeking to do things like sorting or getting the largest row values. Working directly with the csr attributes can speed this up considerably. I think the idea applies here.
Example:
In [275]: A = sparse.random(10,10,.2,'csc').astype(int)
In [276]: A.data[:] = np.arange(1,21)
In [277]: A.A
Out[277]:
array([[ 0,  0,  4,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0, 10,  0,  0, 16, 18],
       [ 0,  0,  0,  0,  0, 11, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  8,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  9, 12,  0,  0, 17,  0],
       [ 2,  0,  0,  0,  0, 13,  0,  0,  0,  0],
       [ 0,  0,  5,  7,  0,  0,  0, 15,  0, 19],
       [ 0,  0,  6,  0,  0,  0,  0,  0,  0, 20]])
In [280]: B = sparse.diags(np.arange(1,11),dtype=int)
In [281]: B
Out[281]:
<10x10 sparse matrix of type '<class 'numpy.int64'>'
with 10 stored elements (1 diagonals) in DIAgonal format>
In [282]: (A*B).A
Out[282]:
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]], dtype=int64)
In-place iteration over the columns (A is csc, so indptr delimits columns):
In [283]: A1=A.copy()
In [284]: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
...: A1.data[i:j] *= v
...:
In [285]: A1.A
Out[285]:
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]])
Time comparisons:
In [287]: %%timeit A1=A.copy()
...: A1 *= B
...:
375 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [288]: %%timeit A1 = A.copy()
...: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
...: A1.data[i:j] *= v
...:
79.9 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
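As an aside (my addition, not part of the timings above): for column scaling specifically, multiply() with a dense 1-D array broadcasts it across the rows, so column i gets multiplied by its scale in one call. It returns a COO result, so convert back if you need CSR; note that, unlike the indptr loop, this still allocates a new matrix of the same nnz.

```python
import numpy as np
from scipy import sparse

# Illustrative sizes; substitute your own matrix and scales.
A = sparse.random(10, 10, density=0.2, format='csr')
scales = np.arange(1.0, 11.0)

# Elementwise multiply with numpy-style broadcasting:
# column i is scaled by scales[i]. multiply() returns COO.
scaled = A.multiply(scales).tocsr()

# Same result as scaling each column in a loop over a dense copy.
expected = A.toarray() * scales
print(np.allclose(scaled.toarray(), expected))
```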