
Fast sparse matrix multiplication w/o allocating a dense array

I have an m x m sparse matrix similarities and a vector with m elements, combined_scales. I wish to multiply the ith column of similarities by combined_scales[i]. Here's my first attempt at this:

for i in range(m):
    scale = combined_scales[i]
    similarities[:, i] *= scale

This is semantically correct but was performing poorly, so I tried changing it to this:

# sparse.diags creates a diagonal matrix.
# docs: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.diags.html
similarities *= sparse.diags(combined_scales)

But I immediately got a MemoryError when running this line. Bizarrely, it seems that scipy is attempting to allocate a dense numpy array here:

Traceback (most recent call last):
  File "main.py", line 108, in <module>
    loop.run_until_complete(main())
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 466, in run_until_complete
    return future.result()
  File "main.py", line 100, in main
    magic.fit(df)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 127, in fit
    self._scale_similarities(X, net_similarities)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 148, in _scale_similarities
    similarities *= sparse.diags(combined_scales)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\base.py", line 440, in __mul__
    return self._mul_sparse_matrix(other)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\compressed.py", line 503, in _mul_sparse_matrix
    data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
MemoryError

How do I prevent it from allocating a dense array here? Thanks.

From scipy.sparse.compressed:

class _cs_matrix:    # common for csr and csc (excerpt; base classes omitted)
    def _mul_sparse_matrix(self, other):
        M, K1 = self.shape
        K2, N = other.shape

        major_axis = self._swap((M,N))[0]
        other = self.__class__(other)  # convert to this format

        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=M*N)
        indptr = np.empty(major_axis + 1, dtype=idx_dtype)

        fn = getattr(_sparsetools, self.format + '_matmat_pass1')
        fn(M, N,
           np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           indptr)

        nnz = indptr[-1]
        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=nnz)
        indptr = np.asarray(indptr, dtype=idx_dtype)
        indices = np.empty(nnz, dtype=idx_dtype)
        data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

        fn = getattr(_sparsetools, self.format + '_matmat_pass2')
        fn(M, N, np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           self.data,
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           other.data,
           indptr, indices, data)

        return self.__class__((data,indices,indptr),shape=(M,N))

similarities is a sparse csr matrix. other, the diag matrix, has been converted to csr as well, in:

other = self.__class__(other) 
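To see what the converted diagonal operand looks like, here is a small check (the size and scale values are made up for illustration). Converting the output of sparse.diags to csr gives one stored entry per row, so other itself is cheap: it holds only m values.

```python
import numpy as np
from scipy import sparse

m = 5
scales = np.array([2.0, 3.0, 5.0, 7.0, 11.0])  # illustrative nonzero scale factors

# Same conversion as other = self.__class__(other) when self is csr.
D = sparse.csr_matrix(sparse.diags(scales))

# One stored value per row: row i holds scales[i] at column i.
print(D.indptr)   # [0 1 2 3 4 5]
print(D.indices)  # [0 1 2 3 4]
print(D.data)     # [ 2.  3.  5.  7. 11.]
```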

csr_matmat_pass1 (compiled code) is run with the indices from self and other. It fills indptr, whose last element, nnz, is the number of nonzero terms in the output.

It then allocates the indices and data arrays (and recasts indptr) that will hold the results from csr_matmat_pass2. These are used to create the return matrix:

self.__class__((data,indices,indptr),shape=(M,N))
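The two passes can be sketched in pure Python to show why the output arrays are sized by nnz rather than M*N. This is a simplified illustration, not scipy's compiled code: matmat_pass1 and matmat_pass2 are hypothetical stand-ins, and unlike the real csr_matmat_pass2 (which tracks columns with a linked list and drops entries that sum to zero), this version keeps explicit zeros.

```python
import numpy as np
from scipy import sparse


def matmat_pass1(n_row, n_col, Ap, Aj, Bp, Bj):
    """Count the nonzeros in each row of A @ B and return the output indptr."""
    last_row = np.full(n_col, -1, dtype=np.int64)  # last output row that touched column k
    Cp = np.zeros(n_row + 1, dtype=np.int64)
    nnz = 0
    for i in range(n_row):
        for jj in range(Ap[i], Ap[i + 1]):
            j = Aj[jj]
            for kk in range(Bp[j], Bp[j + 1]):
                k = Bj[kk]
                if last_row[k] != i:  # first time column k appears in row i
                    last_row[k] = i
                    nnz += 1
        Cp[i + 1] = nnz
    return Cp


def matmat_pass2(n_row, n_col, Ap, Aj, Ax, Bp, Bj, Bx, Cp):
    """Fill indices and data, whose length nnz = Cp[-1] was fixed by pass 1."""
    nnz = Cp[-1]
    Cj = np.empty(nnz, dtype=np.int64)
    # Corresponds to the data allocation that raised MemoryError in the traceback.
    Cx = np.empty(nnz, dtype=np.result_type(Ax, Bx))
    sums = np.zeros(n_col, dtype=Cx.dtype)  # accumulator for one output row
    last_row = np.full(n_col, -1, dtype=np.int64)
    for i in range(n_row):
        cols = []
        for jj in range(Ap[i], Ap[i + 1]):
            j, a = Aj[jj], Ax[jj]
            for kk in range(Bp[j], Bp[j + 1]):
                k = Bj[kk]
                sums[k] += a * Bx[kk]
                if last_row[k] != i:
                    last_row[k] = i
                    cols.append(k)
        for offset, k in enumerate(cols):  # write row i into its reserved slots
            Cj[Cp[i] + offset] = k
            Cx[Cp[i] + offset] = sums[k]
            sums[k] = 0  # reset the accumulator for the next row
    return Cj, Cx


A = sparse.random(6, 6, density=0.3, format="csr", random_state=0)
B = sparse.csr_matrix(sparse.diags(np.arange(1.0, 7.0)))
Cp = matmat_pass1(6, 6, A.indptr, A.indices, B.indptr, B.indices)
Cj, Cx = matmat_pass2(6, 6, A.indptr, A.indices, A.data,
                      B.indptr, B.indices, B.data, Cp)
C = sparse.csr_matrix((Cx, Cj, Cp), shape=(6, 6))
print(np.allclose(C.toarray(), (A @ B).toarray()))  # True
```

With a diagonal B, every stored value of A maps to exactly one output entry, so Cp[-1] equals A.nnz.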

The error occurs while creating the data array:

data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

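The result just has too many nonzero values for your memory. Note that this is not a dense array: its length is nnz, and since multiplying by a diagonal matrix only rescales existing entries, nnz is at most similarities.nnz. So the product needs roughly as much new memory as a copy of similarities.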

What are m and similarities.nnz?

Is there enough memory to do similarities.copy()?
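One rough way to answer these questions is to add up the bytes held by the matrix's three arrays and compare them with the product. The helper csr_nbytes is a hypothetical name, and the size and density below are illustrative; with all-nonzero scale factors the product has exactly similarities.nnz stored values.

```python
import numpy as np
from scipy import sparse


def csr_nbytes(M):
    """Bytes held by the data, indices and indptr arrays of a csr/csc matrix."""
    return M.data.nbytes + M.indices.nbytes + M.indptr.nbytes


m = 2000
similarities = sparse.random(m, m, density=0.01, format="csr", random_state=0)
combined_scales = np.linspace(0.5, 2.0, m)  # illustrative, all nonzero

product = similarities @ sparse.diags(combined_scales)

print(similarities.nnz, product.nnz, m * m)            # product.nnz is far below m*m
print(csr_nbytes(similarities), csr_nbytes(product))   # about the same size
```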

Even though you are using similarities *= ..., it first has to compute similarities * other. The result then replaces self. It does not attempt an in-place multiplication.
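A quick way to confirm that *= is not in place here: keep a second reference and the original data buffer, then check whether they still match after the operation. This assumes csr_matrix/dia_matrix objects, where * means matrix multiplication and the in-place operator only handles scalars directly.

```python
import numpy as np
from scipy import sparse

A = sparse.random(100, 100, density=0.05, format="csr", random_state=0)
scales = np.arange(1.0, 101.0)

alias = A                  # a second reference to the same matrix object
old_data = A.data          # the original data buffer
A *= sparse.diags(scales)  # for a non-scalar operand this becomes A = A * diags(...)

print(alias is A)                          # False: the name A was rebound
print(np.shares_memory(A.data, old_data))  # False: fresh arrays were allocated
```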

In-place iteration by column

There have been a lot of questions about faster iteration by rows (or columns), for things like sorting or getting the largest row values. Working directly with the csr attributes can speed this up considerably. I think the idea applies here.
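For a csr matrix like the one in the question, one way to apply this idea (a sketch, not part of the original answer) relies on indices holding the column of every stored value: multiplying data by combined_scales[indices] scales each column in place. Working in chunks keeps the temporary gather array small. The function name scale_csr_columns_inplace and the default chunk size are hypothetical, and the matrix's dtype must be able to hold the products (for example float).

```python
import numpy as np
from scipy import sparse


def scale_csr_columns_inplace(M, scales, chunk=1_000_000):
    """Multiply column j of csr matrix M by scales[j], modifying M.data in place.

    M.indices[k] is the column of the stored value M.data[k], so
    scales[M.indices] supplies one factor per stored value. Processing
    `chunk` values at a time bounds the size of that temporary array.
    """
    scales = np.asarray(scales)
    for start in range(0, M.nnz, chunk):
        stop = min(start + chunk, M.nnz)
        M.data[start:stop] *= scales[M.indices[start:stop]]
    return M


similarities = sparse.random(5, 5, density=0.4, format="csr", random_state=1)
combined_scales = np.arange(1.0, 6.0)
expected = similarities.toarray() * combined_scales  # column j times combined_scales[j]
scale_csr_columns_inplace(similarities, combined_scales, chunk=3)
print(np.allclose(similarities.toarray(), expected))  # True
```

No new indices or indptr arrays are created, so the extra memory is limited to one chunk of gathered scale factors.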

Example:

In [275]: A = sparse.random(10,10,.2,'csc').astype(int)
In [276]: A.data[:] = np.arange(1,21)
In [277]: A.A
Out[277]: 
array([[ 0,  0,  4,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0, 10,  0,  0, 16, 18],
       [ 0,  0,  0,  0,  0, 11, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  8,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  9, 12,  0,  0, 17,  0],
       [ 2,  0,  0,  0,  0, 13,  0,  0,  0,  0],
       [ 0,  0,  5,  7,  0,  0,  0, 15,  0, 19],
       [ 0,  0,  6,  0,  0,  0,  0,  0,  0, 20]])
In [280]: B = sparse.diags(np.arange(1,11),dtype=int)
In [281]: B
Out[281]: 
<10x10 sparse matrix of type '<class 'numpy.int64'>'
    with 10 stored elements (1 diagonals) in DIAgonal format>
In [282]: (A*B).A
Out[282]: 
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]], dtype=int64)

In-place iteration on columns (A is csc here, so each indptr[i]:indptr[i+1] slice of data is one column):

In [283]: A1=A.copy()
In [284]: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
     ...:     A1.data[i:j] *= v
     ...:     
In [285]: A1.A
Out[285]: 
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]])
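The column loop can also be written without a Python loop. In a csc matrix, np.diff(indptr) gives the number of stored values in each column, so np.repeat expands one scale factor per column into one per stored value. This is a sketch assuming a csc matrix with a float dtype; the random matrix is illustrative.

```python
import numpy as np
from scipy import sparse

A = sparse.random(10, 10, density=0.2, format="csc", random_state=0)
scales = np.arange(1.0, 11.0)

# Loop version from above: scale each column's slice of data.
A1 = A.copy()
for i, j, v in zip(A1.indptr[:-1], A1.indptr[1:], scales):
    A1.data[i:j] *= v

# Vectorized version: one scale factor per stored value, applied in place.
A2 = A.copy()
A2.data *= np.repeat(scales, np.diff(A2.indptr))

print(np.allclose(A1.toarray(), A2.toarray()))  # True
```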

Time comparisons:

In [287]: %%timeit A1=A.copy()
     ...: A1 *= B
     ...: 
375 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [288]: %%timeit A1 = A.copy()
     ...: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
     ...:     A1.data[i:j] *= v
     ...:     
79.9 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
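A self-contained script for repeating the comparison outside IPython, at a somewhat larger size. The size, density and scale values are arbitrary choices, each timing includes the cost of A.copy(), and results will vary by machine, so no numbers are claimed here.

```python
import timeit

import numpy as np
from scipy import sparse

m = 1000
A = sparse.random(m, m, density=0.01, format="csc", random_state=0)
scales = np.linspace(0.5, 2.0, m)
D = sparse.diags(scales)


def by_product(M):
    return M @ D  # allocates new indptr, indices and data arrays


def by_column_loop(M):
    for i, j, v in zip(M.indptr[:-1], M.indptr[1:], scales):
        M.data[i:j] *= v  # in place, one column slice at a time
    return M


def by_repeat(M):
    M.data *= np.repeat(scales, np.diff(M.indptr))  # in place, vectorized
    return M


reference = by_product(A).toarray()
for fn in (by_product, by_column_loop, by_repeat):
    assert np.allclose(fn(A.copy()).toarray(), reference)
    t = timeit.timeit(lambda: fn(A.copy()), number=20) / 20
    print(f"{fn.__name__:15s} {t * 1e6:8.1f} µs per call (includes A.copy())")
```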
