具有广播的稀疏 Scipy 矩阵和向量的元素最大值

Question

I need a fast element-wise maximum that compares each row of an n-by-m scipy sparse matrix element-wise to a sparse 1-by-m matrix.我需要一个快速的逐元素最大值，它将 n×m scipy 稀疏矩阵元素的每一行与稀疏的 1×m 矩阵进行比较。 This works perfectly in Numpy using np.maximum(mat, vec) via Numpy's broadcasting.这在 Numpy 中使用np.maximum(mat, vec)通过 Numpy 的广播完美地工作。

However, Scipy's .maximum() does not have broadcasting.但是，Scipy 的.maximum()没有广播。 My matrix is large, so I cannot cast it to a numpy array.我的矩阵很大，所以我不能将它转换为一个 numpy 数组。

My current workaround is to loop over the many rows of mat with mat[row,:].maximum(vec) .我目前的解决方法是使用mat[row,:].maximum(vec)遍历多行 mat 。 This big loop is ruining my code efficiency (it has to be done many times).这个大循环正在破坏我的代码效率（必须多次执行）。 My slow solution is in the second code snippet below -- Is there a better solution?我的缓慢解决方案在下面的第二个代码片段中 - 有更好的解决方案吗？

# Example
import numpy as np
from scipy import sparse

mat = sparse.csc_matrix(np.arange(12).reshape((4,3)))

vec = sparse.csc_matrix([-1, 5, 100])

# Numpy's np.maximum() gives the **desired result** using broadcasting (but it can't handle sparse matrices):
numpy_result = np.maximum( mat.toarray(), vec.toarray() )
print( numpy_result )
# [[  0   5 100]
#  [  3   5 100]
#  [  6   7 100]
#  [  9  10 100]]

# Scipy only compares the top row of mat to vec (no broadcasting!):
scipy_result = mat.maximum(vec)
print( scipy_result.toarray() )
# [[  0   5 100]
#  [  3   4   5]
#  [  6   7   8]
#  [  9  10  11]]

#Reversing the order of mat and vec in the call to vec.maximum(mat) results in a single row output, and also frequently seg faults (!):

Larger example & current solution for speed testing用于速度测试的更大示例和当前解决方案

import numpy as np
from scipy import sparse
import timeit

mat = sparse.csc_matrix(  sparse.random(20000, 4000, density=.01, data_rvs=lambda s: np.random.randint(0, 5000, size=s))  )

vec = sparse.csc_matrix(  sparse.random(1, 4000, density=.01, data_rvs=lambda s: np.random.randint(0, 5000, size=s))  )

def sparse_elementwise_maximum(mat, vec):
    output = sparse.lil_matrix(mat.shape)
    for row_idx in range( mat.shape[0] ):
        output[row_idx] = mat[row_idx,:].maximum(vec)
    return output

# Time it
num_timing_loops = 3.0
starttime = timeit.default_timer()
for _ in range(int(num_timing_loops)):
    sparse_elementwise_maximum(mat, vec)
print('time per call is:', (timeit.default_timer() - starttime)/num_timing_loops, 'seconds')
# 15 seconds per call (way too slow!)

EDIT I'm accepting Max's answer, as the question was specifically about a high performance solution, and Max's solution offers huge 1000x-2500x speedups on various inputs I tried at the expense of more lines of code and Numba compiling.编辑我接受 Max 的回答，因为这个问题专门针对高性能解决方案，Max 的解决方案在我尝试的各种输入上提供了 1000x-2500x 的巨大加速，但代价是更多的代码行和 Numba 编译。 However, for general use, Daniel F's one-liner is a great solution offers 10x-50x speedups on examples I tried--I will probably use for many other things.但是，对于一般用途，Daniel F 的 one-liner 是一个很好的解决方案，在我尝试过的示例上提供了 10 到 50 倍的加速——我可能会用于许多其他事情。

Answer 1

A low level approach低级方法

As always you can think on how a proper sparse matrix format for this operation is built up, for csr-matrices the main components are shape, data_arr,indices and ind_ptr.与往常一样，您可以考虑如何为该操作构建适当的稀疏矩阵格式，对于 csr 矩阵，主要组件是 shape、data_arr、indices 和 ind_ptr。 With these parts of the scipy.sparse.csr object it is quite straight forward but maybe a bit time consuming to implement an efficient algorithm in a compiled language (C,C++,Cython, Python-Numba).使用 scipy.sparse.csr 对象的这些部分，使用编译语言（C、C++、Cython、Python-Numba）实现高效算法非常简单，但可能有点耗时。 Int his implemenation I used Numba, but porting it to C++ should be easily possible (syntax changes) and maybe avoiding the slicing. Int 他的实现我使用了 Numba，但是将它移植到 C++ 应该很容易（语法更改）并且可能避免切片。

Implementation (first try)实现（第一次尝试）

import numpy as np
import numba as nb

# get all needed components of the csr object and create a resulting csr object at the end
def sparse_elementwise_maximum_wrap(mat,vec):
    mat_csr=mat.tocsr()
    vec_csr=vec.tocsr()

    shape_mat=mat_csr.shape
    indices_mat=mat_csr.indices
    indptr_mat=mat_csr.indptr
    data_mat=mat_csr.data
    indices_vec=vec_csr.indices
    data_vec=vec_csr.data

    res=sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,indices_vec,data_vec)
    res=sparse.csr_matrix(res, shape=shape_mat)
    return res

@nb.njit(cache=True)
def sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,vec_row_ind,vec_row_data):
    data_res=[]
    indices_res=[]
    indptr_mat_res=[]

    indptr_mat_=0
    indptr_mat_res.append(indptr_mat_)

    for row_idx in range(shape_mat[0]):
        mat_row_ind=indices_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]
        mat_row_data=data_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]

        mat_ptr=0
        vec_ptr=0
        while mat_ptr<mat_row_ind.shape[0] and vec_ptr<vec_row_ind.shape[0]:
            ind_mat=mat_row_ind[mat_ptr]
            ind_vec=vec_row_ind[vec_ptr]

            #value for both matrix and vector is present
            if ind_mat==ind_vec:
                data_res.append(max(mat_row_data[mat_ptr],vec_row_data[vec_ptr]))
                indices_res.append(ind_mat)
                mat_ptr+=1
                vec_ptr+=1
                indptr_mat_+=1

            #only value for the matrix is present vector is assumed 0
            elif ind_mat<ind_vec:
                if mat_row_data[mat_ptr] >0:
                    data_res.append(mat_row_data[mat_ptr])
                    indices_res.append(ind_mat)
                    indptr_mat_+=1
                mat_ptr+=1

            #only value for the vector is present matrix is assumed 0
            else:
                if vec_row_data[vec_ptr] >0:
                    data_res.append(vec_row_data[vec_ptr])
                    indices_res.append(ind_vec)
                    indptr_mat_+=1
                vec_ptr+=1

        for i in range(mat_ptr,mat_row_ind.shape[0]):
            if mat_row_data[i] >0:
                data_res.append(mat_row_data[i])
                indices_res.append(mat_row_ind[i])
                indptr_mat_+=1
        for i in range(vec_ptr,vec_row_ind.shape[0]):
            if vec_row_data[i] >0:
                data_res.append(vec_row_data[i])
                indices_res.append(vec_row_ind[i])
                indptr_mat_+=1
        indptr_mat_res.append(indptr_mat_)

    return np.array(data_res),np.array(indices_res),np.array(indptr_mat_res)

Implementation (optimized)实施（优化）

In this approach the lists are replaced by a dynamically resized array.在这种方法中，列表被动态调整大小的数组替换。 I increased the size of the output in 60 MB steps.我以 60 MB 的步长增加了输出的大小。 On creation of the csr-object, there is also no copy of the data made, just references.在创建 csr 对象时，也没有复制的数据，只有引用。 If you want avoid a memory overhead you have to copy the arrays in the end.如果你想避免内存开销，你必须最后复制数组。

@nb.njit(cache=True)
def sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,vec_row_ind,vec_row_data):
    mem_step=5_000_000
    #preallocate memory for 5M non-zero elements (60 MB in this example)
    data_res=np.empty(mem_step,dtype=data_mat.dtype)
    indices_res=np.empty(mem_step,dtype=np.int32)
    data_res_p=0

    indptr_mat_res=np.empty((shape_mat[0]+1),dtype=np.int32)
    indptr_mat_res[0]=0
    indptr_mat_res_p=1
    indptr_mat_=0

    for row_idx in range(shape_mat[0]):
        mat_row_ind=indices_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]
        mat_row_data=data_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]

        #check if resizing is necessary
        if data_res.shape[0]<data_res_p+shape_mat[1]:
            #add at least memory for another mem_step elements
            size_to_add=mem_step
            if shape_mat[1] >size_to_add:
                size_to_add=shape_mat[1]

            data_res_2   =np.empty(data_res.shape[0]   +size_to_add,data_res.dtype)
            indices_res_2=np.empty(indices_res.shape[0]+size_to_add,indices_res.dtype)
            for i in range(data_res_p):
                data_res_2[i]=data_res[i]
                indices_res_2[i]=indices_res[i]
            data_res=data_res_2
            indices_res=indices_res_2

        mat_ptr=0
        vec_ptr=0
        while mat_ptr<mat_row_ind.shape[0] and vec_ptr<vec_row_ind.shape[0]:
            ind_mat=mat_row_ind[mat_ptr]
            ind_vec=vec_row_ind[vec_ptr]

            #value for both matrix and vector is present
            if ind_mat==ind_vec:
                data_res[data_res_p]=max(mat_row_data[mat_ptr],vec_row_data[vec_ptr])
                indices_res[data_res_p]=ind_mat
                data_res_p+=1
                mat_ptr+=1
                vec_ptr+=1
                indptr_mat_+=1

            #only value for the matrix is present vector is assumed 0
            elif ind_mat<ind_vec:
                if mat_row_data[mat_ptr] >0:
                    data_res[data_res_p]=mat_row_data[mat_ptr]
                    indices_res[data_res_p]=ind_mat
                    data_res_p+=1
                    indptr_mat_+=1
                mat_ptr+=1

            #only value for the vector is present matrix is assumed 0
            else:
                if vec_row_data[vec_ptr] >0:
                    data_res[data_res_p]=vec_row_data[vec_ptr]
                    indices_res[data_res_p]=ind_vec
                    data_res_p+=1
                    indptr_mat_+=1
                vec_ptr+=1

        for i in range(mat_ptr,mat_row_ind.shape[0]):
            if mat_row_data[i] >0:
                data_res[data_res_p]=mat_row_data[i]
                indices_res[data_res_p]=mat_row_ind[i]
                data_res_p+=1
                indptr_mat_+=1
        for i in range(vec_ptr,vec_row_ind.shape[0]):
            if vec_row_data[i] >0:
                data_res[data_res_p]=vec_row_data[i]
                indices_res[data_res_p]=vec_row_ind[i]
                data_res_p+=1
                indptr_mat_+=1
        indptr_mat_res[indptr_mat_res_p]=indptr_mat_
        indptr_mat_res_p+=1

    return data_res[:data_res_p],indices_res[:data_res_p],indptr_mat_res

Maximum memory allocated in the beginning开始时分配的最大内存

The performance and usability of this approach heavily depends on the inputs.这种方法的性能和可用性在很大程度上取决于输入。 In this approach the maximal memory is allocated (this could easily cause out of memory errors).在这种方法中分配了最大内存（这很容易导致内存不足错误）。

@nb.njit(cache=True)
def sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,vec_row_ind,vec_row_data,shrink_to_fit):
    max_non_zero=shape_mat[0]*vec_row_data.shape[0]+data_mat.shape[0]
    data_res=np.empty(max_non_zero,dtype=data_mat.dtype)
    indices_res=np.empty(max_non_zero,dtype=np.int32)
    data_res_p=0

    indptr_mat_res=np.empty((shape_mat[0]+1),dtype=np.int32)
    indptr_mat_res[0]=0
    indptr_mat_res_p=1
    indptr_mat_=0

    for row_idx in range(shape_mat[0]):
        mat_row_ind=indices_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]
        mat_row_data=data_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]

        mat_ptr=0
        vec_ptr=0
        while mat_ptr<mat_row_ind.shape[0] and vec_ptr<vec_row_ind.shape[0]:
            ind_mat=mat_row_ind[mat_ptr]
            ind_vec=vec_row_ind[vec_ptr]

            #value for both matrix and vector is present
            if ind_mat==ind_vec:
                data_res[data_res_p]=max(mat_row_data[mat_ptr],vec_row_data[vec_ptr])
                indices_res[data_res_p]=ind_mat
                data_res_p+=1
                mat_ptr+=1
                vec_ptr+=1
                indptr_mat_+=1

            #only value for the matrix is present vector is assumed 0
            elif ind_mat<ind_vec:
                if mat_row_data[mat_ptr] >0:
                    data_res[data_res_p]=mat_row_data[mat_ptr]
                    indices_res[data_res_p]=ind_mat
                    data_res_p+=1
                    indptr_mat_+=1
                mat_ptr+=1

            #only value for the vector is present matrix is assumed 0
            else:
                if vec_row_data[vec_ptr] >0:
                    data_res[data_res_p]=vec_row_data[vec_ptr]
                    indices_res[data_res_p]=ind_vec
                    data_res_p+=1
                    indptr_mat_+=1
                vec_ptr+=1

        for i in range(mat_ptr,mat_row_ind.shape[0]):
            if mat_row_data[i] >0:
                data_res[data_res_p]=mat_row_data[i]
                indices_res[data_res_p]=mat_row_ind[i]
                data_res_p+=1
                indptr_mat_+=1
        for i in range(vec_ptr,vec_row_ind.shape[0]):
            if vec_row_data[i] >0:
                data_res[data_res_p]=vec_row_data[i]
                indices_res[data_res_p]=vec_row_ind[i]
                data_res_p+=1
                indptr_mat_+=1
        indptr_mat_res[indptr_mat_res_p]=indptr_mat_
        indptr_mat_res_p+=1

    if shrink_to_fit==True:
        data_res=np.copy(data_res[:data_res_p])
        indices_res=np.copy(indices_res[:data_res_p])
    else:
        data_res=data_res[:data_res_p]
        indices_res=indices_res[:data_res_p]

    return data_res,indices_res,indptr_mat_res

# get all needed components of the csr object and create a resulting csr object at the end
def sparse_elementwise_maximum_wrap(mat,vec,shrink_to_fit=True):
    mat_csr=mat.tocsr()
    vec_csr=vec.tocsr()

    shape_mat=mat_csr.shape
    indices_mat=mat_csr.indices
    indptr_mat=mat_csr.indptr
    data_mat=mat_csr.data
    indices_vec=vec_csr.indices
    data_vec=vec_csr.data

    res=sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,indices_vec,data_vec,shrink_to_fit)
    res=sparse.csr_matrix(res, shape=shape_mat)
    return res

Timings时间安排

Numba has a compilation overhead or some overhead to load the function from cache. Numba 有编译开销或一些开销来从缓存加载函数。 Don't consider the first call if you want to get the runtime and not compilation+runtime.如果要获取运行时而不是编译+运行时，请不要考虑第一次调用。

import numpy as np
from scipy import sparse

mat = sparse.csr_matrix(  sparse.random(20000, 4000, density=.01, data_rvs=lambda s: np.random.randint(0, 5000, size=s))  )
vec = sparse.csr_matrix(  sparse.random(1, 4000, density=.01, data_rvs=lambda s: np.random.randint(0, 5000, size=s))  )

%timeit output=sparse_elementwise_maximum(mat, vec)
#for csc input
37.9 s ± 224 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#for csr input
10.7 s ± 90.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Daniel F
%timeit sparse_maximum(mat, vec)
164 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

#low level implementation (first try)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec)
89.7 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#low level implementation (optimized, csr)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec)
16.5 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#low level implementation (preallocation, without copying at the end)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec)
16.5 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#low level implementation (preallocation, with copying at the end)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec)
16.5 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit res=sparse_elementwise_maximum_wrap(mat,vec,shrink_to_fit=False)
14.9 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec,shrink_to_fit=True)
21.7 ms ± 399 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

#For comparison, copying the result takes
%%timeit
np.copy(res.data)
np.copy(res.indices)
np.copy(res.indptr)
7.8 ms ± 47.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Answer 2

scipy.sparse matrices don't broadcast. scipy.sparse矩阵不广播。 At all.在所有。 So unless you can figure out some way to operate on the indices and inpts (I haven't), you're stuck stacking.因此，除非您能找出某种方法来操作indices和inpts （我还没有），否则您将陷入堆叠。 Best I can figure out is just to vstack your vec s until they're the same shape as mat .我能vstack最好的方法就是vstack你的vec直到它们的形状与mat相同。 It seems to give a good speedup, although it doesn't explain the segfault weirdness with csr .它似乎提供了很好的加速，尽管它不能解释csr怪异性。

#using `mat` and `vec` from the speed test
def sparse_maximum(mat, vec):
    vec1 = sparse.vstack([vec for _ in range(mat.shape[0])])
    return mat.maximum(vec1)

# Time it
num_timing_loops = 3.0
starttime = timeit.default_timer()
sparse_maximum(mat, vec)
print('time per call is:', (timeit.default_timer() - starttime)/num_timing_loops, 'seconds')
# I was getting 11-12 seconds on your original code
time per call is: 0.514533479333295 seconds

Proof that it works on original matrices:证明它适用于原始矩阵：

vec = sparse.vstack([vec for _ in range(4)])

print(mat.maximum(vec).todense())
[[  0   5 100]
 [  3   5 100]
 [  6   7 100]
 [  9  10 100]]

Answer 3

Looking at the maximum method, and code, especially the _binopt method in /scipy/sparse/compressed.py it's apparent that it can work with a scalar other .纵观maximum方法和代码，尤其是_binopt方法/scipy/sparse/compressed.py这是显而易见的是，它可以与标工作other 。 For a sparse other it constructs a new sparse matrix (of the same format and shape) using indptr , etc values.对于other稀疏矩阵，它使用indptr构造一个新的稀疏矩阵（具有相同的格式和形状）。 If other has the same shape, it works:如果 other 具有相同的形状，则它有效：

In [55]: mat = sparse.csr_matrix(np.arange(12).reshape((4,3)))
In [64]: mat.maximum(mat)
Out[64]: 
<4x3 sparse matrix of type '<class 'numpy.int64'>'
    with 11 stored elements in Compressed Sparse Row format>

It fails is the other is a 1d sparse matrix:它失败的是另一个是一维稀疏矩阵：

In [65]: mat.maximum(mat[0,:])
Segmentation fault (core dumped)

mat.maximum(mat[:,0]) runs without error, though I'm not sure about the values. mat.maximum(mat[:,0])运行没有错误，但我不确定这些值。 mat[:,0] will have the same size indptr . mat[:,0]将具有相同的大小indptr 。

I thought the mat.maximum(mat[:,0]) would give same fault if mat was csc , but it doesn't.我认为mat.maximum(mat[:,0])会给出同样的错误，如果mat是csc ，但事实并非如此。

Let's be honest, this kind of operation is not a strong point for sparse matrices.老实说，这种操作对于稀疏矩阵不是强项。 The core of its math is matrix multiplication.它的数学核心是矩阵乘法。 That's what they were originally developed for - sparse linear algebra problems such as finite difference and finite element.这就是它们最初开发的目的 - 稀疏线性代数问题，例如有限差分和有限元。

具有广播的稀疏 Scipy 矩阵和向量的元素最大值

问题描述

Larger example & current solution for speed testing用于速度测试的更大示例和当前解决方案

3 个解决方案

解决方案1
5 已采纳 2020-11-19 21:49:10

A low level approach低级方法

解决方案2
3 2020-11-17 12:51:23

解决方案3
1 2020-11-17 17:12:32

具有广播的稀疏 Scipy 矩阵和向量的元素最大值

问题描述

Larger example & current solution for speed testing用于速度测试的更大示例和当前解决方案

3 个解决方案

解决方案1 5 已采纳 2020-11-19 21:49:10

A low level approach低级方法

解决方案2 3 2020-11-17 12:51:23

解决方案3 1 2020-11-17 17:12:32

解决方案1
5 已采纳 2020-11-19 21:49:10

解决方案2
3 2020-11-17 12:51:23

解决方案3
1 2020-11-17 17:12:32