
Efficient slicing of matrices using matrix multiplication, with Python, NumPy, SciPy

I want to reshape a 2D scipy.sparse.csr.csr_matrix (call it A) into a 2D numpy.ndarray (call it B).

A could be

>shape(A)
(90, 10)

then B should be

>shape(B)
(9, 10)

where each block of 10 rows of A is reduced to a single new row, namely the column-wise maximum over that window. The colon (slice) operator does not work on this sparse matrix and fails with an "unhashable type" error. How can I get B by using matrix multiplications?
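To make the target concrete, here is a minimal dense sketch of the result described above. It is only an illustration of what B should contain, not a matrix-multiplication solution; the random matrix standing in for A and the names `window` and `dense` are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in for A: a 90x10 CSR matrix of random integers.
rng = np.random.default_rng(0)
A = csr_matrix(rng.integers(0, 100, size=(90, 10)))

window = 10  # number of rows per window

# Densify, split the 90 rows into 9 blocks of 10 rows,
# then take the column-wise maximum inside each block.
dense = A.toarray()
B = dense.reshape(-1, window, dense.shape[1]).max(axis=1)

print(B.shape)  # (9, 10)
```

Row i of B holds, for every column, the maximum of rows 10*i to 10*i + 9 of A.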

You can slice efficiently with matrix multiplication by building a "slicer" matrix that has ones in the right places. The sliced matrix has the same type as the "slicer", so this also gives you an efficient way to control the output type.
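As a minimal sketch of the idea, the snippet below builds a compact slicer with one row per selected row of the input. The 9x5 example matrix and the names `rows`, `dense_slicer` and `sparse_slicer` are assumptions made for illustration; the timed solutions further down use a 5x9 slicer with empty leading rows instead.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

# Hypothetical 9x5 example matrix in CSR format.
A = csr_matrix(np.arange(1, 46).reshape(9, 5))

rows = [2, 3, 4]  # rows of A to select

# Dense slicer: row k has a single 1 in column rows[k].
dense_slicer = np.zeros((len(rows), A.shape[0]))
dense_slicer[np.arange(len(rows)), rows] = 1

# The same slicer stored as a sparse matrix.
sparse_slicer = csr_matrix(dense_slicer)

dense_result = dense_slicer @ A    # dense slicer  -> numpy.ndarray
sparse_result = sparse_slicer @ A  # sparse slicer -> sparse matrix

print(type(dense_result).__name__, issparse(sparse_result))
```

Both products contain rows 2 to 4 of A; only the container type differs.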

Below are some comparisons. The most efficient option for your case is to ask for the dense .A matrix and slice that; in my timings it was much faster than the .toarray() method. Multiplication is the second fastest option when the "slicer" is created as an ndarray, multiplied by the csr matrix, and the result is then sliced.

OBS: using a coo sparse matrix for A gave slightly slower timings with the same proportions between solutions, and sol3 is not applicable in that case. I realized later that the multiplication converts the coo matrix to csr automatically.
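A minimal sketch of the coo case, assuming a small 9x5 example matrix (the names `A_coo`, `B_sliced` and `B_mult` are made up for illustration). It converts to csr explicitly before slicing, and separately multiplies a csr slicer by the coo matrix.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

# Hypothetical 9x5 example stored in COO format.
dense = np.arange(1, 46).reshape(9, 5)
A_coo = coo_matrix(dense)

# Option 1: convert to CSR explicitly, then slice rows 2..4.
B_sliced = A_coo.tocsr()[2:5].toarray()

# Option 2: multiply by a sparse slicer that picks rows 2..4.
slicer = csr_matrix(([1, 1, 1], ([0, 1, 2], [2, 3, 4])), shape=(3, 9))
B_mult = (slicer @ A_coo).toarray()

print(np.array_equal(B_sliced, B_mult))  # True
```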

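import numpy as np
from scipy.sparse import csr_matrix

# 9x9 test matrix in CSR format
test = csr_matrix([
[11,12,13,14,15,16,17,18,19],
[21,22,23,24,25,26,27,28,29],
[31,32,33,34,35,36,37,38,39],
[41,42,43,44,45,46,47,48,49],
[51,52,53,54,55,56,57,58,59],
[61,62,63,64,65,66,67,68,69],
[71,72,73,74,75,76,77,78,79],
[81,82,83,84,85,86,88,88,89],
[91,92,93,94,95,96,99,98,99]])

def sol1():
    # convert to a dense ndarray first, then slice rows 2..4
    B = test.A[2:5]
    return B

def sol2():
    # dense "slicer": rows 2..4 each pick the matching row of test
    slicer = np.array([[0,0,0,0,0,0,0,0,0],
                       [0,0,0,0,0,0,0,0,0],
                       [0,0,1,0,0,0,0,0,0],
                       [0,0,0,1,0,0,0,0,0],
                       [0,0,0,0,1,0,0,0,0]])
    # with a csr_matrix operand, * is matrix multiplication;
    # the product is an ndarray, and [2:] drops the two empty rows
    B = (slicer*test)[2:]
    return B

def sol3():
    # slice the sparse matrix first, then convert to dense
    B = (test[2:5]).A
    return B

def sol4():
    # sparse "slicer" with ones at (2,2), (3,3), (4,4)
    slicer = csr_matrix( ((1,1,1),((2,3,4),(2,3,4))), shape=(5,9) )
    B = ((slicer*test).A)[2:] # just changing when we do the slicing
    return B

def sol5():
    # same sparse "slicer", but slice the sparse product before converting
    slicer = csr_matrix( ((1,1,1),((2,3,4),(2,3,4))), shape=(5,9) )
    B = ((slicer*test)[2:]).A
    return B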


timeit sol1()
#10000 loops, best of 3: 60.4 us per loop

timeit sol2()
#10000 loops, best of 3: 91.4 us per loop

timeit sol3()
#10000 loops, best of 3: 111 us per loop

timeit sol4()
#1000 loops, best of 3: 310 us per loop

timeit sol5()
#1000 loops, best of 3: 363 us per loop
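The timeit lines above are IPython commands. Outside IPython, a comparison along the same lines could be run with the standard timeit module, as in the sketch below. The stand-in matrix and the function names `dense_then_slice` and `slice_then_dense` are assumptions, and .toarray() is used so the sketch does not depend on the .A attribute. Absolute numbers will vary by machine and SciPy version.

```python
import timeit

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in for the 9x9 test matrix.
test = csr_matrix(np.arange(1, 82).reshape(9, 9))

def dense_then_slice():
    # densify the whole matrix, then take rows 2..4
    return test.toarray()[2:5]

def slice_then_dense():
    # slice the sparse matrix, then densify the 3 selected rows
    return test[2:5].toarray()

loops = 1000
for fn in (dense_then_slice, slice_then_dense):
    # best of 3 repeats, reported per call in microseconds
    best = min(timeit.repeat(fn, number=loops, repeat=3)) / loops
    print(f"{fn.__name__}: {best * 1e6:.1f} us per loop")
```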

EDIT: the answer has been updated, replacing .toarray() with .A, which gives much faster results. The best solutions are now listed first.
