
Dense matrix vs sparse matrix in Python

In Python, I'm comparing the time it takes to read one row of a matrix, first stored in dense format and then in sparse format.

The "extraction" of a row from a dense matrix costs around 3.6e-05 seconds.

For the sparse format I tried both csr_matrix and lil_matrix, but in both cases reading a row took around 1e-04 seconds.

I would expect the sparse format to give the best performance. Can anyone help me understand this?

arr[i,:] for a dense array produces a view, so its execution time is independent of arr.shape. If you don't understand the distinction between a view and a copy, you need to do more reading about numpy basics.
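To make the view/copy distinction concrete, here is a minimal sketch. The array shape and the values written are arbitrary choices for illustration, not taken from the question.

```python
import numpy as np

arr = np.zeros((1000, 1000))

# Basic slicing returns a view: no element data is copied.
row = arr[3, :]
shares = np.shares_memory(arr, row)
print(shares)

# Writing through the view changes the original array.
row[0] = 7.0
print(arr[3, 0])

# An explicit copy owns its own data, so arr is left untouched.
row_copy = arr[3, :].copy()
row_copy[1] = 5.0
print(arr[3, 1])
```

Because the view only needs a new small header object pointing at the existing buffer, the cost of arr[i,:] does not grow with the number of columns.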

csr and lil formats allow indexing that looks a lot like ndarray's, but there are key differences. For the most part the concept of a view does not apply. There is one exception: M.getrowview(i) takes advantage of the unique data structure of a lil to produce a view. (Read its doc and code.)
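A small sketch of that exception, assuming a scipy version in which lil_matrix still provides getrowview. The matrix size and the values are made up for illustration.

```python
from scipy import sparse

M = sparse.lil_matrix((5, 5))
M[2, 1] = 3.0

# getrowview returns a 1-row lil matrix that reuses row 2's Python lists.
rv = M.getrowview(2)
shares_lists = rv.rows[0] is M.rows[2]
print(shares_lists)

# Writing through the row view is therefore visible in M.
rv[0, 4] = 9.0
print(M[2, 4])

# Ordinary indexing builds a new matrix with its own lists.
r = M[2, :]
indexed_is_copy = r.rows[0] is not M.rows[2]
print(indexed_is_copy)
```

This works because a lil matrix stores each row as a separate pair of Python lists (column indices and values), so a single row can be handed out without copying.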

Some indexing of a csr format actually uses matrix multiplication, using a specially constructed 'extractor' matrix.
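The idea behind an extractor matrix can be sketched as follows. This is an illustration of the technique, not scipy's internal code: the extractor E is built by hand here, and the example matrix is arbitrary.

```python
import numpy as np
from scipy import sparse

dense = np.array([[0., 1., 0., 0.],
                  [2., 0., 0., 3.],
                  [0., 0., 4., 0.],
                  [0., 5., 0., 6.]])
M = sparse.csr_matrix(dense)

i = 1  # row to extract

# E has shape (1, n_rows) with a single 1 in column i,
# so the product E @ M picks out row i of M.
E = sparse.csr_matrix((np.ones(1), ([0], [i])), shape=(1, M.shape[0]))

row_via_matmul = E @ M
row_via_index = M[i, :]

print(row_via_matmul.toarray())
print(row_via_index.toarray())
```

Either way a new sparse matrix has to be built, which is where much of the time goes compared with a numpy view.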

In all cases where sparse indexing produces a sparse matrix, actually constructing the new matrix from the data takes time. Sparse does not use compiled code nearly as much as numpy. Its strong point, relative to numpy, is matrix multiplication of matrices in which about 10% or fewer of the elements are nonzero.
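A rough way to reproduce the comparison from the question is sketched below. The matrix size, the roughly 1% density, the row index and the repetition count are all assumptions chosen for illustration, and the absolute numbers will vary by machine and library version.

```python
import timeit

import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n = 1000
dense = rng.random((n, n))
dense[dense > 0.01] = 0.0  # keep roughly 1% of the entries nonzero

csr = sparse.csr_matrix(dense)
lil = sparse.lil_matrix(dense)

i = 10       # row to read
reps = 1000  # calls per measurement

timings = {
    "dense arr[i, :]": timeit.timeit(lambda: dense[i, :], number=reps) / reps,
    "csr M[i, :]": timeit.timeit(lambda: csr[i, :], number=reps) / reps,
    "lil M[i, :]": timeit.timeit(lambda: lil[i, :], number=reps) / reps,
    "lil getrowview(i)": timeit.timeit(lambda: lil.getrowview(i), number=reps) / reps,
}

for label, seconds in timings.items():
    print(f"{label:20s} {seconds:.2e} s per call")
```

The dense case only creates a view, while each sparse case constructs a new matrix object, so the sparse timings mostly measure that construction overhead rather than the amount of data in the row.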

In the simplest format (to understand), coo, each nonzero element is represented by 3 values: data, row, col. Those are stored in three 1d arrays. So fewer than roughly a third of the elements can be nonzero (a density under about 30%) for it to even break even with respect to memory use. coo does not implement indexing.
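The three arrays and the memory trade-off can be inspected directly. This is a sketch with an arbitrary 100 x 100 example at 1% density. The exact break-even point depends on the dtypes involved, since the index arrays may use a smaller integer type than the data.

```python
import numpy as np
from scipy import sparse

dense = np.zeros((100, 100))
dense[::10, ::10] = 1.0  # 10 rows x 10 columns = 100 nonzeros

C = sparse.coo_matrix(dense)

# One entry per nonzero element in each of the three 1d arrays.
print(C.data.shape, C.row.shape, C.col.shape)

coo_bytes = C.data.nbytes + C.row.nbytes + C.col.nbytes
dense_bytes = dense.nbytes
print(coo_bytes, dense_bytes)
```

At this density the coo storage is far smaller than the dense array. As the fraction of nonzeros grows, the extra row and col arrays eat up that advantage.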
