
Sparse Matrix Vs Dense Matrix Multiplication C++ Tensorflow

Original: 2020-07-28 05:32:16 7 1 c++ / sparse-matrix / matrix-multiplication

I would like to implement sparse matrix-dense vector (SpMV) multiplication, y = Ax, in C++ TensorFlow.

The sparse matrix, A, is stored in CSR format. The sparsity of A is usually between 50% and 90%. The goal is to match or beat the run time of dense matrix-dense vector (DMV) multiplication.
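For reference, the CSR product y = Ax described above can be sketched in plain C++ as follows. This is only a minimal, single-threaded sketch and does not use the TensorFlow C++ API; the `CsrMatrix` struct, its field names, and the `spmv` function are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container (illustrative only, not a TensorFlow type).
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i owns [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // stored nonzero values, row by row
};

// Computes y = A * x for a CSR matrix A and a dense vector x.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(a.col_idx.size() == a.values.size());
    assert(x.size() == a.cols);

    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        // Only the stored entries of row i are visited.
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            // x is read through col_idx, i.e. via an extra indirection.
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

As a usage example, the 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] would be stored as row_ptr = {0, 2, 3, 5}, col_idx = {0, 2, 2, 0, 1} and values = {1, 2, 3, 4, 5}.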

Please note that I have already read the following posts: Q1, Q2, Q3. However, I am still wondering about the following:

1. How does SpMV compare with DMV in terms of run time? Since the sparsity is relatively high, I assume that SpMV should be faster given the reduced number of operations. Is that right?
2. What should I take into account to make SpMV as fast as or faster than DMV? Why do people say that DMV will perform better than SpMV? Does the storage representation make a difference?
3. Are there any recommended libraries that do SpMV in C++, for either CPU or GPU?

This question is related to my other question: CSCC: Convolution Split Compression Calculation Algorithm for Deep Neural Network

1 answer

To answer the edited question:

1. Unless the matrix is very sparse (<10% nonzeros on CPU, probably <1% on GPU), you will likely not benefit from the sparsity. While the number of floating-point operations is reduced, the storage per stored entry at least doubles (column or row index + value), memory access is irregular (you have an indirection via the index for the right-hand side), and it becomes far more difficult to vectorize (or to achieve coalescing on the GPU). If you parallelize, you also have to deal with rows of varying length, so a static schedule is likely to be suboptimal.
2. Beyond the points above, yes, the storage representation matters. For example, a COO matrix stores two indices and the value per entry, while CSR/CSC store only one index per entry but require an additional offset array, which makes them more complex to build on the fly. Especially on the GPU, storage formats matter if you want to achieve at least some coalescing. This paper looks into how storage formats affect performance on the GPU: https://onlinelibrary.wiley.com/doi/full/10.1111/cgf.13957
3. For something generic, try Eigen on the CPU or cuSPARSE on the GPU. There are plenty of others that perform better for specific use cases, but this part of the question isn't clearly answerable.
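To make the storage argument concrete, here is a rough sketch of how a CSR offset array could be built from COO entries, together with byte counts for each representation. It assumes 4-byte indices and 4-byte floats; `CooEntry`, `Csr`, `coo_to_csr` and the `*_bytes` helpers are hypothetical names used only for this illustration, not part of Eigen or cuSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical COO entry: two indices plus the value.
struct CooEntry {
    std::int32_t row;
    std::int32_t col;
    float value;
};

// Hypothetical CSR arrays: one index per entry plus a row offset array.
struct Csr {
    std::vector<std::int32_t> row_ptr;  // size rows + 1
    std::vector<std::int32_t> col_idx;  // size nnz
    std::vector<float> values;          // size nnz
};

// Builds CSR from COO entries given in any order (counting pass + prefix sum).
// Entries within a row keep their input order, so columns are not sorted.
Csr coo_to_csr(const std::vector<CooEntry>& coo, std::int32_t rows) {
    Csr csr;
    csr.row_ptr.assign(static_cast<std::size_t>(rows) + 1, 0);
    for (const CooEntry& e : coo) {
        assert(e.row >= 0 && e.row < rows);
        ++csr.row_ptr[e.row + 1];  // count entries per row
    }
    for (std::int32_t i = 0; i < rows; ++i) {
        csr.row_ptr[i + 1] += csr.row_ptr[i];  // prefix sum gives row offsets
    }
    csr.col_idx.resize(coo.size());
    csr.values.resize(coo.size());
    // next[i] is the next free slot inside row i.
    std::vector<std::int32_t> next(csr.row_ptr.begin(), csr.row_ptr.end() - 1);
    for (const CooEntry& e : coo) {
        const std::int32_t dst = next[e.row]++;
        csr.col_idx[dst] = e.col;
        csr.values[dst] = e.value;
    }
    return csr;
}

// Storage in bytes under the 4-byte index / 4-byte value assumption.
std::size_t dense_bytes(std::size_t rows, std::size_t cols) {
    return rows * cols * sizeof(float);
}
std::size_t coo_bytes(std::size_t nnz) {
    return nnz * (2 * sizeof(std::int32_t) + sizeof(float));
}
std::size_t csr_bytes(std::size_t rows, std::size_t nnz) {
    return (rows + 1) * sizeof(std::int32_t) +
           nnz * (sizeof(std::int32_t) + sizeof(float));
}
```

Under these assumptions, a 1000 x 1000 matrix with 50% nonzeros takes 4,000,000 bytes dense, 4,004,004 bytes in CSR and 6,000,000 bytes in COO, so the sparse formats move no less memory than the dense one. At 10% nonzeros the CSR footprint drops to 804,004 bytes.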

Beyond the matrix format itself, even the ordering of entries in your matrix can have a massive impact on performance, which is why the Cuthill-McKee algorithm is often used to reduce matrix bandwidth (and thereby improve cache performance).
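As an illustration of the reordering idea, the following is a simplified sketch of a reverse Cuthill-McKee ordering on the adjacency structure of a symmetric sparsity pattern. It starts each breadth-first search from a lowest-degree unvisited vertex rather than searching for a pseudo-peripheral one, so it is a textbook-style approximation and not the implementation of any particular library. `Graph`, `bandwidth`, `reverse_cuthill_mckee` and `labels_from_order` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <queue>
#include <vector>

// Adjacency list of a symmetric sparsity pattern (vertex i = row/column i).
using Graph = std::vector<std::vector<int>>;

// Bandwidth under a relabeling: max |label[i] - label[j]| over all edges (i, j).
int bandwidth(const Graph& g, const std::vector<int>& label) {
    int bw = 0;
    for (std::size_t i = 0; i < g.size(); ++i) {
        for (int j : g[i]) {
            bw = std::max(bw, std::abs(label[i] - label[j]));
        }
    }
    return bw;
}

// Returns order[k] = old index of the vertex placed at new position k.
std::vector<int> reverse_cuthill_mckee(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<int> order;
    order.reserve(n);
    std::vector<bool> visited(n, false);
    auto by_degree = [&g](int a, int b) { return g[a].size() < g[b].size(); };

    // Try start vertices in order of increasing degree (handles disconnected graphs).
    std::vector<int> starts(n);
    for (int i = 0; i < n; ++i) starts[i] = i;
    std::stable_sort(starts.begin(), starts.end(), by_degree);

    for (int s : starts) {
        if (visited[s]) continue;
        std::queue<int> q;
        q.push(s);
        visited[s] = true;
        while (!q.empty()) {
            const int v = q.front();
            q.pop();
            order.push_back(v);
            // Enqueue unvisited neighbours in order of increasing degree.
            std::vector<int> nbrs;
            for (int w : g[v]) {
                if (!visited[w]) {
                    visited[w] = true;
                    nbrs.push_back(w);
                }
            }
            std::stable_sort(nbrs.begin(), nbrs.end(), by_degree);
            for (int w : nbrs) q.push(w);
        }
    }
    std::reverse(order.begin(), order.end());  // the "reverse" step
    return order;
}

// Converts order (new position -> old index) into labels (old index -> new position).
std::vector<int> labels_from_order(const std::vector<int>& order) {
    std::vector<int> label(order.size());
    for (std::size_t k = 0; k < order.size(); ++k) {
        label[order[k]] = static_cast<int>(k);
    }
    return label;
}
```

For a path graph whose vertices are numbered in scrambled order (edges 0-3, 3-1, 1-4, 4-2), the original numbering has bandwidth 3, while the ordering produced by this sketch brings it down to 1, which places the nonzeros of each row next to the diagonal.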

Sparse matrix-dense vector multiplication with matrix known at compile time

Sparse Matrix multiplication like (maxmin) in C++ using Octave libraries

sparse x dense matrix multiplication performance under-efficient

Using Intel oneAPI MKL to perform sparse matrix with dense vector multiplication

Matrix Multiplication C++

C++ matrix multiplication

matrix multiplication in C++

Fast matrix multiplication on sparse matrices of 1's in C

Fast sparse matrix multiplication

C++ Eigen Sparse Matrix multiplication much slower than python scipy.sparse





solution1 2 2020-07-28 08:06:14
