

CUDA CSR matrix-matrix product of a transpose with itself

Posted 2019-03-10 11:14:35 · Tags: c++, cuda, sparse-matrix, blas, csr

I have a very large, very sparse least-squares design matrix (A), which I would like to multiply by its own transpose: N = A^T * A, where A and N are stored in CSR format. Obviously, A has more rows than columns. I normally form N directly, row by row, but with CSR I would first have to form a graph to determine which elements of N are nonzero. I could do this (I even have some old C code for it), but I am hoping for a solution that requires less development. I am using CUDA, so this could be done on either the GPU or the CPU, although I can see advantages to using the GPU. I have sketched out an algorithm, but I was hoping this problem had already been solved. I could not find anything in the CUDA toolkit other than the direct A * x = l QR solver (where A is m × n). Google was not very helpful either.
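The "graph" step described above is the symbolic phase of a sparse product: N(i, j) is structurally nonzero exactly when some row of A has entries in both column i and column j. The following is a minimal CPU sketch in C++. It assumes 0-based CSR index arrays, and `Pattern` and `ata_pattern` are hypothetical names used only for illustration.

```cpp
#include <set>
#include <vector>

// Hypothetical output type: CSR structure (no values) of N = A^T * A.
struct Pattern {
    std::vector<int> row_ptr;  // size n + 1
    std::vector<int> col_ind;  // sorted column indices within each row
};

// Symbolic phase: sparsity pattern of N (n x n) from the CSR structure of A (m x n).
// Any two columns that share a row of A produce a nonzero in N.
Pattern ata_pattern(int n, const std::vector<int>& row_ptr,
                    const std::vector<int>& col_ind) {
    std::vector<std::set<int>> adj(n);  // adj[i] = nonzero columns of row i of N
    const int m = static_cast<int>(row_ptr.size()) - 1;
    for (int k = 0; k < m; ++k) {
        // Every ordered pair of columns present in row k of A.
        for (int p = row_ptr[k]; p < row_ptr[k + 1]; ++p)
            for (int q = row_ptr[k]; q < row_ptr[k + 1]; ++q)
                adj[col_ind[p]].insert(col_ind[q]);
    }
    Pattern out;
    out.row_ptr.assign(n + 1, 0);
    for (int i = 0; i < n; ++i) {
        out.row_ptr[i + 1] = out.row_ptr[i] + static_cast<int>(adj[i].size());
        // std::set iterates in ascending order, so columns come out sorted.
        out.col_ind.insert(out.col_ind.end(), adj[i].begin(), adj[i].end());
    }
    return out;
}
```

The cost grows with the square of the number of nonzeros per row of A, so this approach is cheap when each observation row touches only a few unknowns.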

I am using C++.

Does anyone have experience with this?

2 answers

Sorting a general COO sparse matrix into CSR/CSC format, and in particular transposition (that is, conversion between the CSR and CSC formats), are relatively cheap operations that are readily available in the cuSPARSE library.
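The reason CSR-to-CSC conversion counts as a transpose is that the CSC arrays of A are exactly the CSR arrays of A^T. The conversion is a counting sort over column indices. The sketch below is a plain C++ CPU version for illustration. In cuSPARSE this operation belongs to the csr2csc family of routines, whose exact name and signature differ between CUDA versions. `Csc` and `csr_to_csc` are hypothetical names.

```cpp
#include <vector>

// Hypothetical CSC container (equivalently: CSR arrays of the transpose).
struct Csc {
    std::vector<int> col_ptr;   // size n + 1
    std::vector<int> row_ind;   // size nnz
    std::vector<double> val;    // size nnz
};

// Convert an m x n CSR matrix to CSC with a counting sort on column indices.
Csc csr_to_csc(int m, int n, const std::vector<int>& row_ptr,
               const std::vector<int>& col_ind, const std::vector<double>& val) {
    const int nnz = row_ptr[m];
    Csc out;
    out.col_ptr.assign(n + 1, 0);
    out.row_ind.resize(nnz);
    out.val.resize(nnz);
    // 1) Count entries per column (shifted by one for the prefix sum).
    for (int p = 0; p < nnz; ++p) ++out.col_ptr[col_ind[p] + 1];
    // 2) Prefix sum turns counts into column start offsets.
    for (int j = 0; j < n; ++j) out.col_ptr[j + 1] += out.col_ptr[j];
    // 3) Scatter; visiting rows in order keeps row indices sorted in each column.
    std::vector<int> next(out.col_ptr.begin(), out.col_ptr.end() - 1);
    for (int i = 0; i < m; ++i)
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            const int dst = next[col_ind[p]]++;
            out.row_ind[dst] = i;
            out.val[dst] = val[p];
        }
    return out;
}
```

For A = [[1, 2, 0], [0, 3, 0], [4, 0, 5]] this yields col_ptr = {0, 2, 4, 5}, row_ind = {0, 2, 0, 1, 2} and val = {1, 4, 2, 3, 5}.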

After converting your matrix A from CSR to CSC format, you can readily apply the trivial algorithm to compute N = A^T * A.
This is also easy to parallelise with CUDA by having each thread process one column of A to generate one row of N.
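One possible reading of the "trivial algorithm", sketched here under that assumption, uses both formats at once: row i of N is N(i,:) = Σ_k A(k,i) · A(k,:), where the sum runs over the nonzeros of column i of A (read from CSC) and each A(k,:) is a row of A (read from the original CSR). This is a CPU sketch in C++. `Csr`, `ata_row` and `ata` are hypothetical names, and the `std::map` accumulator is chosen for simplicity. A GPU version would likely need a different accumulator and a count-then-fill pass to find each row's output offset.

```cpp
#include <map>
#include <vector>

// Hypothetical minimal CSR container for this sketch (0-based indices).
struct Csr {
    int rows = 0, cols = 0;
    std::vector<int> row_ptr;   // size rows + 1
    std::vector<int> col_ind;   // size nnz
    std::vector<double> val;    // size nnz
};

// Row i of N = A^T * A: accumulate A(k,i) * A(k,:) over the nonzeros of column i.
// Column i comes from the CSC arrays (col_ptr, row_ind, cval); rows from CSR.
std::map<int, double> ata_row(int i, const std::vector<int>& col_ptr,
                              const std::vector<int>& row_ind,
                              const std::vector<double>& cval, const Csr& A) {
    std::map<int, double> acc;  // column j -> N(i,j), kept sorted by j
    for (int p = col_ptr[i]; p < col_ptr[i + 1]; ++p) {
        const int k = row_ind[p];
        const double a_ki = cval[p];
        for (int q = A.row_ptr[k]; q < A.row_ptr[k + 1]; ++q)
            acc[A.col_ind[q]] += a_ki * A.val[q];
    }
    return acc;
}

// N = A^T * A in CSR. Each row is computed independently by ata_row, so one
// CUDA thread per column of A could own one row; here they are simply appended.
Csr ata(const Csr& A, const std::vector<int>& col_ptr,
        const std::vector<int>& row_ind, const std::vector<double>& cval) {
    Csr N;
    N.rows = N.cols = A.cols;
    N.row_ptr.assign(A.cols + 1, 0);
    for (int i = 0; i < A.cols; ++i) {
        for (const auto& [j, v] : ata_row(i, col_ptr, row_ind, cval, A)) {
            N.col_ind.push_back(j);
            N.val.push_back(v);
        }
        N.row_ptr[i + 1] = static_cast<int>(N.col_ind.size());
    }
    return N;
}
```

For A = [[1, 2, 0], [0, 3, 0], [4, 0, 5]] this gives N = [[17, 2, 20], [2, 13, 0], [20, 0, 25]], with the structural zero at N(1,2) omitted from the CSR output. Because N is symmetric, a variant could compute only the entries with j >= i and mirror them, roughly halving the arithmetic.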

I just noticed that cuSPARSE in the CUDA toolkit actually has a csrgemm routine, which supports transposing either matrix. I don't know how I overlooked this. See https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-lt-t-gt-csrgemm . It looks like the simplest solution.