
CUDA CSR matrix-matrix product of a matrix with its own transpose (N = A^T * A)

I have a very large, very sparse least-squares design matrix A, which I would like to multiply by its own transpose: N = A^T * A, where A and N are stored in CSR format. A has more rows than columns. I normally form N directly, row by row, but with CSR I would first have to build a graph to determine which elements of N are non-zero. I could do this (and even have some old C code for it), but I am hoping for a solution that needs less development. I am using CUDA, so this could be done on either the GPU or the CPU, though I can see advantages to using the GPU. I have sketched out an algorithm, but was hoping that this problem had already been solved. I could not find anything in the CUDA toolkit other than the direct QR solver for A * x = l (where A is m x n). Google was not very helpful either.

I am using C++.

Does anyone have any experience here?
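For reference, the graph step mentioned in the question can be sketched on the CPU as below. This is only a minimal illustration, not the asker's code: the `Csr` struct and the `normalPattern` function are hypothetical names, and using a `std::set` per row is a simplicity choice that would likely be too slow for a very large matrix. The idea is that row r of A contributes to N(i, j) for every pair of columns (i, j) that both hold a non-zero in row r.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical CSR container (names are illustrative, not from any library).
struct Csr {
    int rows = 0, cols = 0;
    std::vector<int> rowPtr;   // size rows + 1
    std::vector<int> colInd;   // size nnz
    std::vector<double> val;   // size nnz
};

// Symbolic pass: determine the sparsity pattern of N = A^T * A.
// Every pair of non-zero columns (i, j) in a row of A marks N(i, j) as non-zero.
Csr normalPattern(const Csr& A) {
    assert(A.rowPtr.size() == static_cast<std::size_t>(A.rows) + 1);
    std::vector<std::set<int>> graph(A.cols);  // graph[i] = column indices in row i of N
    for (int r = 0; r < A.rows; ++r)
        for (int p = A.rowPtr[r]; p < A.rowPtr[r + 1]; ++p)
            for (int q = A.rowPtr[r]; q < A.rowPtr[r + 1]; ++q)
                graph[A.colInd[p]].insert(A.colInd[q]);

    // Flatten the per-row sets into CSR arrays (sets keep columns sorted).
    Csr N;
    N.rows = N.cols = A.cols;
    N.rowPtr.assign(1, 0);
    for (const auto& nbrs : graph) {
        N.colInd.insert(N.colInd.end(), nbrs.begin(), nbrs.end());
        N.rowPtr.push_back(static_cast<int>(N.colInd.size()));
    }
    N.val.assign(N.colInd.size(), 0.0);  // values are left for a later numeric pass
    return N;
}
```

Once the pattern is known, a numeric pass can accumulate the products into `N.val` without any further allocation.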

Sorting a general COO sparse matrix into CSR/CSC format, and in particular transposing / converting between CSR and CSC, are relatively cheap operations that are readily available in the cuSPARSE library.
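To show what that conversion involves, here is a rough CPU sketch in C++ using a counting sort. It is not the cuSPARSE implementation; the `Compressed` struct and `csrToCsc` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed-storage container, used for both CSR and CSC.
// CSR: ptr indexes rows and ind holds column indices.
// CSC: ptr indexes columns and ind holds row indices.
struct Compressed {
    int rows = 0, cols = 0;
    std::vector<int> ptr, ind;
    std::vector<double> val;
};

// Convert a CSR matrix to CSC with a counting sort over the column indices.
Compressed csrToCsc(const Compressed& A) {
    assert(A.ptr.size() == static_cast<std::size_t>(A.rows) + 1);
    Compressed B;
    B.rows = A.rows;
    B.cols = A.cols;
    B.ptr.assign(A.cols + 1, 0);
    B.ind.resize(A.ind.size());
    B.val.resize(A.val.size());

    // Count entries per column, then prefix-sum into column start offsets.
    for (int c : A.ind) ++B.ptr[c + 1];
    for (int c = 0; c < A.cols; ++c) B.ptr[c + 1] += B.ptr[c];

    // Scatter each entry into its column. Rows are visited in order, so the
    // row indices inside each column come out sorted.
    std::vector<int> next(B.ptr.begin(), B.ptr.end() - 1);
    for (int r = 0; r < A.rows; ++r)
        for (int p = A.ptr[r]; p < A.ptr[r + 1]; ++p) {
            int dst = next[A.ind[p]]++;
            B.ind[dst] = r;
            B.val[dst] = A.val[p];
        }
    return B;
}
```

The work is linear in the number of non-zeros plus the number of columns, which is why the conversion is cheap compared with the product itself.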

After converting your matrix A from CSR to CSC, you can apply the trivial algorithm to compute N = A^T * A: the CSC arrays of A are exactly the CSR arrays of A^T, so each entry N(i, j) is the sparse dot product of columns i and j of A.
This is also easy to parallelise with CUDA by having each thread process one column of A to generate one output.
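One possible reading of the trivial algorithm, written as plain C++ on the CPU, is sketched below. The names `Compressed`, `columnDot` and `normalMatrix` are hypothetical, and the sketch assumes the row indices within each column are sorted. It performs one merge per pair of columns, so the cost grows with the square of the column count; treat it as an illustration rather than a tuned implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed-storage container (CSC for A, CSR for N).
struct Compressed {
    int rows = 0, cols = 0;
    std::vector<int> ptr, ind;
    std::vector<double> val;
};

// Merge the sorted row indices of columns i and j of A (CSC).
// Returns true if the columns share at least one row; the dot product goes to sum.
bool columnDot(const Compressed& A, int i, int j, double& sum) {
    sum = 0.0;
    bool overlap = false;
    int p = A.ptr[i], q = A.ptr[j];
    while (p < A.ptr[i + 1] && q < A.ptr[j + 1]) {
        if (A.ind[p] < A.ind[q]) ++p;
        else if (A.ind[p] > A.ind[q]) ++q;
        else { sum += A.val[p++] * A.val[q++]; overlap = true; }
    }
    return overlap;
}

// N = A^T * A in CSR, with A given in CSC.
Compressed normalMatrix(const Compressed& A) {
    assert(A.ptr.size() == static_cast<std::size_t>(A.cols) + 1);
    Compressed N;
    N.rows = N.cols = A.cols;
    N.ptr.assign(1, 0);
    for (int i = 0; i < A.cols; ++i) {       // column i of A yields row i of N
        for (int j = 0; j < A.cols; ++j) {
            double sum;
            if (columnDot(A, i, j, sum)) {   // keep structurally non-zero entries only
                N.ind.push_back(j);
                N.val.push_back(sum);
            }
        }
        N.ptr.push_back(static_cast<int>(N.ind.size()));
    }
    return N;
}
```

In a CUDA version, the body of the outer loop is the natural unit of work for one thread, although the output row lengths would then have to be counted in a first pass before the values can be written. Since N is symmetric, computing only j >= i would also halve the work.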

I just noticed that cuSPARSE in the CUDA toolkit actually has a CSR gemm (csrgemm), which supports transposing either matrix. I don't know how I overlooked this. See https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-lt-t-gt-csrgemm . Looks like the simplest solution...
