CUDA CSR Matrix-Matrix product transpose by itself
I have a very large, very sparse least-squares design matrix A, which I would like to multiply by its own transpose: N = A^T * A, where both A and N are stored in CSR format. A has more rows than columns. I normally form N directly row by row, but with CSR I would first have to build a graph to determine which elements of N are non-zero. I could do this (and even have some old C code for it), but I am hoping for a solution that needs less development. I am using CUDA, so this could be done on either the GPU or the CPU, and I can see advantages to using the GPU. I have sketched out an algorithm, but was hoping that this problem had already been solved. I could not find anything in the CUDA toolkit other than the direct A * x = l QR solver (where A is m x n). Google was not very helpful either.
I am using C++.
Does anyone have any experience here?
Ordering a general COO sparse matrix into CSR/CSC format, and specifically transposing / converting between the CSR and CSC formats, are relatively cheap operations that are readily available in the cuSPARSE library.
After converting your matrix A from CSR to CSC, you can readily apply the trivial algorithm to compute N = A^T * A.
This can also easily be parallelised with CUDA by having each thread process one column of A to generate one output.
I just noticed that cuSPARSE in the CUDA toolkit actually has a csrgemm routine, which supports transposing either matrix. I don't know how I overlooked this. See https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-lt-t-gt-csrgemm . It looks like the simplest solution...