How to copy a matrix into a bigger matrix in CUDA
I want to set up a big matrix on my GPU to solve the corresponding equation system with CULA.
Some numbers, to help you understand the problem:
big matrix: 400x400
small matrices: 200x200
Now I want to copy each quarter (100x100) of the small matrices to a specific part of the big matrix.
I found two possible but obviously slow approaches. cublasSetMatrix and cublasGetMatrix support specifying a leading dimension, so I could put the parts where I want them, but I would have to copy the matrix back to the host. The other option would be cudaMemcpy, which doesn't support leading dimensions; with it I could copy every single row/column by hand (at the moment I am unsure which of the two applies here; the data comes from Fortran). But that way I expect a lot of overhead...
Is there a better way to copy the matrix than writing my own kernel?