
How to copy a matrix into a bigger matrix in CUDA

I want to set up a big matrix on my GPU to solve the corresponding system of equations with CULA.

Some numbers to frame the problem:

big matrix:     400x400
small matrices: 200x200

Now I want to copy every quarter (100x100) of the small matrices to a specific part of the big matrix.

I found two possible but obviously slow approaches. cublasSetMatrix and cublasGetMatrix support specifying a leading dimension, so I could put the parts where I want them, but I would have to copy the matrix back to the host first. The other option would be cudaMemcpy, which doesn't support a leading dimension, so I would have to copy every single row/column by hand (at the moment I am unsure which of the two this routine needs; the data comes from Fortran). But that way I should get a big overhead...

Is there a better way to copy the matrix than writing my own kernel?

You may want to revise your question. I guess you are looking for a way that can both change the leading dimension and do a device-to-device copy.

There is a routine, cudaMemcpy2D(), that can do that: it takes a separate pitch (the leading dimension, in bytes) for the source and the destination, and it supports device-to-device transfers.

