How to copy a matrix into a bigger matrix in CUDA
I want to set up a big matrix on my GPU to solve the corresponding equation system with CULA.
Some numbers, to help you understand the problem:
big matrix: 400x400
small matrices: 200x200
Now I want to copy each quarter (100x100) of the small matrices to a specific part of the big matrix.
I found two possible but obviously slow approaches. cublasSetMatrix and cublasGetMatrix support specifying a leading dimension, so I could put the parts where I want them, but I would have to copy the matrix back to the host. The other option would be cudaMemcpy, which doesn't support leading dimensions; with it I could copy every single row/column by hand (at the moment I am unsure which of the two applies here; the data comes from Fortran). But that way I expect a lot of overhead...
Is there a better way to copy the matrix than writing my own kernel?