How to implement an interface to a sub-matrix in CUDA?
I have a wrapper class CudaMatrix that implements several cuBLAS operations, allowing me to call m1.multiply(m2), which runs the sgemm operation on the internal data pointers.
I would like to extend the class with operations on sub-matrices, something like:
CudaMatrix a(100,100);
CudaMatrix b(100,100);
// fill a and b
int i=5, j=15;
CudaSubMatrix sa(a, i, j, i+10, j+10); // sa := a[5:15, 15:25]
i=50, j=60;
CudaSubMatrix sb(b, i, j, i+10, j+10); // sb := b[50:60, 60:70]
CudaMatrix res;
res.copy(sa);
res.multiply(sb); // res = sa*sb
In the last line, multiply() needs to operate on the sub-matrix sb, so its rows are not contiguous and I can't call the same sgemm operation as before.
How do I implement an efficient interface to sub-matrices that avoids copying data explicitly? Are there any open-source implementations I can look at?
The sub-matrix multiply may be performed using the ldx (leading dimension) parameters of the API calls.
Indexing is described in the 1.1 Data Layout section of the cuBLAS documentation:
#define IDX2C(i,j,ld) (((j)*(ld))+(i))
Then call cublasSgemm, for example, with the lda parameter equal to the number of rows of the original matrix (the cuBLAS library uses column-major storage), and m, n, k set to the dimensions of the sub-matrices.
Note that indexing differs between the Fortran-style (column-major) and C (row-major) schemes.
Hence what you really need is the size of your sub-matrix (columns, rows) and the size of a column in the input matrix (its number of rows).