
CUDA matrix inversion by referencing CUDA-pointer

Currently I'm just trying to implement a simple linear regression algorithm in matrix form based on cuBLAS with CUDA. Matrix multiplication and transposition work well with the cublasSgemm function.

Problems begin with the matrix inversions, based on the cublas&lt;t&gt;getrfBatched() and cublas&lt;t&gt;getriBatched() functions (see here).

As can be seen, the input parameters of these functions are arrays of pointers to matrices. Imagine that I've already allocated memory for the (A^T * A) matrix on the GPU as a result of previous calculations:

float* dProdATA;
cudaStat = cudaMalloc((void **)&dProdATA, n*n*sizeof(*dProdATA));

Is it possible to run the factorization (inversion)

cublasSgetrfBatched(handle, n, &dProdATA, lda, P, INFO, mybatch);

without additional HOST <-> GPU memory copying (see the working example of inverting an array of matrices) and without allocating a single-element array, but instead just passing a GPU reference to the GPU pointer?

There is no way around the requirement that the array you pass be in the device address space, and what you posted in your question won't work. You really only have two possibilities:

  1. Allocate an array of pointers on the device and do the memory transfer (the solution you don't want to use).
  2. Use zero-copy or managed host memory to store the batch array.

In the latter case with managed memory, something like this should work (completely untested, use at your own risk):

float **batch;
cudaMallocManaged(&batch, sizeof(float *));  // unified memory, visible to both host and device
*batch = dProdATA;                           // write the device pointer into the batch array from the host
cublasSgetrfBatched(handle, n, batch, lda, P, INFO, mybatch);
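For completeness, the first option (the explicit transfer you wanted to avoid) would look something like the sketch below. It is equally untested, and it assumes dProdATA, handle, n, lda, P, INFO and mybatch are defined as in the question. Note that the transfer is tiny, only sizeof(float *) bytes, but it is still a host-to-device copy:

```cuda
// Option 1: build the batch array on the host, then copy it to the device.
float *hBatch[1] = { dProdATA };   // host-side array holding one device pointer
float **dBatch;                    // device-side copy of that array
cudaMalloc((void **)&dBatch, sizeof(float *));
cudaMemcpy(dBatch, hBatch, sizeof(float *), cudaMemcpyHostToDevice);
cublasSgetrfBatched(handle, n, dBatch, lda, P, INFO, mybatch);
```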
