How to use MAGMA with an NVIDIA GPU card instead of CPU LAPACKE to invert a large matrix
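I need to invert a large matrix, and I would like to modify my current LAPACKE-based routine so that it takes advantage of an NVIDIA GPU card.
In fact, my LAPACKE routine works for relatively small matrices but not for large ones.
Below is the implementation of this LAPACKE routine: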
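#include <cstddef>
#include <stdexcept>
#include <vector>
#include <mkl.h>
using std::vector;
// Invert F_matrix with LAPACKE (LU factorization, then inverse); result goes to F_output.
// Matrices are passed by reference; F_output may be the same object as F_matrix.
void matrix_inverse_lapack(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {
    // Size of F_matrix; std::size_t indices avoid int overflow of i*N + j for N > 46340
    const std::size_t N = F_matrix.size();
    lapack_int *IPIV = new lapack_int[N];
    // Contiguous row-major copy of F_matrix (this declaration was missing)
    double *arr = new double[N * N];
    for (std::size_t i = 0; i < N; i++) {
        for (std::size_t j = 0; j < N; j++) {
            arr[i * N + j] = F_matrix[i][j];
        }
    }
    // LAPACKE routines: LU factorization, then the inverse from the LU factors
    const lapack_int n = static_cast<lapack_int>(N);
    lapack_int info1 = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, arr, n, IPIV);
    lapack_int info2 = (info1 == 0) ? LAPACKE_dgetri(LAPACK_ROW_MAJOR, n, arr, n, IPIV) : info1;
    if (info2 == 0) {
        for (std::size_t i = 0; i < N; i++) {
            for (std::size_t j = 0; j < N; j++) {
                F_output[i][j] = arr[i * N + j];
            }
        }
    }
    delete[] IPIV;
    delete[] arr;
    if (info2 != 0) throw std::runtime_error("LAPACKE_dgetrf/dgetri failed");
}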
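Because matrix_inverse_lapack(CO_CL, CO_CL) overwrites its input, it helps to keep a copy of the original matrix and check any result (from LAPACKE now, or from MAGMA later) against it. Below is a minimal sketch, assuming the same vector<vector<double>> layout; inverse_residual is a hypothetical helper name. It returns the largest entry of A*X - I: a value near rounding level suggests a usable inverse, while a large value points to a failed inversion or an ill-conditioned matrix.

```cpp
// Minimal sketch: check a computed inverse X of A by the largest entry of A*X - I.
// inverse_residual is a hypothetical helper name; the cost is O(N^3), the same
// order as the inversion itself, so it is meant for test-sized matrices.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns max over i, j of |(A*X)(i,j) - delta(i,j)|.
double inverse_residual(const Matrix& A, const Matrix& X) {
    const std::size_t n = A.size();
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += A[i][k] * X[k][j];
            const double target = (i == j) ? 1.0 : 0.0;
            worst = std::max(worst, std::fabs(s - target));
        }
    }
    return worst;
}
```

For example, copy CO_CL into a hypothetical CO_CL_orig before the call, then compare inverse_residual(CO_CL_orig, CO_CL) on a smaller test case.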
I call it like this to invert the CO_CL matrix:
matrix_inverse_lapack(CO_CL, CO_CL);
CO_CL is defined as:
vector<vector<double>> CO_CL(lsize*(2*Dim_x+Dim_y), vector<double>(lsize*(2*Dim_x+Dim_y), 0));
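It can help to estimate how large CO_CL actually becomes before choosing a backend. A minimal sketch, assuming only the definition above (N = lsize*(2*Dim_x+Dim_y)); the function names are hypothetical. A dense N × N double matrix needs 8·N² bytes per copy (for N = 20000 that is 3.2e9 bytes), the LAPACKE routine holds at least two copies (the vector<vector<double>> plus arr), and a GPU version additionally needs the matrix in device memory plus workspace. Index arithmetic such as idx = i*N + j done in int overflows (undefined behavior) once N ≥ 46341, which is one plausible, unconfirmed reason a routine written that way fails only for large matrices.

```cpp
// Minimal sketch: size of CO_CL and whether 32-bit int indexing still works.
// co_cl_size, dense_bytes and element_count_overflows_int are hypothetical names.
#include <climits>
#include <cstdint>

// Dimension N of CO_CL, following its definition above.
std::uint64_t co_cl_size(std::uint64_t lsize, std::uint64_t Dim_x, std::uint64_t Dim_y) {
    return lsize * (2 * Dim_x + Dim_y);
}

// Bytes for one dense N x N matrix of doubles (one copy: host array or GPU buffer).
std::uint64_t dense_bytes(std::uint64_t n) {
    return n * n * sizeof(double);
}

// True when the element count N*N exceeds INT_MAX, i.e. when `int idx = i*N + j`
// or `new double[N*N]` computed in int would overflow (N >= 46341).
bool element_count_overflows_int(std::uint64_t n) {
    return n * n > static_cast<std::uint64_t>(INT_MAX);
}
```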
In my case, how can I use MAGMA with the NVIDIA card instead of LAPACKE to invert the matrix?
Update 1: I have downloaded magma-2.6.1. First, I have to modify the original Makefile:
CXX = icpc -std=c++11 -O3 -xHost
CXXFLAGS = -Wall -c -I${MKLROOT}/include -I/opt/intel/oneapi/compiler/latest/linux/compiler/include -qopenmp -qmkl=parallel
LDFLAGS = -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/../compiler/lib -qopenmp -qmkl
SOURCES = main_intel.cpp XSAF_C_intel.cpp
EXECUTABLE = main_intel.exe
I don't see the mkl header in magma-2.6.1: are nvcc and MKL compatible?
Try using magma_sgetri_gpu, the single-precision matrix inverse (GPU interface). This function computes, in single precision, the inverse A^-1 of an m × m matrix A:
// assumes d_a, d_r (m x m, allocated with magma_smalloc), piv (m entries, e.g. from
// magma_imalloc_cpu) and dwork (ldwork = m * magma_get_sgetri_nb(m) floats) already exist
magma_ssetmatrix(m, m, a, m, d_a, m, queue);               // copy a -> d_a
magmablas_slacpy(MagmaFull, m, m, d_a, m, d_r, m, queue);  // copy d_a -> d_r
// find the inverse matrix: d_a * X = I using the LU factorization
// with partial pivoting and row interchanges computed by
// magma_sgetrf_gpu; row i is interchanged with row piv(i);
// d_a is an m x m matrix and is overwritten by the inverse
gpu_time = magma_sync_wtime(queue);
magma_sgetrf_gpu(m, m, d_a, m, piv, &info);
magma_sgetri_gpu(m, d_a, m, piv, dwork, ldwork, &info);
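Putting these calls together for the question's double-precision vector<vector<double>> data gives something like the sketch below. It assumes the MAGMA 2.x API from magma_v2.h (queue-based magma_dsetmatrix/magma_dgetmatrix, magma_dmalloc, magma_dgetrf_gpu, magma_dgetri_gpu, magma_get_dgetri_nb) on device 0, and it uses the d-prefixed double-precision routines rather than the single-precision s variants, since CO_CL holds doubles. MAGMA expects column-major storage, so the row-major buffer is handed over unchanged: MAGMA sees A^T, and because inv(A^T) = inv(A)^T, reading the result back row-major gives inv(A). The names matrix_inverse, invert_inplace and HAVE_MAGMA are hypothetical, and the CPU Gauss-Jordan fallback used when magma_v2.h is not found is only there so the sketch compiles and can be tested without a GPU; it is not a replacement for LAPACKE on large matrices.

```cpp
// Minimal sketch (assumes MAGMA 2.x, magma_v2.h): invert a dense double matrix
// on the GPU with magma_dgetrf_gpu + magma_dgetri_gpu. HAVE_MAGMA, invert_inplace
// and matrix_inverse are hypothetical names chosen for this example.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

#if defined(__has_include)
#  if __has_include(<magma_v2.h>)
#    include <magma_v2.h>
#    define HAVE_MAGMA 1
#  endif
#endif

#ifdef HAVE_MAGMA
// GPU path. MAGMA reads the row-major buffer as column-major, i.e. as A^T;
// since inv(A^T) = inv(A)^T, the buffer read back row-major holds inv(A).
static void invert_inplace(std::vector<double>& a, std::size_t n_) {
    const magma_int_t n = static_cast<magma_int_t>(n_);
    const magma_int_t lwork = n * magma_get_dgetri_nb(n);
    std::vector<magma_int_t> ipiv(n_);
    magma_int_t info = 0;
    magmaDouble_ptr d_a = nullptr, d_work = nullptr;
    magma_init();                       // real code: once per program
    magma_queue_t queue;
    magma_queue_create(0, &queue);      // device 0 assumed
    if (magma_dmalloc(&d_a, n_ * n_) != MAGMA_SUCCESS ||
        magma_dmalloc(&d_work, static_cast<std::size_t>(lwork)) != MAGMA_SUCCESS) {
        info = -1;                      // not enough device memory
    } else {
        magma_dsetmatrix(n, n, a.data(), n, d_a, n, queue);     // host -> device
        magma_dgetrf_gpu(n, n, d_a, n, ipiv.data(), &info);     // LU with pivoting
        if (info == 0)                                          // inverse from LU
            magma_dgetri_gpu(n, d_a, n, ipiv.data(), d_work, lwork, &info);
        if (info == 0)
            magma_dgetmatrix(n, n, d_a, n, a.data(), n, queue); // device -> host
    }
    magma_free(d_work);
    magma_free(d_a);
    magma_queue_destroy(queue);
    magma_finalize();
    if (info != 0) throw std::runtime_error("MAGMA inversion failed");
}
#else
// CPU fallback so the sketch builds without MAGMA: Gauss-Jordan elimination with
// partial pivoting on a row-major n x n buffer. For testing only, not for large n.
static void invert_inplace(std::vector<double>& a, std::size_t n) {
    std::vector<double> inv(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) inv[i * n + i] = 1.0;
    for (std::size_t c = 0; c < n; ++c) {
        std::size_t p = c;                                   // pivot row
        for (std::size_t r = c + 1; r < n; ++r)
            if (std::fabs(a[r * n + c]) > std::fabs(a[p * n + c])) p = r;
        if (a[p * n + c] == 0.0) throw std::runtime_error("matrix is singular");
        for (std::size_t k = 0; k < n; ++k) {                // swap rows p and c
            std::swap(a[p * n + k], a[c * n + k]);
            std::swap(inv[p * n + k], inv[c * n + k]);
        }
        const double d = a[c * n + c];
        for (std::size_t k = 0; k < n; ++k) { a[c * n + k] /= d; inv[c * n + k] /= d; }
        for (std::size_t r = 0; r < n; ++r) {                // clear column c elsewhere
            if (r == c) continue;
            const double f = a[r * n + c];
            for (std::size_t k = 0; k < n; ++k) {
                a[r * n + k] -= f * a[c * n + k];
                inv[r * n + k] -= f * inv[c * n + k];
            }
        }
    }
    a.swap(inv);
}
#endif

// Same interface as matrix_inverse_lapack; F_output may alias F_matrix.
void matrix_inverse(const std::vector<std::vector<double>>& F_matrix,
                    std::vector<std::vector<double>>& F_output) {
    const std::size_t n = F_matrix.size();
    std::vector<double> a(n * n);                            // row-major copy
    for (std::size_t i = 0; i < n; ++i) {
        assert(F_matrix[i].size() == n && "matrix must be square");
        for (std::size_t j = 0; j < n; ++j) a[i * n + j] = F_matrix[i][j];
    }
    invert_inplace(a, n);
    F_output.assign(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) F_output[i][j] = a[i * n + j];
}
```

Usage mirrors the original call, e.g. matrix_inverse(CO_CL, CO_CL);. In a real program magma_init/magma_finalize and the queue would normally be created once rather than per call, and linking presumably needs the MAGMA library plus its CUDA and BLAS dependencies, which the Makefile above does not yet include.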
There are many examples in NVIDIA's official documentation that you can also look at: