
Solving n linear systems efficiently

I have n (very large) independent linear systems Ax = b_i. They all share the same A, but the right-hand side b_i differs for i = 1, ..., n. I want to solve these n systems in parallel in CUDA.

I was thinking it might be most efficient to do the LU factorization of A on the host and then copy the factored A to the GPU's constant memory (because even if I did the LU on the device, only one thread would do it while the other threads sat idle; besides, constant memory is faster). Is there a better way to do this?
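To make the "factor once on the host, reuse for every right-hand side" idea concrete, here is a minimal plain-Python sketch (Doolittle LU without pivoting, assumed adequate for a well-conditioned A). In an actual CUDA pipeline the factorization and solves would come from a library (e.g. cuSOLVER's getrf/getrs routines); this only illustrates the data flow.

```python
# Sketch: LU-factor A a single time, then reuse L and U for every b_i.
# Assumes A needs no pivoting (illustration only, not production code).

def lu_factor(A):
    """Doolittle LU decomposition: A = L @ U, with unit-diagonal L."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1.0
        for j in range(i, n):          # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):      # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def lu_solve(L, U, b):
    """Solve L y = b (forward substitution), then U x = y (backward)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

A = [[4.0, 3.0], [6.0, 3.0]]
L, U = lu_factor(A)                  # factor once ...
bs = [[10.0, 12.0], [7.0, 9.0]]      # ... then solve for each b_i
xs = [lu_solve(L, U, b) for b in bs]
```

The factorization is O(n^3) but happens once; each additional right-hand side costs only the two O(n^2) triangular solves, which is exactly why sharing A across all systems pays off.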

Another issue is that while all threads are solving their systems at the same time with the same algorithm, they all access the same memory location (A[i]) simultaneously, which is not coalesced. How can I optimize this?

(This assumes A is a stably invertible n×n matrix.)

Don't solve a much harder problem just because it seems to parallelize better

Let B be the matrix whose columns are b_1, ..., b_n. Under our assumptions about A, you actually need to solve the single equation AX = B for an n×n matrix of unknowns X, i.e. your solution is X = A^{-1}B.
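A small plain-Python sketch of this reformulation (Gauss-Jordan inversion with partial pivoting, then a matrix product; on a GPU both steps would come from a library such as cuBLAS/cuSOLVER rather than hand-written loops):

```python
# Sketch: stack the b_i as columns of B, then X = A^{-1} B solves all
# n systems at once; column j of X is the solution of A x = b_j.

def invert(A):
    """Gauss-Jordan inversion with partial pivoting (assumes A invertible)."""
    n = len(A)
    # augmented matrix [A | I]
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))  # pivot row
        M[col], M[p] = M[p], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]  # right half is A^{-1}

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[4.0, 3.0], [6.0, 3.0]]
B = [[10.0, 7.0],   # column j of B is b_j
     [12.0, 9.0]]
X = matmul(invert(A), B)
```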

So basically you have one matrix inversion and one matrix multiplication. This holds regardless of what software and hardware you use. For the inversion and multiplication, just use cuBLAS, cuSPARSE, cuSOLVER, or ArrayFire, or whatever solves these things the fastest.

You could do both of them together, I suppose, but I'm not sure there are optimized routines for that.
