Use one LU factorization in several instances of mkl_dss_solve

I am using the Intel MKL library to solve a system of linear equations (A*x = b) with multiple right-hand-side (rhs) vectors. The rhs vectors are generated asynchronously by a separate routine, so it is not possible to solve them all at once.

To speed up the program, I use multiple threads, each responsible for solving a single rhs vector. Since the matrix A is constant, the LU factorization should be performed once and the factors reused in all threads. So I factor A with the following call:

dss_factor_real(handle, opt, data);

and pass the handle to the threads, which solve their problems with the following call:

dss_solve_real(handle, opt, rhs, nRhs, sol);

However, I found out that it is not thread-safe to use the same handle in several concurrent calls to dss_solve_real. Apparently the MKL library modifies handle during each call, which creates a race condition. I read the MKL manual but could not find anything relevant. Since it makes no sense to factorize A separately for each thread, I am wondering whether there is a way to overcome this problem and use the same handle everywhere.

Thanks in advance for your help

As far as I understand the DSS interface, handle contains not only the LU factorization but also other data structures that are used and modified in dss_solve_real; this is by design, so you should use a locking mechanism to prevent multiple threads from calling dss_solve_real concurrently on the same handle.
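A minimal sketch of such a locking mechanism follows. It does not call MKL: FakeHandle and fake_solve are hypothetical stand-ins for the DSS handle and for dss_solve_real, chosen only to show a solve step that writes to shared state. LockedSolver and solve_concurrently are likewise names invented for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the DSS handle: the "factors" plus a workspace
// that every solve overwrites, which is why unguarded concurrent solves race.
struct FakeHandle {
    double diag;                  // pretend factorization of A = diag * I
    std::vector<double> scratch;  // internal workspace mutated by each solve
};

// Hypothetical stand-in for dss_solve_real(handle, opt, rhs, nRhs, sol).
void fake_solve(FakeHandle& h, const std::vector<double>& rhs,
                std::vector<double>& sol) {
    assert(h.diag != 0.0);
    h.scratch = rhs;                          // writes shared state
    for (double& v : h.scratch) v /= h.diag;  // "solve" in the workspace
    sol = h.scratch;
}

// Owns the handle together with the mutex that guards it, so no caller can
// reach the handle without taking the lock.
class LockedSolver {
public:
    explicit LockedSolver(double diag) : handle_{diag, {}} {}

    std::vector<double> solve(const std::vector<double>& rhs) {
        std::lock_guard<std::mutex> guard(mutex_);  // one solve at a time
        std::vector<double> sol;
        fake_solve(handle_, rhs, sol);
        return sol;
    }

private:
    std::mutex mutex_;
    FakeHandle handle_;
};

// Starts n threads; thread i solves the rhs (2i, 4i) through the shared solver.
std::vector<std::vector<double>> solve_concurrently(LockedSolver& solver, int n) {
    std::vector<std::vector<double>> results(n);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&solver, &results, i] {
            results[i] = solver.solve({2.0 * i, 4.0 * i});
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

With real MKL code, the body of solve would presumably contain the dss_solve_real call instead of fake_solve. The lock serializes the solves, so the threads gain nothing during the solve phase itself; it only makes sharing the handle safe.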

Moreover, your assumption that dss_solve_real is serial (otherwise I do not understand why you would call multiple instances of it concurrently) is probably wrong. DSS is an interface to the PARDISO solver, which should be parallel in all of its phases, not only factorization.

Edit

If you abandon the DSS interface and call pardiso directly, it should be possible to have many threads, each serially solving a single rhs. (Not easy, but with careful programming it should be possible...)

However, if the goal is maximum throughput (rhs solved per unit of time) rather than minimum latency (time before the solution of a single rhs starts), I think the best approach is to have a single worker thread that solves all the rhs waiting in the queue with a single call to the parallel solver. Of course, the queue should be organized so that the rhs vectors are stored in a contiguous memory area.
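The single-worker approach can be sketched as follows, again without MKL. fake_solve_batch is a hypothetical stand-in for one solver call that receives nRhs right-hand sides stored back to back; RhsQueue and run_worker are names invented for this illustration, and the diagonal "matrix" is an assumption made only to keep the example self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a single solver call on nRhs right-hand sides
// stored contiguously, one after another. Here A = diag * I.
void fake_solve_batch(double diag, const std::vector<double>& rhs, int nRhs,
                      std::vector<double>& sol) {
    assert(nRhs > 0 && rhs.size() % static_cast<std::size_t>(nRhs) == 0);
    sol.resize(rhs.size());
    for (std::size_t k = 0; k < rhs.size(); ++k) sol[k] = rhs[k] / diag;
}

// Queue that appends each incoming rhs to one contiguous buffer.
class RhsQueue {
public:
    explicit RhsQueue(std::size_t n) : n_(n) {}

    // Called by producer threads whenever a new rhs of length n is ready.
    void push(const std::vector<double>& rhs) {
        assert(rhs.size() == n_);
        {
            std::lock_guard<std::mutex> guard(mutex_);
            pending_.insert(pending_.end(), rhs.begin(), rhs.end());
        }
        ready_.notify_one();
    }

    // Signals that no further rhs will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
    }

    // Blocks until something is queued or the queue is closed, then hands
    // over everything that is waiting. Returns the number of rhs taken.
    int take_all(std::vector<double>& batch) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !pending_.empty() || closed_; });
        batch.swap(pending_);
        pending_.clear();
        return static_cast<int>(batch.size() / n_);
    }

private:
    std::size_t n_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::vector<double> pending_;
    bool closed_ = false;
};

// Worker loop: one solver call per batch, however many rhs the batch holds.
// Returns all solutions concatenated; calls counts the solver invocations.
std::vector<double> run_worker(RhsQueue& queue, double diag, int& calls) {
    std::vector<double> all, batch, sol;
    for (;;) {
        int nRhs = queue.take_all(batch);
        if (nRhs == 0) break;  // closed and drained
        fake_solve_batch(diag, batch, nRhs, sol);
        ++calls;
        all.insert(all.end(), sol.begin(), sol.end());
    }
    return all;
}
```

In a real program the producers would call push from their own threads while run_worker runs in a dedicated thread. The number of rhs per solver call then adapts to the load: a busy queue yields large batches, an idle one yields batches of one.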
