
Using CUDA to solve a system of equations in non-linear least squares fashion

Original post: 2012-11-07 20:28:07. Tags: cuda, gpu, linear-algebra, mathematical-optimization, hessian-matrix

Using CUDA, I would like to solve a system of equations with a non-linear least squares solver. These methods are discussed in an excellent booklet that can be downloaded here.

The Jacobian matrix in my problem is sparse and lower triangular. Is there a library for CUDA available with these methods, or will I have to program these methods myself from the booklet?

Is a Gauss-Newton non-linear least squares solver, Levenberg-Marquardt or Powell's Method solver available in a CUDA library (either free or non-free)?

5 answers

Before pointing out a possible, simple implementation of a quasi-Newton optimization routine in CUDA, some words on how a quasi-Newton optimizer works.

Consider a function f of N real variables x and make a second-order expansion around a certain point $x_i$:

$$f(x) \approx f(x_i) + \nabla f(x_i)^T (x - x_i) + \frac{1}{2} (x - x_i)^T A \, (x - x_i)$$

where A is the Hessian matrix.

To find a minimum starting from a point $x_i$, Newton's method consists of forcing

$$\nabla f(x) = 0$$

which entails

$$\nabla f(x_i) + A \, (x - x_i) = 0 \quad \Longrightarrow \quad x = x_i - A^{-1} \nabla f(x_i)$$

which, in turn, requires knowing the inverse of the Hessian. Furthermore, to ensure that the function decreases, the update direction

$$x - x_i = -A^{-1} \nabla f(x_i)$$

should be such that

$$\nabla f(x_i)^T (x - x_i) < 0$$

which implies that

$$\nabla f(x_i)^T A^{-1} \nabla f(x_i) > 0$$

According to the above inequality, the Hessian matrix should be positive definite. Unfortunately, the Hessian is not necessarily positive definite, especially far from a minimum of f, so using the inverse of the Hessian, besides being computationally expensive, can also be harmful, pushing the procedure even farther from the minimum towards regions of increasing values of f. Generally speaking, it is more convenient to use a quasi-Newton method, i.e., an approximation of the inverse of the Hessian that stays positive definite and is updated iteration after iteration, converging to the inverse of the Hessian itself.

A rough justification of a quasi-Newton method is the following. Consider

$$\nabla f(x) \approx \nabla f(x_i) + A \, (x - x_i)$$

and

$$\nabla f(x) \approx \nabla f(x_{i+1}) + A \, (x - x_{i+1})$$

Subtracting the two equations, we have the update rule for the Newton procedure

$$x_{i+1} - x_i = A^{-1} \left[ \nabla f(x_{i+1}) - \nabla f(x_i) \right]$$

The updating rule for the quasi-Newton procedure is the following

$$x_{i+1} - x_i = H_{i+1} \left[ \nabla f(x_{i+1}) - \nabla f(x_i) \right]$$

where $H_{i+1}$ is the aforementioned matrix, which approximates the inverse of the Hessian and is updated step after step.
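The role of positive definiteness discussed above can be checked numerically. The following is a minimal sketch on a made-up test function, f(x, y) = x^2 - 2y^2 (not taken from the original answer), whose Hessian is indefinite: the exact Newton direction points uphill, while a positive definite stand-in for the inverse Hessian (here simply the identity) gives a descent direction. All names are illustrative.

```python
# Hypothetical example: f(x, y) = x^2 - 2*y^2 has a saddle at the origin,
# so its Hessian diag(2, -4) is not positive definite.

def direction(h, grad):
    """Return d = -H g for a 2x2 matrix H stored as nested lists."""
    return [-(h[r][0] * grad[0] + h[r][1] * grad[1]) for r in range(2)]

def slope(grad, d):
    """Directional derivative g^T d; a negative value means f decreases along d."""
    return grad[0] * d[0] + grad[1] * d[1]

grad = [2.0, -4.0]                             # gradient of f at the point (1, 1)
inverse_hessian = [[0.5, 0.0], [0.0, -0.25]]   # exact A^{-1}, indefinite
identity = [[1.0, 0.0], [0.0, 1.0]]            # positive definite stand-in for H

newton_dir = direction(inverse_hessian, grad)
safe_dir = direction(identity, grad)

print(slope(grad, newton_dir))   # 2.0: the exact Newton step moves uphill
print(slope(grad, safe_dir))     # -20.0: a positive definite H gives descent
```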

There are several rules for updating $H_{i+1}$, and I will not go into the details here. A very common one is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula, but in many cases the Polak-Ribière scheme (strictly speaking, a nonlinear conjugate gradient method that stores no matrix) is effective enough.
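For illustration, here is a minimal sketch of the standard BFGS update of the inverse Hessian approximation, written in plain Python with nested lists. It assumes s = x_{i+1} - x_i and y is the corresponding gradient difference; the function name and the choice to skip the update when y^T s is not positive are illustrative assumptions, not something prescribed by the answer. The updated matrix satisfies the quasi-Newton condition H y = s.

```python
def bfgs_inverse_update(h, s, y):
    """Return (I - rho s y^T) H (I - rho y s^T) + rho s s^T, with rho = 1 / (y^T s)."""
    n = len(s)
    sy = sum(si * yi for si, yi in zip(s, y))
    if sy <= 0.0:
        # Curvature condition violated: keep the old matrix so it stays positive definite.
        return [row[:] for row in h]
    rho = 1.0 / sy
    hy = [sum(h[r][c] * y[c] for c in range(n)) for r in range(n)]   # H y
    yh = [sum(y[r] * h[r][c] for r in range(n)) for c in range(n)]   # y^T H
    yhy = sum(y[r] * hy[r] for r in range(n))                        # y^T H y
    return [
        [
            h[r][c]
            - rho * (s[r] * yh[c] + hy[r] * s[c])
            + (rho * rho * yhy + rho) * s[r] * s[c]
            for c in range(n)
        ]
        for r in range(n)
    ]

# Made-up data: for f(x) = 0.5 x^T A x with A = diag(2, 0.5), y = A s.
s = [1.0, 0.5]
y = [2.0, 0.25]
h_new = bfgs_inverse_update([[1.0, 0.0], [0.0, 1.0]], s, y)

# Secant check: H_new y should reproduce s.
h_new_y = [sum(h_new[r][c] * y[c] for c in range(2)) for r in range(2)]
print(h_new_y)
```

On a GPU, the matrix-vector products and rank-one updates in this function are the kind of operations one would hand to cuBLAS or Thrust.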

The CUDA implementation can follow the same steps as the classical Numerical Recipes approach, taking into account that:

1) Vector and matrix operations can be effectively accomplished by CUDA Thrust or cuBLAS;
2) The control logic can be performed by the CPU;
3) Line minimization, involving root bracketing and root finding, can be performed on the CPU, accelerating only the cost functional and gradient evaluations on the GPU.

By the above scheme, unknowns, gradients and Hessian can be kept on the device without any need to move them back and forth from host to device.
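The division of labour described above can be sketched as follows. This is a plain Python illustration, not the linked CUDA code: `cost` and `gradient` are hypothetical stand-ins for the GPU kernels, while the Polak-Ribière control logic and the line minimization run on the host. The line minimization here is a simple step-doubling bracket followed by a golden-section search, used in place of the Numerical Recipes bracketing and Brent routines, and it assumes the cost is unimodal along each search direction.

```python
def cost(x):       # stand-in for a GPU cost-functional kernel (made-up test function)
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def gradient(x):   # stand-in for a GPU gradient kernel
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def line_minimize(phi, step=1.0, tol=1e-10):
    """Minimize phi(t) over t >= 0, assuming phi is unimodal and decreases at t = 0."""
    b = step
    while phi(2.0 * b) < phi(b) and b < 1e12:   # bracketing by step doubling
        b *= 2.0
    a, b = 0.0, 2.0 * b
    ratio = (5.0 ** 0.5 - 1.0) / 2.0            # golden-section ratio
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = phi(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

def polak_ribiere(cost_fn, grad_fn, x0, tol=1e-6, max_iter=200):
    x = list(x0)
    g = grad_fn(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        if dot(g, d) >= 0.0:                    # not a descent direction: restart
            d = [-gi for gi in g]
        t = line_minimize(lambda t: cost_fn([xi + t * di for xi, di in zip(x, d)]))
        x = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad_fn(x)
        # Polak-Ribiere coefficient, clipped at zero (the "PR+" variant)
        beta = max(0.0, dot(g_new, [p - q for p, q in zip(g_new, g)]) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

solution = polak_ribiere(cost, gradient, [0.0, 0.0])
print(solution)   # close to the minimizer (1, -2) of the test function
```

In a real CUDA port only the bodies of `cost` and `gradient`, plus the vector updates, would touch device memory; the scalar step length and the loop control stay on the host.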

Note also that the literature contains approaches that attempt to parallelize the line minimization as well; see

Y. Fei, G. Rong, B. Wang, W. Wang, "Parallel L-BFGS-B algorithm on GPU", Computers & Graphics, vol. 40, 2014, pp. 1-9.

At this GitHub page, a full CUDA implementation is available, generalizing the Numerical Recipes approach employing linmin, mnbrak and dbrent to the GPU parallel case. That approach implements the Polak-Ribière scheme, but can be easily generalized to other quasi-Newton optimization problems.

Also take a look at libflame, which contains implementations of many of the operations provided by the BLAS and LAPACK libraries.

Nvidia released a function that can do exactly this, called csrlsvqr, which performs well on small matrices. Unfortunately, for large sparse matrices the results have been poor in my experience: it is not able to converge on a solution.

To address this, I wrote my own tool, LSQR-CUDA.

No library currently provides procedures for solving a system of equations with a non-linear least squares solver on the CUDA platform. These algorithms must be written from scratch, with help from other libraries that implement linear algebra with sparse matrices. Also, as mentioned in the comment above, the cuBLAS library will help with linear algebra.

https://developer.nvidia.com/cusparse

http://code.google.com/p/cusp-library/
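As a rough idea of what writing such a solver from scratch involves, here is a minimal Gauss-Newton sketch in plain Python for the case raised in the question, a lower triangular Jacobian. It assumes the system is square and the Jacobian is nonsingular, so the Gauss-Newton normal equations J^T J d = -J^T r reduce to J d = -r, which forward substitution solves directly (in this square case the iteration coincides with Newton's method). The two-equation test system and all function names are made up for illustration; a rectangular or sparse Jacobian would need a proper sparse least squares solve instead.

```python
def residuals(x):
    # Hypothetical system: r_i depends only on x_1..x_i, so J is lower triangular.
    return [x[0] ** 2 - 4.0, x[0] * x[1] - 6.0]

def jacobian(x):
    return [[2.0 * x[0], 0.0],
            [x[1], x[0]]]

def forward_substitution(lower, rhs):
    """Solve L d = rhs for a nonsingular lower triangular matrix L."""
    d = []
    for r in range(len(rhs)):
        partial = sum(lower[r][c] * d[c] for c in range(r))
        d.append((rhs[r] - partial) / lower[r][r])
    return d

def gauss_newton_triangular(res_fn, jac_fn, x0, tol=1e-12, max_iter=50):
    x = list(x0)
    for _ in range(max_iter):
        r = res_fn(x)
        if max(abs(ri) for ri in r) < tol:
            break
        step = forward_substitution(jac_fn(x), [-ri for ri in r])
        x = [xi + di for xi, di in zip(x, step)]
    return x

root = gauss_newton_triangular(residuals, jacobian, [1.0, 1.0])
print(root)   # approaches the root (2, 3) of the test system
```

The residual and Jacobian evaluations are the parts that would be offloaded to the GPU; the triangular solve corresponds to what a sparse triangular solver from a library such as cuSPARSE would do on the device.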

For those who are still looking for an answer, this one handles sparse matrices: OpenOF, "Framework for Sparse Non-linear Least Squares Optimization on a GPU"

It is to the GPU what g2o is to the CPU.



solution1
7 ACCEPTED 2015-01-31 21:32:04

solution2
1 2012-11-08 00:36:50

solution3
1 2022-07-19 08:04:27

solution4
0 2012-11-07 23:36:33

solution5
0 2015-10-13 13:26:29
