Least Squares Regression in C/C++

How can I implement least-squares regression for factor analysis in C/C++?

The gold standard for this is LAPACK. In particular, you want xGELS (the x is the precision prefix, e.g. DGELS for double precision).

When I've had to deal with large datasets and large parameter sets for non-linear parameter fitting, I used a combination of RANSAC and Levenberg-Marquardt. I'm talking thousands of parameters with tens of thousands of data points.

RANSAC is a robust algorithm that reduces the influence of outliers by fitting on reduced subsets of the data. It's not strictly least squares, but it can be combined with many fitting methods.

Levenberg-Marquardt is an efficient way to solve non-linear least-squares problems numerically. In most cases its convergence rate lies between that of steepest descent and Newton's method, without requiring the calculation of second derivatives. I've found it to be faster than conjugate gradient in the cases I've examined.
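To make the idea concrete, here is a minimal sketch of a Levenberg-Marquardt loop for a two-parameter problem. The model y = a * exp(b * x), the names `Params`, `sumSquares` and `fitLM`, and the damping schedule (multiply or divide lambda by 10) are all assumptions chosen for illustration; this is not how any particular library does it. Note that only first derivatives of the model appear.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// {a, b} of the hypothetical model y = a * exp(b * x)
using Params = std::array<double, 2>;

// Sum of squared residuals for the given parameters.
double sumSquares(const std::vector<double>& x, const std::vector<double>& y,
                  const Params& p) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double r = y[i] - p[0] * std::exp(p[1] * x[i]);
        s += r * r;
    }
    return s;
}

// Illustrative Levenberg-Marquardt iteration with Marquardt's diagonal scaling.
Params fitLM(const std::vector<double>& x, const std::vector<double>& y,
             Params p, int maxIter = 200) {
    assert(x.size() == y.size());
    double lambda = 1e-3;  // damping: small ~ Gauss-Newton, large ~ gradient descent
    double cost = sumSquares(x, y, p);
    for (int it = 0; it < maxIter; ++it) {
        // Accumulate J^T J (symmetric 2x2) and J^T r from first derivatives only.
        double h00 = 0, h01 = 0, h11 = 0, g0 = 0, g1 = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double e = std::exp(p[1] * x[i]);
            const double r = y[i] - p[0] * e;
            const double j0 = e;                // d(model)/da
            const double j1 = p[0] * x[i] * e;  // d(model)/db
            h00 += j0 * j0; h01 += j0 * j1; h11 += j1 * j1;
            g0 += j0 * r;   g1 += j1 * r;
        }
        // Solve the damped normal equations (J^T J + lambda * diag(J^T J)) d = J^T r.
        const double a00 = h00 * (1.0 + lambda), a11 = h11 * (1.0 + lambda);
        const double det = a00 * a11 - h01 * h01;
        if (!(std::fabs(det) > 0.0)) { lambda *= 10.0; continue; }
        const double d0 = (a11 * g0 - h01 * g1) / det;
        const double d1 = (a00 * g1 - h01 * g0) / det;
        const Params trial = {p[0] + d0, p[1] + d1};
        const double trialCost = sumSquares(x, y, trial);
        if (trialCost < cost) {
            // Step accepted: keep it and trust the quadratic model more.
            const bool converged = (cost - trialCost) < 1e-15 * (1.0 + cost);
            p = trial;
            cost = trialCost;
            lambda *= 0.1;
            if (converged) break;
        } else {
            // Step rejected: increase damping and retry from the same point.
            lambda *= 10.0;
            if (lambda > 1e12) break;
        }
    }
    return p;
}
```

Calling `fitLM(xs, ys, Params{1.0, 0.1})` with data generated from a = 2, b = 0.5 should recover those values. For thousands of parameters you would replace the hand-written 2x2 solve with a proper linear solver.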

The way I did this was to set up RANSAC as an outer loop around the LM method. This is very robust but slow. If you don't need the additional robustness, you can just use LM.
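The outer-loop structure can be sketched roughly as follows. To keep it short, the inner fit here is a closed-form straight-line fit standing in for LM, and the names `Point`, `Line`, `fitLine` and `ransacLine`, the inlier tolerance, and the fixed seed are all hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Point { double x, y; };
struct Line { double slope, intercept; };

// Inner fit: closed-form least-squares line (a stand-in for the LM step).
Line fitLine(const std::vector<Point>& pts) {
    assert(pts.size() >= 2);
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (const Point& p : pts) {
        sx += p.x; sy += p.y; sxx += p.x * p.x; sxy += p.x * p.y;
    }
    const double n = static_cast<double>(pts.size());
    const double denom = n * sxx - sx * sx;
    assert(std::fabs(denom) > 1e-12 && "x values must not all coincide");
    const double slope = (n * sxy - sx * sy) / denom;
    return {slope, (sy - slope * sx) / n};
}

// Outer loop: sample minimal subsets, keep the largest consensus set, refit on it.
Line ransacLine(const std::vector<Point>& pts, double tol, int trials,
                unsigned seed = 42) {
    assert(pts.size() >= 2);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, pts.size() - 1);
    std::vector<Point> best;
    for (int t = 0; t < trials; ++t) {
        const Point& a = pts[pick(rng)];
        const Point& b = pts[pick(rng)];
        if (a.x == b.x) continue;  // degenerate sample (also covers picking a point twice)
        // Candidate model through the two sampled points.
        const double slope = (b.y - a.y) / (b.x - a.x);
        const double intercept = a.y - slope * a.x;
        // Collect every point that agrees with the candidate within tol.
        std::vector<Point> inliers;
        for (const Point& p : pts)
            if (std::fabs(p.y - (slope * p.x + intercept)) <= tol)
                inliers.push_back(p);
        if (inliers.size() > best.size()) best = inliers;
    }
    // Final fit on the consensus set; fall back to all points if none was found.
    return fitLine(best.size() >= 2 ? best : pts);
}
```

The slowness mentioned above comes from the trial count: every trial scans the whole data set, and with a non-linear model each candidate (or at least the final refit) costs a full inner optimization.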

Get ROOT and use TGraph::Fit() (or TGraphErrors::Fit())?

It's a big, heavy piece of software to install just for the fitter, though. It works for me because I already have it installed.

Or use GSL.

If you want to implement an optimization algorithm yourself, Levenberg-Marquardt seems to be quite difficult to implement. If really fast convergence is not needed, take a look at the Nelder-Mead simplex optimization algorithm. It can be implemented from scratch in a few hours.

http://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method
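As a rough idea of how little code is involved, here is a sketch of a simplified Nelder-Mead variant (reflection, expansion, inside contraction only, and shrink, with the usual coefficients 1, 2, 0.5 and 0.5). The names `Vec`, `Vertex`, `pointOnLine` and `nelderMead`, the initial simplex construction, and the stopping rule are my own assumptions, not taken from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
struct Vertex { Vec x; double f; };

// Returns from + t * (to - from).
Vec pointOnLine(const Vec& from, const Vec& to, double t) {
    Vec out(from.size());
    for (std::size_t i = 0; i < from.size(); ++i)
        out[i] = from[i] + t * (to[i] - from[i]);
    return out;
}

Vec nelderMead(const std::function<double(const Vec&)>& f, const Vec& start,
               double step = 1.0, int maxIter = 2000, double tol = 1e-12) {
    const std::size_t n = start.size();
    assert(n >= 1);
    // Initial simplex: the start point plus one vertex offset along each axis.
    std::vector<Vertex> s(n + 1, Vertex{start, 0.0});
    for (std::size_t i = 0; i < n; ++i) s[i + 1].x[i] += step;
    for (Vertex& v : s) v.f = f(v.x);
    const auto byValue = [](const Vertex& a, const Vertex& b) { return a.f < b.f; };

    for (int it = 0; it < maxIter; ++it) {
        std::sort(s.begin(), s.end(), byValue);  // s[0] is best, s[n] is worst
        // Stop when the simplex is both flat in value and small in size.
        double extent = 0.0;
        for (std::size_t i = 1; i <= n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                extent = std::max(extent, std::fabs(s[i].x[j] - s[0].x[j]));
        if (s[n].f - s[0].f < tol && extent < 1e-6) break;

        // Centroid of every vertex except the worst.
        Vec c(n, 0.0);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                c[j] += s[i].x[j] / static_cast<double>(n);

        // Reflect the worst vertex through the centroid.
        const Vec xr = pointOnLine(s[n].x, c, 2.0);
        const double fr = f(xr);
        if (fr < s[0].f) {
            // New best point: try going further in the same direction.
            const Vec xe = pointOnLine(s[n].x, c, 3.0);
            const double fe = f(xe);
            s[n] = (fe < fr) ? Vertex{xe, fe} : Vertex{xr, fr};
        } else if (fr < s[n - 1].f) {
            s[n] = Vertex{xr, fr};
        } else {
            // Contract the worst vertex halfway toward the centroid.
            const Vec xc = pointOnLine(s[n].x, c, 0.5);
            const double fc = f(xc);
            if (fc < s[n].f) {
                s[n] = Vertex{xc, fc};
            } else {
                // Nothing helped: shrink every vertex toward the best one.
                for (std::size_t i = 1; i <= n; ++i) {
                    s[i].x = pointOnLine(s[0].x, s[i].x, 0.5);
                    s[i].f = f(s[i].x);
                }
            }
        }
    }
    std::sort(s.begin(), s.end(), byValue);
    return s[0].x;
}
```

For a least-squares fit, `f` would return the sum of squared residuals for a given parameter vector. No derivatives are needed, which is the main appeal, but expect many more function evaluations than with LM.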

Have a look at http://www.alglib.net/optimization/

They have C++ implementations for L-BFGS and Levenberg-Marquardt.

You only need to work out the first derivative of your objective function to use these two algorithms.

I've used TNT/JAMA for linear least-squares estimation. It's not very sophisticated but is fairly quick and easy.
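For reference, the linear problem itself is small enough to sketch with the standard library alone. The code below does not use the TNT/JAMA API; it is a stand-in that forms the normal equations (A^T A) x = A^T b and solves them by Gaussian elimination, and the names `Matrix`, `solveLinear` and `leastSquares` are made up for this example. Normal equations are less numerically stable than the QR-based routines in real libraries, so treat this as illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Dense matrix stored as a vector of rows (illustrative type, not a TNT/JAMA class).
using Matrix = std::vector<std::vector<double>>;

// Solve the square system M x = r by Gaussian elimination with partial pivoting.
std::vector<double> solveLinear(Matrix M, std::vector<double> r) {
    const std::size_t n = r.size();
    for (std::size_t k = 0; k < n; ++k) {
        // Choose the row with the largest entry in column k as the pivot row.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(M[i][k]) > std::fabs(M[p][k])) p = i;
        assert(std::fabs(M[p][k]) > 1e-12 && "singular system");
        std::swap(M[k], M[p]);
        std::swap(r[k], r[p]);
        // Eliminate column k from the rows below.
        for (std::size_t i = k + 1; i < n; ++i) {
            const double factor = M[i][k] / M[k][k];
            for (std::size_t j = k; j < n; ++j) M[i][j] -= factor * M[k][j];
            r[i] -= factor * r[k];
        }
    }
    // Back substitution.
    std::vector<double> x(n);
    for (std::size_t k = n; k-- > 0;) {
        double s = r[k];
        for (std::size_t j = k + 1; j < n; ++j) s -= M[k][j] * x[j];
        x[k] = s / M[k][k];
    }
    return x;
}

// Minimize ||A x - b||^2 via the normal equations (A^T A) x = A^T b.
std::vector<double> leastSquares(const Matrix& A, const std::vector<double>& b) {
    assert(!A.empty());
    const std::size_t m = A.size(), n = A[0].size();
    assert(b.size() == m && m >= n);
    Matrix AtA(n, std::vector<double>(n, 0.0));
    std::vector<double> Atb(n, 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            Atb[j] += A[i][j] * b[i];
            for (std::size_t k = 0; k < n; ++k) AtA[j][k] += A[i][j] * A[i][k];
        }
    return solveLinear(AtA, Atb);
}
```

To fit a line y = c0 + c1 * x, each row of `A` is {1, x_i} and `b` holds the y_i; the returned vector is {c0, c1}.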

Let's talk first about factor analysis, since most of the discussion above is about regression. Most of my experience is with software like SAS, Minitab, or SPSS that solves the factor analysis equations for you, so I have limited experience in solving these directly. That said, the most common implementations do not use linear regression to solve the equations. According to this, the most common methods used are principal component analysis and principal factor analysis. In a text on Applied Multivariate Analysis (Dallas Johnson), no fewer than seven methods are documented, each with its own pros and cons. I would strongly recommend finding an implementation that gives you factor scores rather than programming a solution from scratch.

The reason there are different methods is that you can choose exactly what you're trying to minimize. There is a pretty comprehensive discussion of the breadth of methods here.
