
Total Least Squares algorithm in C/C++

Given a set of points P, I need to find a line L that best approximates these points. I have tried to use the function gsl_fit_linear from the GNU Scientific Library. However, my data set often contains points whose line of best fit has an undefined slope (x = c), so gsl_fit_linear returns NaN. It is my understanding that total least squares is better suited to this, because it is fast and robust, and it gives the equation in terms of r and theta (so x = c can still be represented). I can't seem to find any C/C++ code for this problem. Does anyone know of a library or something that I can use? I've read a few research papers on this, but the topic is still a little fuzzy, so I don't feel confident implementing my own.

Update:

I made a first attempt at programming my own with Armadillo, using the code given on this Wikipedia page. Alas, so far I have been unsuccessful.

This is what I have so far:

void pointsToLine(vector<Point> P)
{
    Row<double> x(P.size());
    Row<double> y(P.size());

    for (int i = 0; i < P.size(); i++)
    {
         x << P[i].x;
         y << P[i].y;
    }

    int m = P.size();
    int n = x.n_cols;

    mat Z = join_rows(x, y);

    mat U;
    vec s;
    mat V;
    svd(U, s, V, Z);

    mat VXY = V(span(0, (n-1)), span(n, (V.n_cols-1)));
    mat VYY = V(span(n, (V.n_rows-1)) , span(n, (V.n_cols-1)));

    mat B = (-1*VXY) / VYY;
    cout << B << endl;
}

The output from B is always 0.5504, even when my data set changes. Also, I expected the output to be two values, so I'm clearly doing something very wrong.

Thanks!

To find the line that minimises the sum of the squares of the (orthogonal) distances of the points from the line, you can proceed as follows:

The line is the set of points p + r*t, where p and t are vectors to be found and r ranges over the real numbers. We restrict t to unit length. There is another, simpler description in two dimensions, but this one works in any dimension.

The steps are

1/ compute the mean p of the points

2/ accumulate the covariance matrix C

    C = Sum{ i | (q[i]-p)*(q[i]-p)' } / N

(where you have N points and ' denotes transpose)

3/ diagonalise C and take as t the eigenvector corresponding to the largest eigenvalue.

All this can be justified starting from the squared orthogonal distance of a point q from a line represented as above, which is

d2(q) = (q-p)'*(q-p) - ((q-p)'*t)^2
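The justification is only sketched above, so the derivation below fills in the step. It uses the squared distance ‖q - p‖² - ((q - p)ᵀt)², which holds by Pythagoras since t is a unit vector. It writes q̄ for the mean of the points so that p can vary freely. Summing over the points and splitting q_i - p = (q_i - q̄) + (q̄ - p) makes the cross terms vanish, because Σ(q_i - q̄) = 0:

```latex
\begin{align*}
D(p,t) &= \sum_{i=1}^{N}\Big[(q_i-p)^\top(q_i-p) - \big((q_i-p)^\top t\big)^2\Big], \qquad \|t\| = 1,\\
D(p,t) &= D(\bar q,t) + N\Big[(\bar q-p)^\top(\bar q-p) - \big((\bar q-p)^\top t\big)^2\Big] \;\ge\; D(\bar q,t),\\
D(\bar q,t) &= N\,\operatorname{tr}(C) - N\,t^\top C\,t, \qquad C = \frac{1}{N}\sum_{i=1}^{N}(q_i-\bar q)(q_i-\bar q)^\top.
\end{align*}
```

The bracket in the second line is non-negative by Cauchy-Schwarz and vanishes at p = q̄, so the mean is an optimal choice of p. With p fixed there, minimising D over unit t is the same as maximising the Rayleigh quotient tᵀCt. Its maximum is the largest eigenvalue of C, attained at the corresponding eigenvector.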
