Total Least Squares algorithm in C/C++

Question

Given a set of points P, I need to find a line L that best approximates these points. I have tried to use the function gsl_fit_linear from the GNU Scientific Library. However, my data set often contains points whose line of best fit has an undefined slope (x=c), in which case gsl_fit_linear returns NaN. It is my understanding that it is best to use total least squares for this sort of thing because it is fast, robust and it gives the equation in terms of r and theta (so x=c can still be represented). I can't seem to find any C/C++ code out there currently for this problem. Does anyone know of a library or something that I can use? I've read a few research papers on this but the topic is still a little fuzzy, so I don't feel confident implementing my own.

Update:

I made a first attempt at programming my own with Armadillo, using the code given on this Wikipedia page. Alas, I have so far been unsuccessful.

This is what I have so far:

void pointsToLine(vector<Point> P)
{
    Row<double> x(P.size());
    Row<double> y(P.size());

    for (int i = 0; i < P.size(); i++)
    {
         x << P[i].x;
         y << P[i].y;
    }

    int m = P.size();
    int n = x.n_cols;

    mat Z = join_rows(x, y);

    mat U;
    vec s;
    mat V;
    svd(U, s, V, Z);

    mat VXY = V(span(0, (n-1)), span(n, (V.n_cols-1)));
    mat VYY = V(span(n, (V.n_rows-1)) , span(n, (V.n_cols-1)));

    mat B = (-1*VXY) / VYY;
    cout << B << endl;
}

The output from B is always 0.5504, even when my data set changes. Also, I thought that the output should be two values, so I'm definitely doing something very wrong.

Thanks!

Answer 1

To find the line that minimises the sum of the squares of the (orthogonal) distances of the points from the line, you can proceed as follows:

The line is the set of points p+r*t where p and t are vectors to be found, and r varies along the line. We restrict t to be unit length. While there is another, simpler, description in two dimensions, this one works with any dimension.

The steps are

1/ compute the mean p of the points

2/ accumulate the covariance matrix C

    C = Sum{ i | (q[i]-p)*(q[i]-p)' } / N

(where you have N points and ' denotes transpose)

3/ diagonalise C and take as t the eigenvector corresponding to the largest eigenvalue.
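The three steps above can be sketched in plain C++ for the two-dimensional case. This is a minimal sketch, not a library routine: the names Point, Line and fitLineTLS are hypothetical, Point is assumed to have double members x and y as in the question, and the general eigen-decomposition is replaced by the closed-form principal-axis angle of a symmetric 2x2 matrix, which only works in 2D.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical point type; assumed to match the question's Point (x and y members).
struct Point { double x, y; };

// Line in the form p + r*t, where t has unit length.
struct Line {
    double px, py;  // p: a point on the line (the mean of the data)
    double tx, ty;  // t: unit direction vector
};

// Orthogonal (total) least squares line fit in 2D.
Line fitLineTLS(const std::vector<Point>& pts)
{
    if (pts.size() < 2)
        throw std::invalid_argument("need at least two points");
    const double n = static_cast<double>(pts.size());

    // Step 1: mean p of the points.
    double mx = 0.0, my = 0.0;
    for (const Point& q : pts) { mx += q.x; my += q.y; }
    mx /= n;
    my /= n;

    // Step 2: entries of the symmetric 2x2 covariance matrix C.
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const Point& q : pts) {
        const double dx = q.x - mx;
        const double dy = q.y - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    sxx /= n;
    sxy /= n;
    syy /= n;

    // Step 3: direction of the eigenvector with the largest eigenvalue.
    // For a symmetric 2x2 matrix this is the principal-axis angle below,
    // so no general eigen-solver is needed in 2D.
    const double angle = 0.5 * std::atan2(2.0 * sxy, sxx - syy);
    return Line{mx, my, std::cos(angle), std::sin(angle)};
}
```

Because the line is returned as a point and a unit direction, a vertical line x=c comes out as t close to (0, 1) or (0, -1) rather than as NaN. If the r and theta form is wanted, the unit normal is (-ty, tx), theta is its angle and r is its dot product with p. When the points are all identical or scattered isotropically, C has no preferred direction and the sketch simply returns t = (1, 0).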

All this can be justified, starting from the (orthogonal) distance squared of a point q from a line represented as above, which is

d2(q) = (q-p)'*(q-p) - ((q-p)'*t)^2
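As a sketch of why step 3 picks the largest eigenvalue, the squared distances can be summed over all N points. This assumes the squared distance of q from the line is the squared length of q-p minus the squared projection of q-p onto t, with p the mean, C the covariance matrix from step 2 and t of unit length:

```latex
\begin{aligned}
\sum_{i=1}^{N} d^2(q_i)
  &= \sum_{i=1}^{N} \lVert q_i - p \rVert^2
   - \sum_{i=1}^{N} \bigl( (q_i - p)^{\top} t \bigr)^2 \\
  &= N \operatorname{tr}(C) - N \, t^{\top} C \, t .
\end{aligned}
```

The first term does not depend on t, so minimising the sum amounts to maximising t'*C*t over unit vectors t, and that maximum is attained at the eigenvector of C with the largest eigenvalue.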
