
Weighted linear least squares for 2D data point sets

My question is an extension of the discussion How to fit the 2D scatter data with a line with C++ . When estimating the line that fits a set of 2D scatter points, it would be better if we could treat each point differently. That is to say, if a point is far away from the line, we can give it a low weight, and vice versa. The question then becomes: given an array of 2D scatter points together with their weighting factors, how can we estimate the line that fits them? A good implementation of this method can be found in this article ( weighted least regression ). However, the implementation of the algorithm in that article is too complicated, as it involves matrix calculation. I am therefore trying to find a method without matrix calculation. The algorithm is an extension of simple linear regression , and in order to illustrate it, I wrote the following MATLAB code:

function line = weighted_least_squre_for_line(x, y, weighting)
% Fits the line a*x + b*y + c = 0 to the points (x, y),
% with per-point weights given in `weighting`.

part1 = sum(weighting.*x.*y)*sum(weighting(:));
part2 = sum(weighting.*x)*sum(weighting.*y);
part3 = sum(weighting.*x.^2)*sum(weighting(:));
part4 = sum(weighting.*x).^2;

beta  = (part1 - part2)/(part3 - part4);                              % slope
alpha = (sum(weighting.*y) - beta*sum(weighting.*x))/sum(weighting);  % intercept

a = beta;
b = -1;
c = alpha;
line = [a b c];

In the above code, x, y and weighting represent the x-coordinates, the y-coordinates and the weighting factors, respectively. I have tested the algorithm with several examples, but I am still not sure whether it is correct, because it gives a different result from polyfit , which relies on matrix calculation. I am posting the implementation here for your advice. Do you think it is a correct implementation? Thanks!
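For reference, the same closed-form fit as in the MATLAB code above can be written without any matrix operations in plain Python (an illustrative sketch; the function name and the test values are mine, not from the question):

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of a line to points (x, y) with weights w.

    Mirrors the four "part" terms in the MATLAB code above and returns
    (a, b, c) for the line a*x + b*y + c = 0, i.e. y = a*x + c.
    """
    sw   = sum(w)                                            # sum of weights
    swx  = sum(wi * xi for wi, xi in zip(w, x))              # sum w*x
    swy  = sum(wi * yi for wi, yi in zip(w, y))              # sum w*y
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))  # sum w*x*y
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))         # sum w*x^2

    beta  = (swxy * sw - swx * swy) / (swxx * sw - swx ** 2)  # slope
    alpha = (swy - beta * swx) / sw                           # intercept
    return beta, -1.0, alpha  # a, b, c

# Points lying exactly on y = 2*x + 1 are recovered exactly,
# whatever positive weights are used -- a quick sanity check:
a, b, c = weighted_line_fit([0, 1, 2, 3], [1, 3, 5, 7], [1, 2, 3, 4])
# a = 2.0, c = 1.0
```

Checking that exactly collinear points come back unchanged for several different weight vectors is an easy way to sanity-test the formulas before comparing against anything matrix-based.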

If you think that down-weighting points far from the line is a good idea, you may be attracted to http://en.wikipedia.org/wiki/Least_absolute_deviations , because one way to compute it is via http://en.wikipedia.org/wiki/Iteratively_re-weighted_least_squares , which down-weights the points that are far from the line.
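A minimal sketch of that iteratively re-weighted least squares idea, in Python for illustration (the 1/|residual| weight rule, the iteration count, and the eps floor are my choices, not something stated in the answer):

```python
def irls_line(x, y, iters=50, eps=1e-8):
    """Approximate a least-absolute-deviations line fit of y = beta*x + alpha
    by iteratively re-weighted least squares."""
    w = [1.0] * len(x)  # start from an ordinary (unweighted) fit
    beta = alpha = 0.0
    for _ in range(iters):
        sw   = sum(w)
        swx  = sum(wi * xi for wi, xi in zip(w, x))
        swy  = sum(wi * yi for wi, yi in zip(w, y))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        beta  = (swxy * sw - swx * swy) / (swxx * sw - swx ** 2)
        alpha = (swy - beta * swx) / sw
        # Re-weight: points far from the current line get small weights;
        # eps keeps the weight finite when a residual is (near) zero.
        w = [1.0 / max(abs(yi - (beta * xi + alpha)), eps)
             for xi, yi in zip(x, y)]
    return beta, alpha
```

With one gross outlier among otherwise collinear points, the outlier's weight shrinks each round and the fit converges toward the line through the remaining points, which is the least-absolute-deviations behaviour the answer describes.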

If you think all your points are "good data", then it would be a mistake to weight them naively according to their distance from your initial fit. However, it's a fairly common practice to discard "outliers": if a few data points are implausibly far from the fit, and you have reason to believe that there's an error mechanism that could generate a small subset of "bad" datapoints, you could simply remove the implausible points from the dataset to get a better fit.
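The discard-and-refit idea can be sketched like this (illustrative Python; the 2-standard-deviation cutoff is an arbitrary threshold I chose, and it only helps when outliers are rare enough not to inflate the residual spread too much):

```python
def fit_without_outliers(x, y, cutoff=2.0):
    """Ordinary least-squares line fit, refitted after discarding points
    whose residual exceeds `cutoff` standard deviations."""
    def ols(xs, ys):
        # Plain unweighted least squares: y = beta*x + alpha
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        beta = (sum(a * b for a, b in zip(xs, ys)) - n * mx * my) / \
               (sum(a * a for a in xs) - n * mx * mx)
        return beta, my - beta * mx

    beta, alpha = ols(x, y)                                   # initial fit
    resid = [yi - (beta * xi + alpha) for xi, yi in zip(x, y)]
    sigma = (sum(r * r for r in resid) / len(resid)) ** 0.5   # residual spread
    kept = [(xi, yi) for xi, yi, r in zip(x, y, resid)
            if abs(r) <= cutoff * sigma]                       # drop outliers
    return ols([p[0] for p in kept], [p[1] for p in kept])    # refit
```

Note the caveat in the answer: this is only justified when there is a plausible error mechanism behind the discarded points; a single extreme outlier can also inflate sigma enough to hide itself, which is one motivation for the robust methods mentioned below.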

As far as the math is concerned, I would recommend biting the bullet and trying to figure out the matrix math. Perhaps you could find a different article, or a book which has a better presentation. I will not comment on your Matlab code, except to say that it looks like you will have some precision problems when subtracting part4 from part3 , and probably part2 from part1 as well.
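On the precision point: the cancellation in part3 - part4 (and part1 - part2) largely disappears if the data is first centered at its weighted means, which gives algebraically the same slope (an illustrative Python sketch, not from the answer):

```python
def weighted_line_fit_centered(x, y, w):
    """Same weighted least-squares fit, computed around the weighted means
    to avoid subtracting two large, nearly equal sums."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    # Weighted covariance and variance of the centered data
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    beta = sxy / sxx
    return beta, my - beta * mx  # slope, intercept
```

Centering keeps the intermediate sums on the scale of the data's spread rather than the scale of the raw coordinates squared, so the subtraction no longer cancels away most of the significant digits.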

Not exactly what you are asking for, but you should look into robust regression . MATLAB has the function robustfit (requires Statistics Toolbox).

There is even an interactive demo you can play with to compare regular linear regression vs. robust regression:

>> robustdemo

This shows that in the presence of outliers, robust regression tends to give better results.

(Screenshot: the robustdemo window comparing the ordinary least-squares fit with the robust regression fit.)
