Most robust way to measure the alignment between points in R

Until now I have used cor to measure the alignment between points. I am quite happy with the results: values between -1 and 0 always spot the lines I want. However, thanks to the answers and comments here, I realised it's not the most robust approach, because the standard deviation is zero for flat lines, such as:

> cor(1:10, rep(10,10))
[1] NA
Warning message:
In cor(1:10, rep(10, 10)) : the standard deviation is zero

My aim is to define a function which gives 1 for points that are perfectly aligned (regardless of the slope) and values closer to 0 for points that are not in line. Would you suggest a more robust approach than mine?

EDIT:

Following the suggestion of @Hong Ooi, I got:

data1 <- data.frame(date = c(13636, 13636, 14403, 14761, 15201, 15741),
                    value = c(865310, 999989, 999989, 2, 999989, 26))

data2 <- data.frame(date = c(12667, 12745, 13106, 13276, 13461, 13626),
                    value = c(1904, 2055, 2740, 3376, 3567, 4099))

m <- cbind(data1$date, data1$value)
sdev <- prcomp(m)$sdev
sdev[1]/sum(sdev)
# 0.9986399

m <- cbind(data2$date, data2$value)
sdev <- prcomp(m)$sdev
sdev[1]/sum(sdev)
# 0.961

However, I was expecting a very low value for data1.

You could use principal components, or more specifically, the proportion of total variance explained by the first principal component. This is equivalent to fitting the line that minimises the sum of squares of the orthogonal distances of the points to the line, as opposed to the vertical distances (which is what correlations do).

This can be done in R with either the prcomp or the princomp function.

m <- cbind(1:10, rep(10, 10))
sdev <- prcomp(m)$sdev
sdev[1]/sum(sdev)
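Two details of the snippet above are worth keeping in mind. It divides standard deviations rather than variances, and prcomp does not rescale the columns by default, so a column with a much larger numeric range (such as value next to date in data1 from the question) can dominate the first component. The following is a minimal sketch of a variant that uses variances and can optionally divide each column by its range first. The function name alignment_pca, the rescale argument, and the choice of range rescaling are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): share of the total
# variance captured by the first principal component of a 2-D point set.
alignment_pca <- function(x, y, rescale = FALSE) {
  m <- cbind(x, y)
  if (rescale) {
    # Assumption: divide each column by its range so that both axes are
    # comparable. Constant columns (range 0) are left unchanged to avoid
    # a division by zero, so flat lines still work.
    rng <- apply(m, 2, function(col) diff(range(col)))
    rng[rng == 0] <- 1
    m <- sweep(m, 2, rng, "/")
  }
  vars <- prcomp(m)$sdev^2              # variances along the components
  if (sum(vars) == 0) return(NA_real_)  # all points coincide
  vars[1] / sum(vars)
}

flat   <- alignment_pca(1:10, rep(10, 10))           # horizontal line
sloped <- alignment_pca(1:10, 2 * (1:10) + 3)        # sloped line
square <- alignment_pca(c(0, 1, 0, 1), c(0, 0, 1, 1)) # corners of a square
```

With only two columns, the first component always explains at least half of the total variance, so this score lies between 0.5 and 1. If a score between 0 and 1 is wanted, 2 * p - 1 is one possible mapping. Rescaling also has a cost: a nearly flat line with a little noise gets that noise stretched to the full range, so whether to rescale depends on the data.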

How about using the R-squared (or adjusted R-squared) of a regression? After all, in a simple linear regression with one predictor, the R-squared is simply the square of the sample correlation coefficient.

reg.data1 <- lm(data1$value ~ data1$date)
summary(reg.data1)$adj.r.squared
#[1] 0.1844582

reg.data2 <- lm(data2$value ~ data2$date)
summary(reg.data2)$adj.r.squared
#[1] 0.9848801
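The relation between R-squared and the correlation can be checked directly. The sketch below reuses the data2 values from the question; the variable names date, value, fit, r2 and adj_manual are chosen here for illustration only. It also recomputes the adjusted R-squared from the plain R-squared for a model with an intercept and one predictor.

```r
# data2 from the question, as plain vectors
date  <- c(12667, 12745, 13106, 13276, 13461, 13626)
value <- c(1904, 2055, 2740, 3376, 3567, 4099)

fit <- lm(value ~ date)
r2  <- summary(fit)$r.squared

# R-squared of a one-predictor regression vs. the squared correlation
same_as_cor <- isTRUE(all.equal(r2, cor(date, value)^2))

# Adjusted R-squared for n points, an intercept and one predictor
n <- length(value)
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - 2)
same_as_adj <- isTRUE(all.equal(adj_manual, summary(fit)$adj.r.squared))
```

Because R-squared is built on the same correlation, it inherits the limitation raised in the question: when the response is constant (a flat line), the total sum of squares is zero and R-squared is undefined. It also depends on which variable is treated as the response, since only vertical distances are measured.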
