Calculating R^2 for a nonlinear least squares fit
Suppose I have x values, y values, and expected y values f (from some nonlinear best fit curve).
How can I compute R^2 in R? Note that this function is not a linear model, but a nonlinear least squares (nls) fit, so not an lm fit.
You just use the lm function to fit a linear model:
x = runif(100)
y = runif(100)
spam = summary(lm(x~y))
> spam$r.squared
[1] 0.0008532386
Note that r-squared is not defined for non-linear models, or is at least very tricky; quoting from R-help:
There is a good reason that an nls model fit in R does not provide r-squared - r-squared doesn't make sense for a general nls model.
One way of thinking of r-squared is as a comparison of the residual sum of squares for the fitted model to the residual sum of squares for a trivial model that consists of a constant only.
You cannot guarantee that this is a comparison of nested models when dealing with an nls model. If the models aren't nested this comparison is not terribly meaningful.
So the answer is that you probably don't want to do this in the first place.
If you want peer-reviewed evidence, see this article for example; it's not that you can't compute the R^2 value, it's just that it may not mean the same thing/have the same desirable properties as in the linear-model case.
Sounds like f are your predicted values. So the distance from them to the actual values, divided by n * variance of y; so something like
1-sum((y-f)^2)/(length(y)*var(y))
should give you a quasi r-squared value, so long as your model is reasonably close to a linear model and n is pretty big.
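As a runnable sketch of that formula, reusing the mtcars nls example that appears later in this thread (the data set and model form are illustrative, not from the question):

```r
# Fit a simple nonlinear model; mpg ~ a / wt + b is an assumed example form.
fit <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))

y <- mtcars$mpg
f <- predict(fit)  # predicted values, playing the role of f in the question

# Quasi r-squared: 1 minus residual sum of squares over n * var(y)
quasi.r.sq <- 1 - sum((y - f)^2) / (length(y) * var(y))
quasi.r.sq
```

Note that `length(y) * var(y)` differs from the usual total sum of squares by a factor of n/(n-1), which is why the approximation only holds well when n is large.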
As a direct answer to the question asked (rather than arguing that R2/pseudo R2 aren't useful), the nagelkerke function in the rcompanion package will report various pseudo R2 values for nonlinear least squares (nls) models as proposed by McFadden, Cox and Snell, and Nagelkerke, e.g.
require(rcompanion)
data(BrendonSmall)
quadplat = function(x, a, b, clx) {
    ifelse(x < clx, a + b * x + (-0.5*b/clx) * x * x,
           a + b * clx + (-0.5*b/clx) * clx * clx)}
model = nls(Sodium ~ quadplat(Calories, a, b, clx),
            data = BrendonSmall,
            start = list(a = 519,
                         b = 0.359,
                         clx = 2304))
nullfunct = function(x, m){m}
null.model = nls(Sodium ~ nullfunct(Calories, m),
                 data = BrendonSmall,
                 start = list(m = 1346))
nagelkerke(model, null=null.model)
The soilphysics package also reports Efron's pseudo R2 and adjusted pseudo R2 value for nls models as 1 - RSS/TSS:
pred <- predict(model)
n <- length(pred)
res <- resid(model)
w <- weights(model)
if (is.null(w)) w <- rep(1, n)
rss <- sum(w * res ^ 2)
resp <- pred + res
center <- weighted.mean(resp, w)
r.df <- summary(model)$df[2]
int.df <- 1
tss <- sum(w * (resp - center)^2)
r.sq <- 1 - rss/tss
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
out <- list(pseudo.R.squared = r.sq,
adj.R.squared = adj.r.sq)
which is also the pseudo R2 as calculated by the accuracy function in the rcompanion package. Basically, this R2 measures how much better your fit becomes compared to if you would just draw a flat horizontal line through the points. This can make sense for nls models if your null model is an intercept-only model. Also for particular other nonlinear models it can make sense. E.g. for a scam model that uses strictly increasing splines (bs="mpi" in the spline term), the fitted model for the worst possible scenario (e.g. where your data was strictly decreasing) would be a flat line, and hence would result in an R2 of zero. Adjusted R2 then also penalizes models with a higher number of fitted parameters. Using the adjusted R2 value would already address a lot of the criticisms in the paper linked above, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892436/ (besides, if one swears by using information criteria to do model selection, the question becomes which one to use: AIC, BIC, EBIC, AICc, QIC, etc).
Just using
r.sq <- max(cor(y,yfitted),0)^2
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
I think would also make sense if you have normal Gaussian errors - i.e. the correlation between the observed and fitted y (clipped at zero, so that a negative relationship would imply zero predictive power), squared, and then adjusted for the number of fitted parameters in the adjusted version. If y and yfitted go in the same direction this would be the R2 and adjusted R2 value as reported for a regular linear model. To me this would make perfect sense at least, so I don't agree with outright rejecting the usefulness of pseudo R2 values for nls models as the answer above seems to imply.
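A runnable version of that correlation-based calculation, reusing the mtcars nls example from elsewhere in this thread (the data set and model form are assumptions for illustration):

```r
# Fit an assumed example model on mtcars
fit <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))

y <- mtcars$mpg
yfitted <- predict(fit)
n <- length(y)
int.df <- 1                 # one parameter for the intercept-only null model
r.df <- summary(fit)$df[2]  # residual degrees of freedom of the fit

# Squared correlation, clipped at zero, then adjusted for fitted parameters
r.sq <- max(cor(y, yfitted), 0)^2
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
c(r.sq, adj.r.sq)
```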
For non-normal error structures (e.g. if you were using a GAM with non-normal errors) the McFadden pseudo R2 is defined analogously as
1-residual deviance/null deviance
See here and here for some useful discussion.
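For instance, the deviance-based McFadden pseudo R2 can be computed directly from a fitted glm object; the Poisson example and simulated data below are assumptions for illustration:

```r
# Simulated count data with an increasing trend (an assumed example)
set.seed(1)
d <- data.frame(x = 1:50)
d$y <- rpois(50, lambda = exp(0.05 * d$x))

# Poisson GLM: a model with a non-normal error structure
fit <- glm(y ~ x, family = poisson, data = d)

# McFadden pseudo R2: 1 - residual deviance / null deviance
mcfadden.r.sq <- 1 - fit$deviance / fit$null.deviance
mcfadden.r.sq
```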
Another quasi-R-squared for non-linear models is to square the correlation between the actual y-values and the predicted y-values. For linear models this is the regular R-squared.
As an alternative to this problem, I have used the following procedure several times:
Best wishes to all. Patrick.
modelr package
modelr::rsquare(nls_model, data)
nls_model <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))
modelr::rsquare(nls_model, mtcars)
# 0.794
This gives essentially the same result as the longer way described by Tom from the rcompanion resource.
Longer way with the nagelkerke function
nullfunct <- function(x, m){m}
null_model <- nls(mpg ~ nullfunct(wt, m),
data = mtcars,
start = list(m = mean(mtcars$mpg)))
nagelkerke(nls_model, null_model)[2]
# 0.794 or 0.796
lm(mpg ~ predict(nls_model), data = mtcars) %>% broom::glance()
# 0.795
Like they say, it's only an approximation.