
Calculating R^2 for a nonlinear least squares fit

Suppose I have x values, y values, and expected y values f (from some nonlinear best-fit curve).

How can I compute R^2 in R? Note that this function is not a linear model but a nonlinear least squares (nls) fit, so it is not an lm fit.

You just use the lm function to fit a linear model:

x = runif(100)
y = runif(100)
spam = summary(lm(x~y))
spam$r.squared
# [1] 0.0008532386

Note that r-squared is not defined for nonlinear models, or is at least very tricky. Quoting from R-help:

There is a good reason that an nls model fit in R does not provide r-squared - r-squared doesn't make sense for a general nls model.

One way of thinking of r-squared is as a comparison of the residual sum of squares for the fitted model to the residual sum of squares for a trivial model that consists of a constant only. You cannot guarantee that this is a comparison of nested models when dealing with an nls model. If the models aren't nested, this comparison is not terribly meaningful.

So the answer is that you probably don't want to do this in the first place.

If you want peer-reviewed evidence, see this article for example; it's not that you can't compute the R^2 value, it's just that it may not mean the same thing or have the same desirable properties as in the linear-model case.

It sounds like f are your predicted values. So the R^2 is one minus the sum of squared distances from them to the actual values, divided by n * variance of y.

so something like

1 - sum((y - f)^2) / (length(y) * var(y))

should give you a quasi r-squared value, so long as your model is reasonably close to a linear model and n is pretty big.
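A minimal, self-contained sketch of this quasi r-squared in R, using made-up exponential-decay data (the model and data here are purely illustrative):

```r
# Illustrative data: exponential decay plus noise
set.seed(1)
x <- seq(0, 5, length.out = 100)
y <- 3 * exp(-0.7 * x) + rnorm(100, sd = 0.1)

# Nonlinear least squares fit
fit <- nls(y ~ a * exp(b * x), start = list(a = 2, b = -1))
f <- predict(fit)  # predicted values

# Quasi R-squared: 1 - RSS / (n * var(y))
quasi_r2 <- 1 - sum((y - f)^2) / (length(y) * var(y))
quasi_r2
```

Note that var(y) divides by n - 1, so length(y) * var(y) slightly overstates the total sum of squares; for large n the difference is negligible.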

As a direct answer to the question asked (rather than arguing that R2/pseudo-R2 isn't useful), the nagelkerke function in the rcompanion package will report various pseudo-R2 values for nonlinear least squares (nls) models, as proposed by McFadden, Cox and Snell, and Nagelkerke, e.g.

library(rcompanion)  # provides nagelkerke() and the BrendonSmall data; nls() is in base stats
data(BrendonSmall)
quadplat = function(x, a, b, clx) {
          ifelse(x  < clx, a + b * x   + (-0.5*b/clx) * x   * x,
                           a + b * clx + (-0.5*b/clx) * clx * clx)}
model = nls(Sodium ~ quadplat(Calories, a, b, clx),
            data = BrendonSmall,
            start = list(a   = 519,
                         b   = 0.359,
                         clx = 2304))
nullfunct = function(x, m){m}
null.model = nls(Sodium ~ nullfunct(Calories, m),
             data = BrendonSmall,
             start = list(m   = 1346))
nagelkerke(model, null=null.model)

The soilphysics package also reports Efron's pseudo-R2 and an adjusted pseudo-R2 value for nls models, computed as 1 - RSS/TSS:

pred <- predict(model)
n <- length(pred)
res <- resid(model)
w <- weights(model)
if (is.null(w)) w <- rep(1, n)       # default to unit weights
rss <- sum(w * res^2)                # residual sum of squares
resp <- pred + res                   # reconstruct the observed response
center <- weighted.mean(resp, w)
r.df <- summary(model)$df[2]         # residual degrees of freedom
int.df <- 1                          # df used by the intercept-only null model
tss <- sum(w * (resp - center)^2)    # total sum of squares
r.sq <- 1 - rss/tss
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
out <- list(pseudo.R.squared = r.sq,
            adj.R.squared = adj.r.sq)

which is also the pseudo-R2 as calculated by the accuracy function in the rcompanion package. Basically, this R2 measures how much better your fit is compared to just drawing a flat horizontal line through the data. This can make sense for nls models if your null model is an intercept-only model, and it can also make sense for particular other nonlinear models. E.g. for a scam model that uses strictly increasing splines (bs="mpi" in the spline term), the fitted model for the worst possible scenario (e.g. where your data was strictly decreasing) would be a flat line, and hence would result in an R2 of zero. The adjusted R2 then also penalizes models with a higher number of fitted parameters. Using the adjusted R2 value would already address a lot of the criticisms in the paper linked above, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892436/ (besides, if one swears by using information criteria for model selection, the question becomes which one to use - AIC, BIC, EBIC, AICc, QIC, etc.).

Just using

r.sq <- max(cor(y,yfitted),0)^2
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df

would, I think, also make sense if you have normal Gaussian errors - i.e. the squared correlation between the observed and fitted y (clipped at zero, so that a negative relationship implies zero predictive power), then adjusted for the number of fitted parameters in the adjusted version. If y and yfitted go in the same direction, this would be the R2 and adjusted R2 value as reported for a regular linear model. To me this would make perfect sense at least, so I don't agree with outright rejecting the usefulness of pseudo-R2 values for nls models, as the answer above seems to imply.

For non-normal error structures (e.g. if you were using a GAM with non-normal errors), the McFadden pseudo-R2 is defined analogously as

1 - residual deviance / null deviance

See here and here for some useful discussion.
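As a sketch of the deviance-based version, a GLM fit in R exposes both deviances directly on the fitted object, so the McFadden pseudo-R2 is one line (the Poisson data here is simulated purely for illustration):

```r
# Illustrative Poisson GLM on simulated data
set.seed(2)
d <- data.frame(x = runif(200))
d$y <- rpois(200, lambda = exp(1 + 2 * d$x))

m <- glm(y ~ x, family = poisson, data = d)

# McFadden pseudo R-squared: 1 - residual deviance / null deviance
mcfadden_r2 <- 1 - m$deviance / m$null.deviance
mcfadden_r2
```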

Another quasi-R-squared for nonlinear models is to square the correlation between the actual y-values and the predicted y-values. For linear models this is the regular R-squared.

As an alternative approach to this problem, I have used the following procedure several times:

  1. Compute a fit on the data with the nls function.
  2. Use the resulting model to make predictions.
  3. Plot the data against the values predicted by the model (if the model is good, points should lie near the bisector, i.e. the y = x line).
  4. Compute the R2 of that linear regression.
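The four steps above might look like this in R (mtcars is used purely for illustration):

```r
# 1. Fit the data with nls
fit <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))

# 2. Make predictions from the fitted model
pred <- predict(fit)

# 3. Plot observed against predicted; a good fit lies near the y = x line
plot(pred, mtcars$mpg)
abline(0, 1)

# 4. R2 of the linear regression of observed on predicted
r2 <- summary(lm(mtcars$mpg ~ pred))$r.squared
r2
```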

Best wishes to all. Patrick.

With the modelr package

modelr::rsquare(nls_model, data)

For example:

nls_model <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))

modelr::rsquare(nls_model, mtcars)
# 0.794

This gives essentially the same result as the longer way described by Tom from the rcompanion resource.

The longer way, with the nagelkerke function:

nullfunct <- function(x, m){m}
null_model <- nls(mpg ~ nullfunct(wt, m),
                 data = mtcars,
                 start = list(m = mean(mtcars$mpg)))

nagelkerke(nls_model, null_model)[2]
# 0.794 or 0.796

Lastly, using predicted values:

lm(mpg ~ predict(nls_model), data = mtcars) %>% broom::glance()
# 0.795

Like they say, it's only an approximation.
