
Fast adjusted r-squared extraction

.lm.fit is considerably faster than lm for reasons documented in several places, but getting an adjusted r-squared value from it is not as straightforward, so I'm hoping for some help.

Using lm() and then summary() to get the adjusted r-squared.

tstlm <- lm(cyl ~ hp + wt, data = mtcars)

summary(tstlm)$adj.r.squared

Using .lm.fit

mtmatrix <- as.matrix(mtcars)

tstlmf <- .lm.fit(cbind(1, mtmatrix[, c("hp", "wt")]), mtmatrix[, "cyl"])

And here I'm stuck. I suspect the information I need to calculate the adjusted r-squared is somewhere in the .lm.fit result, but I can't quite figure out how to proceed.
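For reference, the object returned by .lm.fit is a plain list, and its residuals and rank components are enough to rebuild R squared and adjusted R squared. A minimal sketch that lists the components and reads those two out (the names fit, res and rdf are only illustrative):

```r
# .lm.fit() is in the stats package, which R attaches by default
X <- cbind(1, as.matrix(mtcars[, c("hp", "wt")]))  # intercept column + predictors
y <- mtcars$cyl

fit <- .lm.fit(X, y)
names(fit)                  # list components, including "residuals" and "rank"

res <- fit$residuals        # y minus the fitted values
rdf <- nrow(X) - fit$rank   # residual degrees of freedom
```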

Thanks in advance for any suggestions.

1) When the model has an intercept, R squared equals the squared correlation between the dependent variable and the fitted values. We can get the residuals from tstlmf using resid(tstlmf), and the fitted values equal y minus those residuals.

Adjusted R squared is then formed by multiplying (1 - R squared) by a factor that uses only the number of rows and columns of X, and subtracting the result from 1.
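Written out, with n = nrow(X) observations and p = ncol(X) columns of X (intercept column included, and assuming X has full column rank), the adjustment is:

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p}
```

Here n - p is the residual degrees of freedom, which is exactly what -diff(dim(X)) evaluates to in the code.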

Note that the formulas would change if there is no intercept.
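For the no-intercept case, here is a sketch based on my reading of how summary.lm handles it: the model sum of squares is taken about zero instead of about the mean, and the n - 1 in the adjustment becomes n. The names X0, fit0, rsq0 and adj0 are only illustrative.

```r
# Model WITHOUT an intercept: X0 has no column of ones
X0 <- with(mtcars, cbind(hp, wt))
y <- mtcars$cyl

fit0 <- .lm.fit(X0, y)
res0 <- fit0$residuals
fitted0 <- y - res0

# Uncentered model sum of squares (about zero, not about the mean)
mss0 <- sum(fitted0^2)
rss0 <- sum(res0^2)
rsq0 <- mss0 / (mss0 + rss0)

# The (n - 1) factor becomes n when there is no intercept
n <- nrow(X0)
adj0 <- 1 - (1 - rsq0) * n / (n - fit0$rank)
c(rsq0, adj0)
```

The result can be checked against summary(lm(cyl ~ 0 + hp + wt, data = mtcars)).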

X <- with(mtcars, cbind(1, hp, wt))
y <- mtcars$cyl

tstlmf <- .lm.fit(X, y)

rsq <- cor(y, y - resid(tstlmf))^2; rsq
## [1] 0.7898

# -diff(dim(X)) is nrow(X) - ncol(X), the residual degrees of freedom
adj <- 1 - (1-rsq) * (nrow(X) - 1) / -diff(dim(X)); adj
## [1] 0.7753


# check
tstlm <- lm(cyl ~ hp + wt, mtcars)
s <- summary(tstlm)
s$r.squared
## [1] 0.7898
s$adj.r.squared
## [1] 0.7753

2) R squared can alternatively be calculated as the ratio var(fitted) / var(y) (again assuming the model has an intercept), in which case we write:

tstlmf <- .lm.fit(X, y)

rsq <- var(y - resid(tstlmf)) / var(y); rsq
## [1] 0.7898

adj <- 1 - (1-rsq) * (nrow(X) - 1) / -diff(dim(X)); adj
## [1] 0.7753
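Why approaches 1) and 2) agree: with an intercept, the residuals average to zero and are uncorrelated with the fitted values, so var(y) splits into var(fitted) + var(residuals). That makes cor(y, fitted)^2 equal to var(fitted) / var(y). A small numeric check of both facts (variable names are illustrative):

```r
X <- with(mtcars, cbind(1, hp, wt))
y <- mtcars$cyl
fit <- .lm.fit(X, y)
res <- fit$residuals
fitted <- y - res

# Variance decomposition holds because the model has an intercept
split_ok <- isTRUE(all.equal(var(y), var(fitted) + var(res)))

# Consequently the two R squared formulas coincide
rsq_var <- var(fitted) / var(y)
rsq_cor <- cor(y, fitted)^2
c(split_ok = split_ok, same = isTRUE(all.equal(rsq_var, rsq_cor)))
```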

collapse

flm in the collapse package may be slightly faster than .lm.fit. It returns the coefficients only.

library(collapse)

tstflm <- flm(y, X)
rsq <- c(cor(y, X %*% tstflm)^2); rsq
## [1] 0.7898
adj <- 1 - (1-rsq) * (nrow(X) - 1) / -diff(dim(X)); adj
## [1] 0.7753

or

tstflm <- flm(y, X)

rsq <- c(var(X %*% tstflm)) / var(y); rsq
## [1] 0.7898
adj <- 1 - (1-rsq) * (nrow(X) - 1) / -diff(dim(X)); adj
## [1] 0.7753
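Because a coefficients-only fitter gives no residuals or rank, the fitted values have to come from X %*% beta and the residual degrees of freedom from the shape of X. The sketch below uses base R's qr.coef() as a stand-in for flm so it runs without collapse installed; it assumes X has full column rank, so the residual df is nrow(X) - ncol(X).

```r
X <- with(mtcars, cbind(1, hp, wt))
y <- mtcars$cyl

# Stand-in for a coefficients-only fitter (base R, no extra packages)
beta <- qr.coef(qr(X), y)

fitted <- drop(X %*% beta)   # fitted values from the coefficients alone
rsq <- cor(y, fitted)^2      # valid because X contains an intercept column

# Assumes full column rank, so residual df = nrow(X) - ncol(X)
rdf <- nrow(X) - ncol(X)
adj <- 1 - (1 - rsq) * (nrow(X) - 1) / rdf
adj
```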

The following function computes the adjusted R squared from an object returned by .lm.fit and the response vector y. Like the default summary.lm calculation, it assumes the model includes an intercept.

adj_r2_lmfit <- function(object, y){
  ypred <- y - resid(object)                  # fitted values
  mss <- sum((ypred - mean(ypred))^2)         # model sum of squares (centered)
  rss <- sum(resid(object)^2)                 # residual sum of squares
  rdf <- length(resid(object)) - object$rank  # residual degrees of freedom
  r.squared <- mss/(mss + rss)
  adj.r.squared <- 1 - (1 - r.squared)*(NROW(y) - 1)/rdf
  adj.r.squared
}

mtmatrix <- as.matrix(mtcars)
tstlm <- lm(cyl ~ hp + wt, data = mtcars)
tstlmf <- .lm.fit(cbind(1, mtmatrix[, c("hp", "wt")]), mtmatrix[, "cyl"])

summary(tstlm)$adj.r.squared
#[1] 0.7753073
adj_r2_lmfit(tstlmf, mtmatrix[, "cyl"])
#[1] 0.7753073
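The speed of .lm.fit pays off mainly when many models are fitted, for example when screening predictor subsets. A sketch under that assumption, ranking every two-predictor model by adjusted R squared; the predictor list is arbitrary and only for illustration:

```r
y <- mtcars$cyl
preds <- c("hp", "wt", "disp", "mpg")

# Every two-predictor subset; each column of `combos` is one candidate model
combos <- combn(preds, 2)

adj <- apply(combos, 2, function(v) {
  X <- cbind(1, as.matrix(mtcars[, v]))   # intercept + the chosen predictors
  fit <- .lm.fit(X, y)
  rsq <- cor(y, y - fit$residuals)^2      # valid because the model has an intercept
  1 - (1 - rsq) * (nrow(X) - 1) / (nrow(X) - fit$rank)
})
names(adj) <- apply(combos, 2, paste, collapse = "+")

sort(adj, decreasing = TRUE)
```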
