[英]How to predict and extract R Squared with .lm.fit?
As the title suggest, I have seen some user mentioned that .lm.fit()
functions has an advantage of more speed than a regular lm()
, but when i look deeper at .lm.fit()
in help, it is supposed to be a fitter functions, it returns a set of list instead of a model , which makes me to think is it still possible to extract components like R squared, Adj R Squared, and lastly do a predict()
out of it?正如标题所暗示的那样,我看到一些用户提到.lm.fit()
函数具有比常规lm()
更快的速度优势,但是当我深入了解.lm.fit()
的帮助时,它应该是作为一个更合适的函数,它返回一组列表而不是model ,这让我认为是否仍然可以提取 R 平方、Adj R 平方等组件,最后从中进行predict()
?
Below is sample data and executions:以下是示例数据和执行:
test_dat <- data.frame(y = rnorm(780, 20, 10))
for(b in 1:300){
name_var <- paste0("x",b)
test_dat[[name_var]] <- rnorm(780, 0.01 * b, 5)
}
tic()
obj_lm <- lm(y ~ ., data = test_dat)
print(class(obj_lm))
print(summary(obj_lm)$r.squared)
print(summary(obj_lm)$adj.r.squared)
predict(obj_lm)
toc() #approximately 0.4 seconds
tic()
datm <- as.matrix(test_dat)
obj_lm_fit <- .lm.fit(cbind(1,datm[,-1]), datm[,1])
print(class(obj_lm_fit))
toc() #approximately 0.2 seconds
Functions predict
and resid
are generic and since .lm.fit
returns an object of class "list"
, all you have to do is to write methods implementing the definitions of what you want.函数predict
和resid
是通用的,因为.lm.fit
返回 object of class "list"
,你所要做的就是编写实现你想要的定义的方法。 Below are methods to compute fitted values, residuals and R^2.以下是计算拟合值、残差和 R^2 的方法。
set.seed(2023) # make the results reproducible
test_dat <- data.frame(y = rnorm(780, 20, 10))
for(b in 1:300){
name_var <- paste0("x",b)
test_dat[[name_var]] <- rnorm(780, 0.01 * b, 5)
}
obj_lm <- lm(y ~ ., data = test_dat)
datm <- as.matrix(test_dat)
obj_lm_fit <- .lm.fit(cbind(1,datm[,-1]), datm[,1])
#------------------------------------------------------------------------
# the methods for objects of class "list"
#
fitted.list <- function(object, X) {
X %*% coef(object)
}
resid.list <- residuals.list <- function(object, X, y) {
y_fitted <- fitted(object, X)
y - y_fitted
}
rsquared <- function(x, ...) UseMethod("rsquared")
rsquared.default <- function(x, ...) {
summary(x)$r.squared
}
rsquared.list <- function(object, X, y) {
e <- resid.list(object, X, y)
1 - sum(e^2)/sum( (y - mean(y))^2 )
}
rsquared(obj_lm_fit, cbind(1,datm[,-1]), datm[,1])
#> [1] 0.3948863
rsquared(obj_lm)
#> [1] 0.3948863
Created on 2023-01-03 with reprex v2.0.2创建于 2023-01-03,使用reprex v2.0.2
Added method to also calculate adj.R2添加了计算 adj.R2 的方法
adj_rsquared_list <- function(object, X, y){
r2 <- rsquared.list(object, X, y)
k <- ncol(X) - 1
n <- nrow(X)
rate_of_error <- (1 - r2) * (n - 1) / (n - k - 1)
adj_r2 <- 1 - rate_of_error
return(adj_r2)
}
adj_rsquared_list(obj_lm_fit, cbind(1,datm[,-1]), datm[,1])
#> [1] 0.01590061
After the edit by Jovan , I have changed fitted.list
above to use coef()
, a function that extracts the first arguments list member "coefficients"
, if it exists, and rewrote the default and list methods of rsquared
to accept a adj
argument.在Jovan编辑之后,我将上面的fitted.list
更改为使用coef()
,一个 function 提取第一个 arguments 列表成员"coefficients"
(如果存在),并重写rsquared
的默认和列表方法以接受adj
参数. The code to compute the adjusted R^2 is a copy&paste of Jovan's code.计算调整后的 R^2 的代码是 Jovan 代码的复制和粘贴。
rsquared <- function(x, ...) UseMethod("rsquared")
rsquared.default <- function(x, adj = FALSE, ...) {
if(adj) {
summary(x)$adj.r.squared
} else summary(x)$r.squared
}
rsquared.list <- function(object, X, y, adj = FALSE) {
e <- resid.list(object, X, y)
r2 <- 1 - sum(e^2)/sum( (y - mean(y))^2 )
if(adj) {
k <- ncol(X) - 1
n <- nrow(X)
rate_of_error <- (1 - r2) * (n - 1) / (n - k - 1)
adj_r2 <- 1 - rate_of_error
adj_r2
} else r2
}
# same as above
rsquared(obj_lm_fit, cbind(1,datm[,-1]), datm[,1])
#> [1] 0.3948863
rsquared(obj_lm)
#> [1] 0.3948863
# new, `adj = TRUE`
rsquared(obj_lm_fit, cbind(1,datm[,-1]), datm[,1], adj = TRUE)
#> [1] 0.01590061
rsquared(obj_lm, adj = TRUE)
#> [1] 0.01590061
Created on 2023-01-03 with reprex v2.0.2创建于 2023-01-03,使用reprex v2.0.2
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.