Linear regression for each category of a variable
Suppose I am working with the iris dataset in R:
data(iris)
summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
       Species
 setosa    :50
 versicolor:50
 virginica :50
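I want to run a linear regression in which Petal.Length is the dependent variable and Sepal.Length is the independent variable. In R, how can I run this regression for every Species category at once and get the p-value, R², and F statistic for each fit?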
Using by.
by(iris, iris$Species, \(x) summary(lm(Petal.Length ~ Sepal.Length, x)))
# iris$Species: setosa
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.40856 -0.08027 -0.00856 0.11708 0.46512
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.80305 0.34388 2.335 0.0238 *
# Sepal.Length 0.13163 0.06853 1.921 0.0607 .
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1691 on 48 degrees of freedom
# Multiple R-squared: 0.07138, Adjusted R-squared: 0.05204
# F-statistic: 3.69 on 1 and 48 DF, p-value: 0.0607
#
# ---------------------------------------------------------
# iris$Species: versicolor
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.68611 -0.22827 -0.04123 0.19458 0.79607
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.18512 0.51421 0.360 0.72
# Sepal.Length 0.68647 0.08631 7.954 2.59e-10 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3118 on 48 degrees of freedom
# Multiple R-squared: 0.5686, Adjusted R-squared: 0.5596
# F-statistic: 63.26 on 1 and 48 DF, p-value: 2.586e-10
#
# ---------------------------------------------------------
# iris$Species: virginica
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.68603 -0.21104 0.06399 0.18901 0.66402
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.61047 0.41711 1.464 0.15
# Sepal.Length 0.75008 0.06303 11.901 6.3e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2805 on 48 degrees of freedom
# Multiple R-squared: 0.7469, Adjusted R-squared: 0.7416
# F-statistic: 141.6 on 1 and 48 DF, p-value: 6.298e-16
To elaborate on my comment, we can easily extract the desired values:
by(iris, iris$Species, \(x) lm(Petal.Length ~ Sepal.Length, x)) |>
  lapply(\(x) {
    with(summary(x), c(r2=r.squared, f=fstatistic,
                       p=do.call(pf, c(as.list(unname(fstatistic)), lower.tail=FALSE))))
  }) |> do.call(what=rbind)
# r2 f.value f.numdf f.dendf p
# setosa 0.07138289 3.689765 1 48 6.069778e-02
# versicolor 0.56858983 63.263024 1 48 2.586190e-10
# virginica 0.74688439 141.636664 1 48 6.297786e-16
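The do.call(pf, ...) step is compact, so here is a minimal sketch of what it computes for a single species. The helper name overall_p and the variables setosa_fit, p_setosa, and slope_p are hypothetical and not part of the original answer. summary() of an lm stores the F statistic as a named vector (value, numdf, dendf), and the overall p-value is the upper tail of the F distribution at that value.

```r
# Hypothetical helper (not from the original answer): overall F-test p-value
# of a fitted lm, computed from the fstatistic stored in its summary.
overall_p <- function(fit) {
  fs <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
  unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))
}

# Fit the model on one species only
setosa_fit <- lm(Petal.Length ~ Sepal.Length,
                 data = subset(iris, Species == "setosa"))
p_setosa <- overall_p(setosa_fit)

# With a single predictor, the overall F-test p-value coincides with the
# p-value of the slope in the coefficient table.
slope_p <- coef(summary(setosa_fit))["Sepal.Length", "Pr(>|t|)"]
```

Because each model here has a single predictor, either number can be reported as "the" p-value; with several predictors the two would differ, and the F-based value would be the one that tests the model as a whole.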
If you want to extract these values, you can use:
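library(dplyr)  # provides %>% and re-exports as_tibble()

df <- iris
list_res <- df %>%
  base::split(., df$Species, drop = FALSE) %>%  # one data frame per species
  lapply(., function(x) {
    # fit the model on this species' subset and summarise it
    fit <- lm(Petal.Length ~ Sepal.Length, data = x) %>%
      summary()
    r <- fit$r.squared
    # keep the row names as a column so each p-value stays tied to its term
    coeffs <- fit$coefficients %>%
      as_tibble(rownames = "term")
    f <- fit$fstatistic[[1]]
    list_res <- list(r, coeffs, f)
    names(list_res) <- c("R-Squared", "Coefficients", "F-Value")
    return(list_res)
  })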
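For each regression model, this returns a list of three objects containing the desired values. I kept the coefficient table here because it is always good to know which independent variable a p-value belongs to. If you would rather extract those p-values on their own, you can use, for example, coeffs <- fit$coefficients[, 4] %>% as.list().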
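As a rough sketch of that last suggestion, the p-value column can be pulled out with its term names kept. The names fits and p_by_species are illustrative and not part of the original answer; base split() and lapply() are used here so that no extra package is assumed.

```r
# Illustrative names (fits, p_by_species); not part of the original answer.
# One model summary per species.
fits <- lapply(split(iris, iris$Species), function(x) {
  summary(lm(Petal.Length ~ Sepal.Length, data = x))
})

# Column 4 of the coefficient matrix is Pr(>|t|); indexing it by column name
# returns a named vector, so each p-value stays tied to its coefficient.
p_by_species <- lapply(fits, function(s) s$coefficients[, "Pr(>|t|)"])

# Slope p-value for one species
p_by_species$versicolor["Sepal.Length"]
```

Indexing by column name rather than by position is a small design choice: it fails loudly if the coefficient table ever has a different layout.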