Linear regression for each category of a variable

Question

Suppose I am using the iris dataset in R:

data(iris)

summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   : 4.300   Min.   : 2.000   Min.   : 1.000   Min.   : 0.100  
 1st Qu.: 5.100   1st Qu.: 2.800   1st Qu.: 1.600   1st Qu.: 0.300  
 Median : 5.800   Median : 3.000   Median : 4.350   Median : 1.300  
 Mean   : 5.843   Mean   : 3.057   Mean   : 3.758   Mean   : 1.199  
 3rd Qu.: 6.400   3rd Qu.: 3.300   3rd Qu.: 5.100   3rd Qu.: 1.800  
 Max.   : 7.900   Max.   : 4.400   Max.   : 6.900   Max.   : 2.500  
       Species  
 setosa    : 50  
 versicolor: 50  
 virginica : 50

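I want to run a linear regression with Petal.Length as the dependent variable and Sepal.Length as the independent variable. How can I run this regression in R for each Species category at once, getting the p-value, R², and F value for each test?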
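For orientation, here is a sketch of what "one regression per species" fits, and how the three requested statistics relate for a simple regression (one predictor plus intercept, n_s observations in species s). These are standard textbook relations, not taken from the answers below; in iris, n_s = 50 for every species, so the F test has 1 and 48 degrees of freedom.

```latex
\text{Petal.Length}_i = \beta_{0,s} + \beta_{1,s}\,\text{Sepal.Length}_i + \varepsilon_i,
\qquad i \in \text{species } s

R^2_s = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
F_s = \frac{R^2_s / 1}{(1 - R^2_s)/(n_s - 2)},
\qquad
p_s = \Pr\!\left(F_{1,\,n_s - 2} \ge F_s\right)
```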

Answer 1

Use by():

# \(x) is shorthand for function(x) and needs R >= 4.1.0
by(iris, iris$Species, \(x) summary(lm(Petal.Length ~ Sepal.Length, x)))
# iris$Species: setosa
# 
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.40856 -0.08027 -0.00856  0.11708  0.46512 
# 
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)  
# (Intercept)   0.80305    0.34388   2.335   0.0238 *
# Sepal.Length  0.13163    0.06853   1.921   0.0607 .
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.1691 on 48 degrees of freedom
# Multiple R-squared:  0.07138, Adjusted R-squared:  0.05204 
# F-statistic:  3.69 on 1 and 48 DF,  p-value: 0.0607
# 
# --------------------------------------------------------- 
# iris$Species: versicolor
# 
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.68611 -0.22827 -0.04123  0.19458  0.79607 
# 
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   0.18512    0.51421   0.360     0.72    
# Sepal.Length  0.68647    0.08631   7.954 2.59e-10 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.3118 on 48 degrees of freedom
# Multiple R-squared:  0.5686,  Adjusted R-squared:  0.5596 
# F-statistic: 63.26 on 1 and 48 DF,  p-value: 2.586e-10
# 
# --------------------------------------------------------- 
# iris$Species: virginica
# 
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.68603 -0.21104  0.06399  0.18901  0.66402 
# 
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   0.61047    0.41711   1.464     0.15    
# Sepal.Length  0.75008    0.06303  11.901  6.3e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.2805 on 48 degrees of freedom
# Multiple R-squared:  0.7469,  Adjusted R-squared:  0.7416 
# F-statistic: 141.6 on 1 and 48 DF,  p-value: 6.298e-16

EDIT

To elaborate on my comment, we can easily extract the desired values:

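# |> and \(x) need R >= 4.1.0
by(iris, iris$Species, \(x) lm(Petal.Length ~ Sepal.Length, x)) |>
  lapply(\(x) {
    # R-squared, the F statistic (value, numdf, dendf), and the overall
    # p-value = upper tail of the F distribution at the observed F
    with(summary(x), c(r2=r.squared, f=fstatistic, 
                       p=do.call(pf, c(as.list(unname(fstatistic)), lower.tail=FALSE))))
  }) |> do.call(what=rbind)  # stack into one row per species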
#                    r2    f.value f.numdf f.dendf            p
# setosa     0.07138289   3.689765       1      48 6.069778e-02
# versicolor 0.56858983  63.263024       1      48 2.586190e-10
# virginica  0.74688439 141.636664       1      48 6.297786e-16
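The do.call(pf, ...) line packs the p-value computation into a single call. A minimal base-R sketch that spells it out, assuming only the stats functions lm, summary and pf (the names fits and per_species are illustrative, not from the answer): the overall p-value is the upper tail of the F distribution at the observed F statistic, and with one predictor the same F can be rebuilt from R².

```r
# Minimal sketch: base R only; `fits` and `per_species` are illustrative names
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Petal.Length ~ Sepal.Length, data = d))

per_species <- t(sapply(fits, function(m) {
  s <- summary(m)
  f <- s$fstatistic  # named vector: value, numdf, dendf
  c(r2   = s$r.squared,
    f    = unname(f["value"]),
    # overall p-value: upper tail of F(numdf, dendf) at the observed F
    p    = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
    # the same F rebuilt from R^2: (R^2 / numdf) / ((1 - R^2) / dendf)
    f_r2 = unname((s$r.squared / f["numdf"]) / ((1 - s$r.squared) / f["dendf"])))
}))
per_species
```

The r2, f and p columns should match the table from the answer's code, and f should equal f_r2 up to floating-point rounding.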

Answer 2

If you want to extract these values, you can use:

library(dplyr)

df <- iris

list_res <- df %>%
  # one data frame per Species level
  base::split(., df$Species, drop = FALSE) %>%
  lapply(., function(x) {
    fit <- lm(Petal.Length ~ Sepal.Length, data = x) %>%
      summary()
    r <- fit$r.squared
    # tibbles drop row names, so keep the term names as a column
    coeffs <- fit$coefficients %>%
      as.data.frame() %>%
      tibble::rownames_to_column("term") %>%
      as_tibble()
    f <- fit$fstatistic[[1]]
    res <- list(r, coeffs, f)
    names(res) <- c("R-Squared", "Coefficients", "F-Value")
    res
  })

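This returns, for each regression model, a list of three objects containing the desired values. I kept the whole coefficient table here because it is always good to know which independent variable each p-value belongs to. If, for example, you want to extract only the p-values, you can use coeffs <- fit$coefficients[, 4] %>% as.list().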
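With a single predictor, the Sepal.Length row of the coefficient table carries the same test as the overall F statistic: the slope's t value squared equals F, and the two p-values coincide (a standard result for simple regression). A small base-R check, assuming only lm, summary, coef and pf; the names fits and check are illustrative:

```r
# Minimal sketch: base R only; `fits` and `check` are illustrative names
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Petal.Length ~ Sepal.Length, data = d))

check <- t(sapply(fits, function(m) {
  s     <- summary(m)
  coefs <- coef(s)   # columns: Estimate, Std. Error, t value, Pr(>|t|)
  f     <- s$fstatistic
  c(t_squared = coefs["Sepal.Length", "t value"]^2,
    f_value   = unname(f["value"]),
    # two-sided t-test p-value for the slope, as shown in the coefficient table
    p_slope   = coefs["Sepal.Length", "Pr(>|t|)"],
    # overall model p-value from the F distribution
    p_model   = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}))
check
```

Once a model has more than one predictor this equivalence no longer holds, which is one reason to keep the full coefficient table.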

2 answers

Answer 1
Score 2, accepted, 2022-10-03 08:32:40

Answer 2
Score 1, 2022-10-03 09:48:04
