Using purrr to run multiple regression models with changing outcomes and then extract the residuals

Question

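I want to run several regression models with different y variables (so the independent variables stay the same across all models), then extract the residuals from each model and add them to the original dataset.

I will use diamonds to show what I have in mind: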

# In my example, the models are: x or y or z = carat + cut + color + clarity + price

dependent = c("x", "y", "z")

model = function(y, dataset) {
 a = map(
   setNames(y, y), ~ glm(reformulate(termlabels = c("carat", "cut", "color", "clarity", "price"),
                                     response = y),
                         family = gaussian(link = "identity"),
                         data = dataset
   )) 
 
 resids = map_dfr(a, residuals)
 
 df = bind_cols(dataset, resids)

 print(df)

}

model(y = dependent, dataset = diamonds)

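But this code does not work. I would also like to give sensible names to the residuals added as new columns; otherwise it is hard to tell the residuals apart when the number of models is large.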

Answer 1

Generate example data

library(tidyverse)

set.seed(101)
dd <- diamonds
dependent <- c("x", "y", "z")
for (d in dependent) {
  dd[[d]] <- rnorm(nrow(diamonds))
}

Procedure

library(broom)
res <- (dependent
  ## set names so .id = argument works downstream
  %>% setNames(dependent)
  ## construct list of formulas
  %>% map(reformulate, termlabels = c("carat", "cut", "color", "clarity", "price"))
  ## fit GLMs (data passed by name rather than by position)
  %>% map(glm, family = gaussian(link = "identity"), data = dd,
          na.action = na.exclude)
  ## compute resids (add observation number) and collapse to tibble
  %>% map_dfr(~tibble(.obs=seq(nrow(dd)), .resid = residuals(.)), .id = "response")
  ## widen data → residuals from each response variable as a column
  %>% pivot_wider(names_from = "response", values_from = ".resid", 
                  names_prefix ="res_")
  %>% select(-.obs)
)
## combine with original data
res2 <- bind_cols(dd, res)

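Notes:

I don't see why you are using glm(., family = gaussian(link = "identity")) here, unless it is a placeholder for something more complicated that you are doing with your real data. (If this is your actual model, then using lm() will be simpler and faster.)
Adding na.action = na.exclude to the [g]lm() call will include NA values in the predictions, residuals, etc., which will help your residuals line up better with the original data.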
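
As a small illustration of both notes, here is a minimal sketch using lm() on a made-up data frame (`toy` and its values are hypothetical, not part of the question's data). It compares na.omit, which is normally the default, with na.exclude when the response has a missing value:

```r
# Hypothetical toy data: one missing value in the response
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, NA, 6.2, 7.9, 10.1, 12.2)
)

# na.omit drops the incomplete row before fitting and never puts it back
fit_omit <- lm(y ~ x, data = toy, na.action = na.omit)

# na.exclude also drops the row for fitting, but residuals()/predict()
# pad the result with NA at the dropped positions
fit_exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

length(residuals(fit_omit))     # shorter than nrow(toy)
length(residuals(fit_exclude))  # same length as nrow(toy)

# Because the lengths match, the residuals can be attached row for row
toy$res_y <- residuals(fit_exclude)
```

With na.omit, the residual vector is one element short, so attaching it to the original data with bind_cols() or `$<-` would fail or misalign rows; with na.exclude the row containing the missing value simply gets an NA residual.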

Answer 1: 2 (accepted), 2022-03-26 22:42:58
