Use purrr to run multiple regression models with changing outcomes and then extract residuals
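I want to run several regression models that differ only in the response variable y (the independent variables stay the same in every model), then extract the residuals from each model and add them to the original dataset.
I will use diamonds to show what I have in mind: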
# In my example, the models are: x or y or z = carat + cut + color + clarity + price
dependent = c("x", "y", "z")
model = function(y, dataset) {
a = map(
setNames(y, y), ~ glm(reformulate(termlabels = c("carat", "cut", "color", "clarity", "price"),
response = y),
family = gaussian(link = "identity"),
data = dataset
))
resids = map_dfr(a, residuals)
df = bind_cols(dataset, resids)
print(df)
}
model(y = dependent, dataset = diamonds)
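But this code doesn't work. I would also like to give the residual columns sensible names; otherwise the residuals are hard to tell apart when the number of models is large.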
library(tidyverse)
set.seed(101)
dd <- diamonds
dependent <- c("x", "y", "z")
for (d in dependent) {
dd[[d]] <- rnorm(nrow(diamonds))
}
library(broom)
res <- (dependent
## set names so .id = argument works downstream
%>% setNames(dependent)
## construct list of formulas
%>% map(reformulate, termlabels = c("carat", "cut", "color", "clarity", "price"))
## fit glms (each formula is passed as the first argument; data is named explicitly)
%>% map(glm, family = gaussian(link = "identity"), data = dd,
na.action = na.exclude)
## compute resids (add observation number) and collapse to tibble
%>% map_dfr(~tibble(.obs=seq(nrow(dd)), .resid = residuals(.)), .id = "response")
## widen data → residuals from each response variable as a column
%>% pivot_wider(names_from = "response", values_from = ".resid",
names_prefix ="res_")
%>% select(-.obs)
)
## combine with original data
res2 <- bind_cols(dd, res)
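The same idea can be wrapped in a function shaped like the model() helper from the question. The version in the question appears to fail because it passes the whole vector y as the response instead of the current element, and because it binds the residuals by row rather than by column. The following is only a sketch under some assumptions: add_residuals and its argument names are hypothetical, lm() is swapped in for glm(), and the columns are named with a "res_" prefix as in the pipeline above.

```r
library(purrr)
library(dplyr)

# Hypothetical helper (not from the original post): fit one lm() per response
# and append the residuals as columns named res_<response>.
add_residuals <- function(responses, predictors, dataset) {
  resids <- responses %>%
    # name each element so the resulting columns are res_x, res_y, ...
    set_names(paste0("res_", responses)) %>%
    # .x is the current response name, not the whole vector
    map(~ lm(reformulate(predictors, response = .x),
             data = dataset, na.action = na.exclude)) %>%
    # na.exclude keeps one residual per input row (NA where a row was dropped)
    map(~ unname(residuals(.x))) %>%
    as_tibble()
  bind_cols(dataset, resids)
}

out <- add_residuals(
  responses  = c("x", "y", "z"),
  predictors = c("carat", "cut", "color", "clarity", "price"),
  dataset    = ggplot2::diamonds
)
```

Because the names are set before the models are fitted, no pivoting step is needed here; each list element becomes one column directly.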
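Notes:
There is no need for glm(., family = gaussian(link = "identity")) unless it is a placeholder for something more complicated that you are doing with your real data. (If this is your actual model, then using lm() will be simpler and faster.)
Adding na.action = na.exclude to the [g]lm() call will include NA values in the predictions, residuals, etc., which helps the residuals line up with the rows of the original data.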
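Both notes can be checked on a small example. This is a minimal sketch in base R; the toy data frame is made up for illustration and is not part of the original question.

```r
# Toy data with one missing response value (made up for illustration)
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, NA, 6.2, 7.9, 10.1))

# With the usual default (na.omit), the incomplete row is dropped
fit_omit <- lm(y ~ x, data = toy)
# With na.exclude, residuals() pads the dropped row with NA
fit_exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

length(residuals(fit_omit))     # 4
length(residuals(fit_exclude))  # 5

# The padded residuals can be assigned straight back to the data
toy$res_y <- residuals(fit_exclude)

# A Gaussian glm with identity link estimates the same coefficients as lm()
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"), data = toy)
all.equal(coef(fit_omit), coef(fit_glm))
```

Without na.exclude, the assignment to toy$res_y would fail because the residual vector is shorter than the data frame.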