R: Modeling on the residuals

Question

I have heard people talk about "modeling on the residuals" when they want to calculate some effect after an a-priori model has been made. For example, if they know that two variables, var_1 and var_2, are correlated, they first fit a model with var_1 and then model the effect of var_2 afterwards. My problem is that I've never seen this done in practice.

I'm interested in the following:

1. If I'm using a glm, how do I account for the link function used?
2. What distribution do I choose when running a second glm with var_2 as the explanatory variable? I assume this is related to 1.
3. Is this at all related to using the first model's prediction as an offset in the second model?

My attempt:

library(data.table)

dt <- data.table(mtcars) # I have a hypothesis that `mpg` is a function of both `cyl` and `wt`
dt[, cyl := as.factor(cyl)]
model <- stats::glm(mpg ~ cyl, family=Gamma(link="log"), data=dt) # I want to model `cyl` first
dt[, pred := stats::predict(model, type="response", newdata=dt)]
dt[, res := mpg - pred]

# will this approach work?
model2_1 <- stats::glm(mpg ~ wt + offset(pred), family=Gamma(link="log"), data=dt)
dt[, pred21 := stats::predict(model2_1, type="response", newdata=dt) ]

# or will this approach work?
model2_2 <- stats::glm(res ~ wt, family=gaussian(), data=dt)
dt[, pred22 := stats::predict(model2_2, type="response", newdata=dt) ]

My first suggested approach has convergence issues, but this is how my silly brain would approach this problem. Thanks for any help!

Answer 1

In a sense, an ANCOVA is 'modeling on the residuals'. The model for ANCOVA is y_i = grand_mean + treatment_i + b * (covariate - covariate_mean_i) + error for each treatment i. The term (covariate - covariate_mean_i) can be seen as the residuals of a model with the covariate as the dependent variable (DV) and the treatment as the independent variable (IV).
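To make the "residuals" reading concrete, here is a minimal sketch on mtcars, treating cyl as the treatment and wt as the covariate. The names wt_res, wt_centered, fit_full and fit_res are chosen here for illustration, and the sketch assumes a purely additive model (no interaction) to keep the comparison simple.

```r
# Residuals of the covariate (wt) regressed on the treatment (cyl).
wt_res <- residuals(lm(wt ~ factor(cyl), data = mtcars))

# The same quantity computed by hand: wt minus the mean wt of its cyl group.
wt_centered <- mtcars$wt - ave(mtcars$wt, mtcars$cyl)
all.equal(unname(wt_res), wt_centered)

# Additive ANCOVA with the raw covariate ...
fit_full <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
# ... and with the covariate replaced by its residuals.
fit_res <- lm(mpg ~ factor(cyl) + wt_res, data = mtcars)

# The slope b is the same in both fits; only the cyl terms shift.
all.equal(unname(coef(fit_full)["wt"]), unname(coef(fit_res)["wt_res"]))
```

Both all.equal() calls should report equality up to numerical tolerance, since the group means removed from wt lie in the space already spanned by factor(cyl).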

The following regression is equivalent to this ANCOVA:

lm(y ~ treatment * scale(covariate, scale = FALSE))

Applied to the data, it would look like this:

lm(mpg ~ factor(cyl) * scale(wt, scale = FALSE), data = mtcars)

It can also be turned into a glm similar to the one you use in your example:

glm(mpg ~ factor(cyl) * scale(wt, scale = FALSE), 
    family=Gamma(link="log"), 
    data = mtcars)
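
On the offset part of the question, one point may be worth sketching: in glm() an offset is added to the linear predictor, so with a log link it would normally be supplied on the link (log) scale rather than on the response scale. The following is only a sketch under that assumption, not a recommended workflow; mtcars2, eta1, m1 and m2 are names introduced here for illustration.

```r
# First-stage model: mpg explained by cyl only.
m1 <- glm(mpg ~ factor(cyl), family = Gamma(link = "log"), data = mtcars)

# Work on a copy so the built-in data set is left untouched.
mtcars2 <- mtcars

# First-stage prediction on the link (log) scale, i.e. log of the fitted mean.
mtcars2$eta1 <- predict(m1, type = "link")

# Second-stage model: the first-stage linear predictor is held fixed
# as an offset, and only the intercept and the wt effect are estimated.
m2 <- glm(mpg ~ wt + offset(eta1), family = Gamma(link = "log"), data = mtcars2)

m2$converged
coef(m2)
```

Passing the response-scale prediction as the offset would put values in the range of mpg itself inside exp(), which is one plausible source of convergence problems with a log link.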
