
R: modeling on residuals

I have heard people talk about "modeling on the residuals" when they want to estimate some effect after an a-priori model has been fitted. For example, if they know that two variables, var_1 and var_2, are correlated, they first fit a model with var_1 and then model the effect of var_2 afterwards. My problem is that I've never seen this done in practice.

I'm interested in the following:

  1. If I'm using a glm, how do I account for the link function used?
  2. What distribution do I choose when running a second glm with var_2 as the explanatory variable? I assume this is related to 1.
  3. Is this at all related to using the first model's prediction as an offset in the second model?

My attempt:

library(data.table)

# I have a hypothesis that `mpg` is a function of both `cyl` and `wt`
dt <- data.table(mtcars)
dt[, cyl := as.factor(cyl)]
model <- stats::glm(mpg ~ cyl, family=Gamma(link="log"), data=dt) # I want to model `cyl` first
dt[, pred := stats::predict(model, type="response", newdata=dt)]
dt[, res := mpg - pred]

# will this approach work?
model2_1 <- stats::glm(mpg ~ wt + offset(pred), family=Gamma(link="log"), data=dt)
dt[, pred21 := stats::predict(model2_1, type="response", newdata=dt) ]

# or will this approach work?
model2_2 <- stats::glm(res ~ wt, family=gaussian(), data=dt)
dt[, pred22 := stats::predict(model2_2, type="response", newdata=dt) ]

My first suggested approach has convergence issues, but this is how my silly brain would approach this problem. Thanks for any help!

In a sense, an ANCOVA is "modeling on the residuals". The model for ANCOVA is y_i = grand_mean + treatment_i + b * (covariate - covariate_mean_i) + error for each treatment i. The term (covariate - covariate_mean_i) can be seen as the residuals of a model with covariate as the dependent variable (DV) and treatment as the independent variable (IV).
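The residual interpretation above can be checked numerically. The following is a minimal sketch, assuming the built-in mtcars data with wt as the covariate and cyl as the treatment; the object names (cov_model, fit_raw, fit_res, and so on) are illustrative only. It compares the residuals of the covariate model with the per-treatment centered covariate, and then compares a common-slope fit on the raw covariate with one on the residualized covariate.

```r
# Residuals of the covariate regressed on treatment should equal the
# covariate minus its per-treatment mean.
cov_model   <- lm(wt ~ factor(cyl), data = mtcars)
wt_resid    <- residuals(cov_model)
wt_centered <- mtcars$wt - ave(mtcars$wt, mtcars$cyl)  # ave() returns group means
max_diff    <- max(abs(wt_resid - wt_centered))

# Common-slope ANCOVA fitted with the raw and the residualized covariate.
d <- mtcars
d$wt_resid <- wt_resid
fit_raw <- lm(mpg ~ factor(cyl) + wt, data = d)
fit_res <- lm(mpg ~ factor(cyl) + wt_resid, data = d)

slope_raw <- coef(fit_raw)[["wt"]]
slope_res <- coef(fit_res)[["wt_resid"]]

print(max_diff)
print(c(raw = slope_raw, residualized = slope_res))
```

Because the per-treatment means of wt lie in the space spanned by the treatment dummies, the two fits should differ only in how the treatment coefficients are parameterized, not in the slope or the fitted values.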

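The following regression is closely related to this ANCOVA. Note that scale(covariate, scale = FALSE) centers the covariate on its overall mean rather than on each treatment's mean, and that * also fits a treatment-by-covariate interaction (a separate slope per treatment), whereas the model above has a single slope b: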

lm(y ~ treatment * scale(covariate, scale = FALSE))

Applied to the data, it would look like this:

lm(mpg ~ factor(cyl) * scale(wt, scale = FALSE), data = mtcars)

It can also be turned into a glm similar to the one in your example:

glm(mpg ~ factor(cyl) * scale(wt, scale = FALSE), 
    family=Gamma(link="log"), 
    data = mtcars)
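On the offset question: in a glm, an offset is added to the linear predictor, that is, on the link scale. With a log link, the first model's prediction would therefore normally enter as predict(..., type = "link") (equivalently, the log of the response-scale prediction), not as the response-scale prediction itself. The following is a minimal sketch of that two-stage idea, assuming the same mtcars variables as in the question; the names d, m1, m2, eta1 and pred2 are illustrative, and this is not a claim that a two-stage fit is preferable to a single joint model.

```r
# Stage 1: model mpg on cyl only, with a Gamma family and log link.
d  <- transform(mtcars, cyl = factor(cyl))
m1 <- glm(mpg ~ cyl, family = Gamma(link = "log"), data = d)

# Keep the stage-1 prediction on the link (log) scale.
d$eta1 <- predict(m1, type = "link")

# Stage 2: the stage-1 linear predictor enters as an offset, i.e. with
# its coefficient fixed at 1, and only the wt effect is estimated.
m2 <- glm(mpg ~ wt + offset(eta1), family = Gamma(link = "log"), data = d)
d$pred2 <- predict(m2, type = "response")

print(m2$converged)
print(coef(m2))
```

Keeping the same family and link in the second stage means the wt coefficient acts multiplicatively on the stage-1 prediction, which may be one way to read "accounting for the link function".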
