
How to change na.action for zero-inflated regression model?

I am running a zero-inflated negative binomial regression model using the function zeroinfl from the pscl package.

I need to exclude NAs from the model in order to be able to plot the residuals against the dependent variable later in the analysis.

Therefore, I want to set na.action="na.exclude". I can do this without any problem for a negative binomial regression model that is not zero-inflated (using glm.nb from the MASS package), e.g.

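library(MASS)  # glm.nb() is provided by MASS

fm_nbin <- glm.nb(DV ~ factor(IDV) + contr1 + contr2 + contr3,
                  data = df, subset = (df$var < 500),
                  na.action = "na.exclude")
fm_nbin.res <- resid(fm_nbin)  # na.exclude pads these with NA for rows dropped as missing
plot(fm_nbin.res ~ df$var)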

This works fine. However, when I do the same for a zero-inflated model, it does not work:

library(pscl)  # zeroinfl() is provided by pscl

zinfl <- zeroinfl(DV ~ factor(IDV) + contr1 + contr2 + contr3 |
                    factor(IDV) + contr1 + contr2 + contr3,
                  data = df, subset = (df$var < 500),
                  na.action = "na.exclude")
zinfl.res <- resid(zinfl)
plot(zinfl.res ~ df$var)

This gives the error:

Error in function (formula, data = NULL, subset = NULL, na.action = na.fail,  : 
  variable lengths differ (found for 'df$var')
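The message is raised by model.frame(), which the formula method of plot() calls to line up the variables: every variable in the formula must have the same length. zinfl.res has one value per row the model actually used, while df$var has one value per row of df. The mismatch can be reproduced in base R with two hypothetical vectors, r and v:

```r
# Hypothetical stand-ins: 9 residuals (one row was left out of the fit)
# versus a 10-element column taken from the full data frame
r <- rnorm(9)
v <- 1:10

# model.frame() is what plot(r ~ v) calls internally; capture its error text
msg <- tryCatch(model.frame(r ~ v), error = conditionMessage)
print(msg)
```

Note that subset = (df$var < 500) shortens the residual vector as well, so even residuals padded by na.exclude would generally not line up with df$var unless every row of df meets the subset condition.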

Is there any other command I should use to exclude NAs from my regression?

Edit: This is the closest thing to an answer I could find. Can it be applied to my problem in some way? Also, can naresid be used here somehow?
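On the naresid question: naresid() (from the stats package) is the helper that residuals methods such as residuals.glm() use to re-insert NA for rows removed by na.exclude, and it can also be called by hand on a vector of residuals if you know which rows were dropped. The sketch below fits glm() on a hypothetical toy data frame as a stand-in; whether a zeroinfl fit keeps the dropped-row indices in a component such as $na.action is an assumption to check with names(zinfl) before trying the same thing on it.

```r
# Hypothetical toy data: row 4 has a missing predictor value
toy <- data.frame(y = c(2, 0, 3, 1, 4, 0, 2, 5),
                  x = c(1.2, 0.4, 2.1, NA, 2.8, 0.3, 1.5, 3.0))

# glm() stands in for zeroinfl() here; na.omit just drops row 4
fit <- glm(y ~ x, family = poisson, data = toy, na.action = na.omit)
r <- resid(fit)                # 7 values, row 4 missing

dropped <- fit$na.action       # indices of dropped rows, class "omit"
class(dropped) <- "exclude"    # make naresid() pad instead of drop
r_full <- naresid(dropped, r)  # 8 values, NA at position 4

# plot(r_full ~ toy$x)         # lengths now match the original data
```

Because naresid() only knows about rows removed for missing values, it cannot restore rows excluded by a subset argument.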

As one finds by following the trail of documentation from zeroinfl to glm.fit: "The 'factory-fresh' default is na.omit." Notice that I have not put quotes around it, since it is supposed to be a function; the function will also accept it as a name, so it doesn't matter whether it is quoted. I admit that I don't really know how na.omit and na.exclude differ (something to do with residuals, I read), but I would definitely try the default setting first, since it generally delivers what I want from regression functions. So try just leaving it out:

zinfl <- zeroinfl(DV ~ factor(IDV) + contr1 + contr2 + contr3 |
                    factor(IDV) + contr1 + contr2 + contr3,
                  data = df, subset = (df$var < 500))

Since neither na.omit(df) nor na.action="na.exclude" seems to work in a zeroinfl regression model, I found another (indirect) way of making sure that NAs are excluded from the regression.

First, since my original dataset contains far more variables than just the regressors and the outcome variable, I created a new dataset containing only the variables used in the regression model, and also set a condition on the value of var to select the observations included in the regression:

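# keep only rows with var < 500 and only the columns the model uses
df1 <- subset(df, var < 500, select = c("DV", "IDV", "contr1", "contr2", "contr3"))
# then drop rows with a missing value in any of those columns
df1 <- na.omit(df1)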
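Selecting the columns before calling na.omit() is what makes this work: on a data frame, na.omit() drops every row with an NA in any column, including columns the model never uses, so calling it on the whole df can discard more observations than the regression itself would. A toy illustration (the data and the unused notes column are hypothetical):

```r
# Hypothetical data: 'notes' is not used by the model but has NAs
toy <- data.frame(DV     = c(0, 2, 0, 5, 1, 3),
                  IDV    = c("a", "b", "a", "b", "a", "b"),
                  contr1 = c(1.1, 0.2, NA, 0.8, 1.5, 0.9),
                  var    = c(100, 200, 300, 600, 150, 250),
                  notes  = c(NA, "x", NA, "y", NA, "z"))

# Dropping NAs across every column loses rows the model could have used
nrow(na.omit(toy))   # 3

# Filtering on var and selecting the model's columns first keeps them
toy1 <- subset(toy, var < 500, select = c("DV", "IDV", "contr1"))
toy1 <- na.omit(toy1)
nrow(toy1)           # 4
```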

I then ran the same code as above on the new dataset df1, which works perfectly:

zinfl <- zeroinfl(DV ~ factor(IDV) + contr1 + contr2 + contr3 |
                    factor(IDV) + contr1 + contr2 + contr3,
                  data = df1)
zinfl.res <- resid(zinfl)
plot(zinfl.res ~ df1$DV)
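If you instead want to plot the residuals against a column of the original df that is not in df1 (such as var), one option is to line them up by row name: subset(), na.omit() and the model frame all keep the original row names, so those names can pull the matching rows out of df. The sketch uses glm() as a stand-in for zeroinfl() on hypothetical toy data; that residuals() on a zeroinfl fit is named by row in the same way is an assumption worth checking with head(names(zinfl.res)).

```r
# Hypothetical toy data; 'var' is used only to filter rows
toy <- data.frame(DV  = c(0, 2, 1, 4, 3, 1, 2),
                  x   = c(0.2, 1.0, NA, 1.9, 1.4, 0.5, 0.9),
                  var = c(120, 480, 200, 650, 300, 90, 410))

# Same two-step preparation as df1: filter on var, then drop incomplete rows
toy1 <- na.omit(subset(toy, var < 500, select = c("DV", "x")))
fit  <- glm(DV ~ x, family = poisson, data = toy1)  # stand-in for zeroinfl()
res  <- resid(fit)

# Residuals carry the original row names, so index the full data with them
var_used <- toy[names(res), "var"]
# plot(res ~ var_used)
```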

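Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.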

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM